[GE users] Qstat -t ???

Viktor Oudovenko udo at physics.rutgers.edu
Wed May 25 09:24:05 BST 2005


Hi, Andy,

I have a cluster of 300 nodes. I installed the master on the master host and
then installed the binaries separately on every slave; only the default
directory is shared over NFS.

I tried it on a few machines and got the same result:

/home/udo
[04:17:54]udo@rupc04:~>qstat -help|head -1
SGE 6.0u4
[04:17:55]udo@rupc04:~>qstat -t
job-ID  prior   name       user         state submit/start at     queue master ja-task-ID task-ID state cpu        mem     io      stat failed
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
critical error: !!!!!!!!!! lGetList(): JAT_usage_list not found in element !!!!!!!!!!
  21810 2.14545 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62               MASTER        Aborted
[04:18:11]udo@rupc04:~>


The same happens even on the Opteron machines:

[04:20:28]udo@sub04n201:~>qstat -help|head -1
SGE 6.0u4
[04:20:33]udo@sub04n201:~>qstat -t
job-ID  prior   name       user         state submit/start at     queue master ja-task-ID task-ID state cpu        mem     io      stat failed
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
critical error: !!!!!!!!!! lGetList(): JAT_usage_list not found in element !!!!!!!!!!
  21810 2.14545 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62               MASTER        Aborted
[04:20:40]udo@sub04n201:~>
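
Regarding the question about a "sge_qstat" or ".sge_qstat" file, here is a
minimal sketch of how one could check for them. The function name
check_qstat_defaults is hypothetical, and the two locations it looks at
($SGE_ROOT/$SGE_CELL/common/sge_qstat and $HOME/.sge_qstat) are my assumption
about where such default files would live; adjust them if your installation
differs.

```shell
# Hypothetical helper: report qstat default-option files that could change
# what a bare `qstat -t` does.
# Assumed locations (not confirmed for every installation):
#   <sge_root>/<cell>/common/sge_qstat   cluster-wide defaults
#   <home>/.sge_qstat                    per-user defaults
# Arguments (all optional): sge_root, cell, home directory.
check_qstat_defaults() {
    root=${1:-$SGE_ROOT}
    cell=${2:-${SGE_CELL:-default}}
    home=${3:-$HOME}
    found=0

    # Without a root directory only the per-user file can be checked reliably.
    [ -n "$root" ] || echo "warning: SGE_ROOT not set" >&2

    for f in "$root/$cell/common/sge_qstat" "$home/.sge_qstat"; do
        if [ -f "$f" ]; then
            echo "found: $f"
            found=1
        fi
    done

    # Return non-zero when neither file exists.
    if [ "$found" -eq 0 ]; then
        echo "no qstat default files found"
        return 1
    fi
}
```

Calling check_qstat_defaults with no arguments uses the current environment.
If a file is reported, its contents are worth a look before concluding that
qstat -t itself is at fault.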


I am happy to provide any other information.
At the moment the queue looks like this:

[04:20:40]udo@sub04n201:~>qstat
job-ID  prior   name       user         state submit/start at     queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  21810 2.14545 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62                   8
  21813 2.14545 myri_test  cfennie      r     05/25/2005 00:45:22 myrinet@sub04n67                   8
  21820 1.85974 d4.p11     dieguez      r     05/25/2005 01:20:22 myrinet@sub04n73                   6
  21814 2.14545 myri_test  cfennie      r     05/25/2005 00:47:02 myrinet@sub04n74                   8
  21821 1.85974 d3.p04     dieguez      r     05/25/2005 01:21:02 myrinet@sub04n78                   6
  21823 1.85974 d3.p06     dieguez      r     05/25/2005 01:21:22 myrinet@sub04n86                   6
  21830 2.00000 qmc        udo          r     05/25/2005 03:03:42 opteron@sub04n205                  1
  21831 2.00000 qmc        udo          r     05/25/2005 03:04:02 opteron@sub04n206                  1
  21832 2.00000 qmc        udo          r     05/25/2005 03:04:22 opteron@sub04n206                  1
  21815 2.00000 serial     udo          r     05/25/2005 01:07:02 serial@sub04n18                    1
  21824 1.14545 relaxed    dieguez      r     05/25/2005 01:23:42 serial@sub04n19                    1
  21825 1.14545 relaxed    dieguez      r     05/25/2005 01:24:02 serial@sub04n19                    1
  21826 1.14545 relaxed    dieguez      r     05/25/2005 01:24:22 serial@sub04n20                    1
  21827 1.14545 relaxed    dieguez      r     05/25/2005 01:24:42 serial@sub04n20                    1
[04:21:43]udo@sub04n201:~>


As you can see, the first entry is a PE job on the Myrinet cluster. Could the
problem be associated with the parallel environment?
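
To narrow this down, one could pick out the multi-slot jobs from the plain
qstat listing and see whether qstat -t only fails while such jobs are running.
Below is a minimal sketch. The function name list_multislot_jobs is
hypothetical, and it assumes unwrapped qstat lines whose columns are job-ID,
prior, name, user, state, date, time, queue@host and slots, as in the listing
above. Treating "more than one slot" as "PE job" is also only an assumption.

```shell
# Hypothetical helper: print job-ID, queue instance and slot count for running
# jobs that occupy more than one slot (taken here as likely PE jobs).
# Reads plain `qstat` output on stdin. Assumed columns per job line:
#   $1 job-ID  $2 prior  $3 name  $4 user  $5 state
#   $6 date    $7 time   $8 queue@host     $9 slots
list_multislot_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "r" && NF >= 9 && $9 + 0 > 1 {
             print $1, $8, $9
         }'
}
```

Usage would be along the lines of: qstat | list_multislot_jobs. Header and
separator lines are skipped because their first field is not a number.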

With kind regards,
viktor

> -----Original Message-----
> From: Andy Schwierskott [mailto:andy.schwierskott at sun.com] 
> Sent: Wednesday, May 25, 2005 4:10
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Qstat -t ???
> 
> 
> Hi,
> 
> are you using the correct qstat binary?
> 
>     qstat -help|head -1
> 
> Can you describe a setup how to reproduce the behavior? It's 
> working for me.
> 
> Do you have a "sge_qstat" or ".sge_qstat" file with 
> additional qstat options?
> 
> Andy
> 
> > Thank you for update 4. It arrived just in time.
> >
> > Once it was installed, I tried qstat -t with 6.0u4 and got:
> >
> >
> > sub04n01:~ # qstat -t
> > job-ID  prior   name       user         state submit/start at     queue master ja-task-ID task-ID state cpu        mem     io      stat failed
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > critical error: !!!!!!!!!! lGetList(): JAT_usage_list not found in element !!!!!!!!!!
> >  21810 2.13223 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62               MASTER        Aborted
> >
> >
> > It is not crucial, but it seems the problem is still there?
> >
> >
> > V
> > P.S. qstat -g t works fine
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 

