[GE users] Qstat -t ???

Andy Schwierskott andy.schwierskott at sun.com
Wed May 25 10:18:19 BST 2005


Viktor,

we will reopen issue 1420. If we need more information from you, we'll let
you know.

Andy

> Hi, Andy,
>
> I have a cluster of 300 nodes. I installed the master on the master host and
> then installed the binaries separately on all slaves, sharing only the default
> directory over NFS.
>
> I tried it on a few machines and got the same result:
>
> /home/udo
> [04:17:54]udo@rupc04:~>qstat -help|head -1
> SGE 6.0u4
> [04:17:55]udo@rupc04:~>qstat -t
> job-ID  prior   name       user         state submit/start at     queue master ja-task-ID task-ID state cpu        mem     io      stat failed
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
> critical error: !!!!!!!!!! lGetList(): JAT_usage_list not found in element !!!!!!!!!!
>  21810 2.14545 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62               MASTER        Aborted
> [04:18:11]udo@rupc04:~>
>
>
> The same happens even on the Opteron machines:
>
> [04:20:28]udo@sub04n201:~>qstat -help|head -1
> SGE 6.0u4
> [04:20:33]udo@sub04n201:~>qstat -t
> job-ID  prior   name       user         state submit/start at     queue master ja-task-ID task-ID state cpu        mem     io      stat failed
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
> critical error: !!!!!!!!!! lGetList(): JAT_usage_list not found in element !!!!!!!!!!
>  21810 2.14545 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62               MASTER        Aborted
> [04:20:40]udo@sub04n201:~>
>
>
> I am happy to provide any other information.
> At the moment the queue looks like this:
>
> [04:20:40]udo@sub04n201:~>qstat
> job-ID  prior   name       user         state submit/start at     queue slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>  21810 2.14545 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62                   8
>  21813 2.14545 myri_test  cfennie      r     05/25/2005 00:45:22 myrinet@sub04n67                   8
>  21820 1.85974 d4.p11     dieguez      r     05/25/2005 01:20:22 myrinet@sub04n73                   6
>  21814 2.14545 myri_test  cfennie      r     05/25/2005 00:47:02 myrinet@sub04n74                   8
>  21821 1.85974 d3.p04     dieguez      r     05/25/2005 01:21:02 myrinet@sub04n78                   6
>  21823 1.85974 d3.p06     dieguez      r     05/25/2005 01:21:22 myrinet@sub04n86                   6
>  21830 2.00000 qmc        udo          r     05/25/2005 03:03:42 opteron@sub04n205                  1
>  21831 2.00000 qmc        udo          r     05/25/2005 03:04:02 opteron@sub04n206                  1
>  21832 2.00000 qmc        udo          r     05/25/2005 03:04:22 opteron@sub04n206                  1
>  21815 2.00000 serial     udo          r     05/25/2005 01:07:02 serial@sub04n18                    1
>  21824 1.14545 relaxed    dieguez      r     05/25/2005 01:23:42 serial@sub04n19                    1
>  21825 1.14545 relaxed    dieguez      r     05/25/2005 01:24:02 serial@sub04n19                    1
>  21826 1.14545 relaxed    dieguez      r     05/25/2005 01:24:22 serial@sub04n20                    1
>  21827 1.14545 relaxed    dieguez      r     05/25/2005 01:24:42 serial@sub04n20                    1
> [04:21:43]udo@sub04n201:~>
>
>
> As you can see, the first line is a PE job on the Myrinet cluster. Could the
> problem be related to the parallel environment?
>
> With kind regards,
> viktor
>
>> -----Original Message-----
>> From: Andy Schwierskott [mailto:andy.schwierskott at sun.com]
>> Sent: Wednesday, May 25, 2005 4:10
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Qstat -t ???
>>
>>
>> Hi,
>>
>> are you using the correct qstat binary?
>>
>>     qstat -help|head -1
>>
>> Can you describe a setup that reproduces the behavior? It works
>> for me.
>>
>> Do you have a "sge_qstat" or ".sge_qstat" file with
>> additional qstat options?
>>
>> Andy
>>
>>> Thank you for update 4. It arrived just in time.
>>>
>>> Once I had installed 6.0u4, I tried:
>>>
>>>
>>> sub04n01:~ # qstat -t
>>> job-ID  prior   name       user         state submit/start at     queue master ja-task-ID task-ID state cpu        mem     io      stat failed
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> critical error: !!!!!!!!!! lGetList(): JAT_usage_list not found in element !!!!!!!!!!
>>>  21810 2.13223 myri_test  cfennie      r     05/25/2005 00:34:22 myrinet@sub04n62               MASTER        Aborted
>>>
>>>
>>> It is not critical, but it seems the problem is still there?
>>>
>>>
>>> V
>>> P.S. qstat -g t works fine.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>




