[GE users] Jobs being suspended incorrectly

opoplawski orion at cora.nwra.com
Tue Apr 6 22:13:19 BST 2010

On 04/06/2010 11:54 AM, reuti wrote:
> Hi,
> On 06.04.2010 at 19:40, opoplawski wrote:
>> I'm seeing jobs being suspended incorrectly (6.2u5):
>> $ qstat -u lund
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    16548 0.60000 short      lund         T     04/06/2010 11:32:16
> T means suspend due to threshold reached.

Ah, okay, this one actually does make sense.  It's unfortunate, though,
that after the jobs in the subordinate queue are suspended, the machine
has to sit idle for several minutes while the load average comes down
before the new job can run.
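
For reference, the T state is driven by a handful of queue attributes.
The excerpt below is only a sketch of what `qconf -sq mpi` might show if
the thresholds were carried over when the queue was cloned; the values
are illustrative assumptions, not taken from this cluster.

```
# Excerpt in the style of `qconf -sq mpi` output.
# The "#" lines are annotations; all values are hypothetical.
qname                 mpi
# Above this load the queue instance accepts no new jobs
load_thresholds       np_load_avg=1.75
# Above this load, running jobs in the queue are suspended (qstat state T)
suspend_thresholds    np_load_avg=1.75
# Number of jobs suspended per interval while the threshold is exceeded
nsuspend              1
# Time between successive suspend/unsuspend steps
suspend_interval      00:05:00
```

With settings like these, the load average still reflects the
just-suspended compute.q jobs for a while, so a newly started mpi job
would sit in T until np_load_avg drops back under the threshold.  If
that is not wanted, one option is to clear the threshold on the
superordinate queue, e.g. `qconf -mattr queue suspend_thresholds NONE mpi`.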

I did have one occurrence of a job ending up in S.  I'll report back if
it happens again.

>> The compute.q queue is subordinate to the mpi queue.  But why is the job
>> in T?  I've also seen jobs in S.  It seems that properties of the
>> compute.q queue on that machine are getting applied to the job in the mpi
>> queue.
> Correct. When you enter mpi in compute.q's subordinate_list it (i.e. mpi) will get suspended when the number of slots is filled in compute.q.

No, it's the other way around.  compute.q is listed in mpi's
subordinate_list (i.e. "compute.q is subordinate to the mpi queue").  I
created the mpi queue by cloning compute.q.  I wonder whether something
is off because of that...
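
To make the direction explicit, here is a minimal sketch of the relevant
line in the superordinate queue's configuration.  The threshold after
the "=" is an assumption for illustration; the actual setting on this
cluster is not shown in this thread.

```
# Excerpt in the style of `qconf -sq mpi` output ("#" lines are annotations).
qname                 mpi
# compute.q is suspended on a host once 1 slot of mpi is filled there.
# Without "=N", it is suspended only when all mpi slots on the host are filled.
subordinate_list      compute.q=1
```

The subordinate_list always lives in the queue that does the suspending,
so the entry above belongs to mpi, and compute.q's own subordinate_list
would normally stay at NONE.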

Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

