[GE users] Strange behavior with tight integration: no free queue for job

jlopez jlopez at cesga.es
Mon Dec 1 16:41:11 GMT 2008


reuti wrote:
> On 26.11.2008 at 10:24, jlopez wrote:
>
>   
>> <snip>
>>     
>>> As many mpirun calls are used in your setup, maybe a previous task
>>> (which should have already left the node) was still active on the
>>> node cn099.
>>>
>>> Is this happening all the time or only for certain jobs?
>>>
>>> -- Reuti
>>>
>>>
>>>
>>>       
>> The message "no free queues" appears in only a very small fraction of
>> the MPI jobs. I have been analyzing the logs, and even for UPC jobs the
>> message does not usually appear. Also, even though the message appears
>> several times in the logs, only a few of the jobs that got it
>> finally fail; the rest are still able to continue.
>>
>> One question: if you run a qrsh -inherit to a slave node and then,
>> before this qrsh has finished, you send a new qrsh, does it fail
>> because of no free queues? If so, I could run some tests.
>>     
>
> The jobs will fail if more qrsh calls are sent to a node than were
> granted by SGE. But IIRC the error message would be different then.
>
>   
I will run some tests to check this behavior and let you know the
results.
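
A minimal sketch of how such an overlapping-qrsh test could look, assuming
it is run from inside a tightly integrated parallel job (qrsh -inherit only
works in that context) and that the target host is one of the nodes granted
to the job. The function names and the "sleep 30" payload are hypothetical,
chosen only for illustration:

```python
import subprocess


def overlapping_qrsh_commands(host, command, count=2):
    """Build `count` identical `qrsh -inherit` command lines for one host."""
    return [["qrsh", "-inherit", host] + list(command) for _ in range(count)]


def run_overlapping(cmds):
    """Start every command before waiting on any, so they overlap in time.

    Returns one (exit_code, stderr_text) pair per command, in order.
    """
    procs = [
        subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        for cmd in cmds
    ]
    results = []
    for proc in procs:
        _, err = proc.communicate()
        results.append((proc.returncode, err.strip()))
    return results


# Hypothetical usage inside the job script (host name is an assumption):
#   cmds = overlapping_qrsh_commands("cn099", ["sleep", "30"])
#   for code, err in run_overlapping(cmds):
#       print(code, err)
```

Raising count above the number of slots granted on the node should show
whether the second qrsh is rejected and which error message it produces.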

Thanks a lot for all your suggestions,
Javier

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=90603
