[GE users] PE Slots Problem

Brian R. Smith brs at usf.edu
Sun Aug 12 08:14:13 BST 2007



Ivan,

I had created a boolean complex, t_devel, and set its default value to 
FALSE (qconf -mc), naively believing that this default would apply to 
every queue instance.  For example, if all.q had no explicit definition 
for t_devel, I assumed it would automatically treat it as FALSE.  
This, as it turns out, was not the case.  The queue devel.q had 
t_devel=TRUE specified so that development jobs would be sent to a set 
of over-subscribed (8 slots per processor) nodes for execution under a 
low time limit.  Essentially, any job that requested t_devel would be 
considered a non-production run and would be sent to the appropriate 
hardware (based on any other resource requests that were made).

For normal jobs, t_devel is not considered or specified in the job's 
submit script, but it was added to the file 'sge_request' as 
t_devel=FALSE in order to specify a default value.  Because of this, 
every submitted job (except development jobs) was requesting 
t_devel=FALSE, while every queue other than devel.q left t_devel 
undefined.  Since that request could not be matched by any queue, no 
slots were made available to the parallel environment, and hence my 
jobs could not run.

To correct the problem, I simply added t_devel=FALSE to the complex 
list (complex_values) of each of my other queues so that it would 
correspond to the default I added to sge_request.
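To make the mismatch concrete, here is a minimal Python sketch. It is a simplified model, not Grid Engine code. It assumes that a request for a non-consumable boolean complex such as t_devel is satisfied only by a queue that itself defines the attribute with an equal value, and that the complex's default is never substituted for a missing queue value, which matches the behavior described above. Host-level and global complex values and PE slot allocation are ignored, and `matching_queues` is a hypothetical helper.

```python
# Simplified model (assumption, not SGE source): a non-consumable BOOL
# complex like t_devel matches a queue only if that queue defines a value
# for it. The "default" column set via `qconf -mc` is NOT used to fill in
# a missing queue value. Host/global complex values are ignored here.
from typing import Dict, List

QueueConfig = Dict[str, Dict[str, bool]]


def matching_queues(queues: QueueConfig, request: Dict[str, bool]) -> List[str]:
    """Return (sorted) the queues whose complex values satisfy every
    requested attribute under the == relational operator."""
    result = []
    for name, complex_values in queues.items():
        if all(attr in complex_values and complex_values[attr] == wanted
               for attr, wanted in request.items()):
            result.append(name)
    return sorted(result)


# Before the fix: only devel.q defines t_devel.
before = {"all.q": {}, "devel.q": {"t_devel": True}}
# After the fix: the other queue explicitly sets t_devel=FALSE.
after = {"all.q": {"t_devel": False}, "devel.q": {"t_devel": True}}

normal_job = {"t_devel": False}  # injected by the sge_request default
devel_job = {"t_devel": True}    # explicitly requested by the user

print(matching_queues(before, normal_job))  # -> []
print(matching_queues(before, devel_job))   # -> ['devel.q']
print(matching_queues(after, normal_job))   # -> ['all.q']
print(matching_queues(after, devel_job))    # -> ['devel.q']
```

In this model, adding t_devel=FALSE to all.q is exactly what turns the empty result for normal jobs into a match, while development jobs still land only on devel.q.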

I don't know the details of your problem or whether it's related to this.  If 
you want to give me some details on your problem and perhaps some other 
information like queue configs, parallel environment configs, message 
file output, etc., I'd be willing to take a look.

Brian

Ivan Adzhubey wrote:
> Hi Brian,
>
> I reported exactly the same problem with our dual-CPU cluster nodes about a 
> year ago and I was never able to get it resolved. I tried all possible 
> configurations and even reinstalled SGE from scratch a couple of times, but 
> nothing worked. Even Reuti was not able to help me, although he spent 
> quite some time on this issue. So I'd appreciate it if you could elaborate 
> on what exactly you did to get it working. We do not normally run many 
> projects requiring MPI, but we may still need it in the future.
>
> Thanks,
> Ivan
>
> On Thursday 09 August 2007 06:23:43 pm Brian R. Smith wrote:
>   
>> Got it.  It was just a problem with a boolean complex value not being
>> set in the queue configuration.  Everything is working fine now.  
>> Thanks for your time.
>>
>> -Brian
>>
>> Brian R. Smith wrote:
>>     
>>> Andreas & Reuti,
>>>
>>> No, there is no load threshold defined for that queue and there are no
>>> other jobs running on the host.  The load is at 0.00.  Is there any
>>> other possible information I can provide?
>>>
>>> Thanks for your help.
>>>
>>> -Brian
>>>       
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>   
