[GE users] Incorrect queue suspension

Reuti reuti at staff.uni-marburg.de
Wed Nov 7 11:43:00 GMT 2007


Hi,

On 06.11.2007 at 15:07, Chris Rudge wrote:

> On a couple of occasions I've seen an issue with the suspension of
> subordinate queues. I have a default queue for serial and OpenMP jobs
> and a separate MPI queue. They are subordinates of each other, with
> suspension occurring as soon as one slot is used in a queue.
>
> The default queue has
> 	subordinate_list      mpi.q=1
> and the MPI queue has
> 	subordinate_list      default.q=1
>
> This works correctly almost all of the time. However, I've seen
> occasions where the queue instances of both queues are suspended on
> a node. It appears that SGE attempted to launch jobs in both queues
> simultaneously, which results in both queues being put into the
> suspended state because each is a subordinate of the other. This
> occurred earlier today.
>
> # qstat -qs S
> job-ID  prior   name       user         state submit/start at     queue            slots ja-task-ID
> ---------------------------------------------------------------------------------------------------
>  869047 0.57323 RunHunter. rgw          S     11/06/2007 13:21:36 default.q@comp60     4
>  869048 0.54696 RunHunter. rgw          S     11/06/2007 13:21:36 default.q@comp63     4
>  869057 0.72422 scatter.sh sn85         S     11/06/2007 13:21:36 mpi.q@comp63         8
>
> Further investigation shows that, as the two default.q jobs had a
> slightly higher priority, they must have been launched before the
> mpi.q job. Their processes were therefore suspended, while the mpi.q
> job's processes were running normally.
>
> Is this a known issue?

Maybe related to http://gridengine.sunsource.net/issues/show_bug.cgi?id=437 ?

> Is it fixed in 6.1 (currently using 6.0u9), or is
> there a workaround?

What is your intention with this setting? If you want to limit the
total number of slots used per node, you could instead set the number
of slots in the exechost definition or use resource quotas.
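For illustration only, the two alternatives might look roughly like this. The host name comp63 is taken from the qstat output above, and the limit of 8 slots is an assumption, not a value from the original setup. Setting slots in the complex_values of the execution host (edited with qconf -me) caps the total number of slots used on that host across all queues; only the relevant lines of the host definition are shown:

```
# qconf -me comp63
hostname              comp63
complex_values        slots=8
```

Resource quota sets are, as far as I know, only available from 6.1 on, so this variant would need an upgrade from 6.0u9. A rule set added with qconf -arqs (the name max_slots_per_host is hypothetical) could apply the same cap to every host individually; the braces in hosts {*} make the limit apply per host rather than to all hosts combined:

```
{
   name         max_slots_per_host
   description  "hypothetical cap on total slots per host"
   enabled      TRUE
   limit        hosts {*} to slots=8
}
```

Note that either cap only prevents a node from being oversubscribed; unlike the mutual subordination, it does not stop serial/OpenMP and MPI jobs from sharing a node.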

-- Reuti
