[GE users] One queue in a subordinated cluster queue not suspending

Daniel Templeton Dan.Templeton at Sun.COM
Thu Dec 6 05:27:58 GMT 2007


Tim,

Can you send your queue configurations (qconf -sq) for boinc and 
single_core?
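
For reference, a minimal sketch of collecting that output in one pass. The
helper name show_queue_confs is hypothetical; the only Grid Engine command
it relies on is the qconf -sq call mentioned above, and it assumes qconf is
on the PATH of the host where it runs.

```shell
#!/bin/sh
# Sketch: print the configuration of each cluster queue named on the
# command line, with a separator line so the outputs are easy to tell apart.
# show_queue_confs is a hypothetical helper name, not part of Grid Engine.
show_queue_confs() {
    for q in "$@"; do
        echo "=== $q ==="
        # qconf -sq shows the configuration of the given queue
        qconf -sq "$q" || return 1
    done
}

# Usage, on a host where qconf is available:
#   show_queue_confs boinc single_core > queue_confs.txt
```

In the output, the subordinate_list, hostlist and slots entries are likely
the ones worth comparing, in particular any host-specific override for node02.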

Daniel

Tim Cera wrote:
> Hello,
>
> I have a subordinated cluster queue (boinc) and a production queue
> (single_core).  The boinc queue instance on a node suspends when a job
> runs in single_core on that node, EXCEPT on one node.  I have tried so
> many things that I am starting to go around in circles, hence this
> e-mail and the hope that there is an answer out there.
>
>   
>> qstat
> job-ID  prior   name       user         state submit/start at     queue
> --------------------------------------------------------------------------------
>    3867 0.56000 node01     tcera        r     12/05/2007 21:25:00 boinc@node01
>    3868 0.56000 node02     tcera        r     12/05/2007 21:25:00 boinc@node02
>    3869 0.56000 node03     tcera        r     12/05/2007 21:25:00 boinc@node03
>    3870 0.56000 node04     tcera        r     12/05/2007 21:25:00 boinc@node04
>    3871 0.56000 node05     tcera        r     12/05/2007 21:25:00 boinc@node05
>    3872 0.56000 node06     tcera        r     12/05/2007 21:25:00 boinc@node06
>    3873 0.56000 node07     tcera        r     12/05/2007 21:25:00 boinc@node07
>    3874 0.56000 node08     tcera        r     12/05/2007 21:25:00 boinc@node08
>
> Let's add some load jobs to the single_core cluster queue (nodes 1
> through 4)...
>
>   
>> qstat
> job-ID  prior   name       user         state submit/start at     queue
> --------------------------------------------------------------------------------
>    3904 0.56000 load_scr.s tcera        r     12/05/2007 22:01:45 single_core@node01
>    3908 0.56000 load_scr.s tcera        r     12/05/2007 22:02:00 single_core@node01
>    3903 0.56000 load_scr.s tcera        r     12/05/2007 22:01:45 single_core@node02
>    3907 0.56000 load_scr.s tcera        r     12/05/2007 22:02:00 single_core@node02
>    3901 0.56000 load_scr.s tcera        r     12/05/2007 22:01:45 single_core@node03
>    3905 0.56000 load_scr.s tcera        r     12/05/2007 22:02:00 single_core@node03
>    3902 0.56000 load_scr.s tcera        r     12/05/2007 22:01:45 single_core@node04
>    3906 0.56000 load_scr.s tcera        r     12/05/2007 22:02:00 single_core@node04
>    3867 0.56000 node01     tcera        S     12/05/2007 21:25:00 boinc@node01
>    3868 0.56000 node02     tcera        r     12/05/2007 21:25:00 boinc@node02
>    3869 0.56000 node03     tcera        S     12/05/2007 21:25:00 boinc@node03
>    3870 0.56000 node04     tcera        S     12/05/2007 21:25:00 boinc@node04
>    3871 0.56000 node05     tcera        r     12/05/2007 21:25:00 boinc@node05
>    3872 0.56000 node06     tcera        r     12/05/2007 21:25:00 boinc@node06
>    3873 0.56000 node07     tcera        r     12/05/2007 21:25:00 boinc@node07
>    3874 0.56000 node08     tcera        r     12/05/2007 21:25:00 boinc@node08
>    3909 0.55929 load_scr.s tcera        qw    12/05/2007 22:01:50
>
> Note that the boinc queue instance on every single_core node has been
> suspended EXCEPT on node02.  All slots in single_core are filled, and
> one job is queued.
>
> Subordinate queue suspension works correctly for the dual_core queue
> (nodes 5 through 8).  It is ONLY node02 that doesn't suspend.
>
> Any ideas on what could be wrong?  There is no other indication that
> node02 has any problem with Grid Engine: there are no errors in the
> messages files, and as near as I can tell it is configured identically
> to the other nodes.
>
> Kindest regards,
> Tim Cera, P.E.
> Senior Professional Engineer
> St. Johns River Water Management District
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>   
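
The check Tim did by eye on the second qstat listing can be sketched as a
small filter. This is only an illustration under stated assumptions: the
function name find_unsuspended is hypothetical, the queue names boinc and
single_core are taken from the message above, the queue instance is assumed
to be the eighth whitespace-separated column in queue@host form (as qstat
prints it), and job names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch: read qstat output on stdin and print every host that has a job
# running in single_core while the boinc job on that host is not in
# state "S" (suspended through the queue).
# find_unsuspended is a hypothetical helper name.
find_unsuspended() {
    awk '
        # Column 8 is assumed to hold the queue instance (queue@host).
        $8 ~ /^single_core@/ { split($8, q, "@"); busy[q[2]] = 1 }
        # Column 5 is the job state; remember it per boinc host.
        $8 ~ /^boinc@/       { split($8, q, "@"); state[q[2]] = $5 }
        END {
            for (h in busy)
                if ((h in state) && state[h] !~ /S/)
                    print h
        }
    ' | sort
}

# Usage, on a host where qstat is available:
#   qstat | find_unsuspended
```

Header lines, separator lines and pending (qw) jobs carry no matching queue
instance in column 8, so the filter skips them.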
