[GE users] job suspension

Ravi Chandra Nallan Ravichandra.Nallan at Sun.COM
Thu May 22 07:40:38 BST 2008



Hi,
    When the problem occurs, had the job in the _lp queue only just been 
submitted?
I would like to know how much time passed between the job submission to 
the _lp queue and the job submission to wparallel3, because there was a 
race condition when the job in _lp is just starting up and the _lp queue 
is suspended because of a job submitted to wparallel3.
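
To put a number on that gap, here is a minimal sketch in Python. It assumes 
the two timestamps are copied by hand from the qstat listing quoted below, 
and that they use the MM/DD/YYYY HH:MM:SS layout shown there. The function 
name seconds_between is only illustrative. As far as I know, qstat shows 
the start time rather than the submission time for a running job, so treat 
the result as an approximation.

```python
from datetime import datetime

# Timestamp layout of the date/time columns in the quoted qstat listing.
QSTAT_TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def seconds_between(earlier: str, later: str) -> float:
    """Return the number of seconds from `earlier` to `later`."""
    start = datetime.strptime(earlier, QSTAT_TIME_FORMAT)
    end = datetime.strptime(later, QSTAT_TIME_FORMAT)
    return (end - start).total_seconds()


# Values copied from the quoted listing (jobs 185081 and 185157).
gap = seconds_between("05/21/2008 22:13:02", "05/22/2008 02:06:11")
print(f"gap between the two jobs: {gap:.0f} s")
```

A gap of only a few seconds would be consistent with the race described 
above; a much larger gap would suggest looking elsewhere.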

Check this issue:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2478

regards,
~Ravi

Viktor Oudovenko wrote:
> Hello to everybody,
>  
> any ideas why job suspension does not work?
>  
> I have SGE 6.0u4 running on a dual Athlon server.
> The job is parallel (tight integration).
> The queue status correctly changes to "S" but the job continues to run 
> (so both jobs continue to run).
> Please see below:
>  
>  185157 5.00400 mpi_p1       user1        r     05/22/2008 02:06:11 wparallel1@sub04n103        64
>  185081 2.01388 mpi_p2       user2        S     05/21/2008 22:13:02 wparallel1_lp@sub04n103     64
>  
> So the "wparallel3_lp" (low priority) queue is defined as a 
> subordinate queue of wparallel3.
>  
> As far as I remember, when I created the "_lp" queue and tested job 
> suspension under my account, it worked on the x86 architecture and did 
> not work on Opterons, but now it does not work even on x86 machines.
>  
> I found information on the net that 6.0u4 has a bug where jobs are not 
> suspended after an sgemaster restart, but I have not restarted the 
> master, only the compute nodes.
>  
> If you have any questions, please let me know.
> Any ideas are welcome.
>  
> best,
> vic
> p.s.
> CLUSTER QUEUE    CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> -------------------------------------------------------------------------------
> wparallel1         1.82     64      0     64      0      0
> wparallel1_lp      1.82     64      0     64     64      0
>  
>  

