[GE users] job suspension

Viktor Oudovenko udo at physics.rutgers.edu
Thu May 22 07:24:34 BST 2008


Hello everybody,
 
Any ideas why job suspension does not work?

I have SGE 6.0u4 running on a dual Athlon server.
The job is parallel (tight integration).
The queue status correctly changes to "S", but the job continues to run (so both
jobs keep running).
Please see below:
 
 185157 5.00400 mpi_p1       user1        r     05/22/2008 02:06:11 wparallel1@sub04n103              64
 185081 2.01388 mpi_p2       user2        S     05/21/2008 22:13:02 wparallel1_lp@sub04n103           64
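One way to check on the node whether the suspend signal actually reached the MPI processes is to look at their kernel state. With the queue's suspend_method left at its default, SGE suspends by sending SIGSTOP (per queue_conf(5)), and a process stopped that way shows state `T` in `/proc/<pid>/stat`. The following is a minimal, Linux-only Python sketch of that check; the PIDs are hypothetical placeholders that would have to be collected on sub04n103 (for example with ps), and the script name in the usage line is made up.

```python
import sys
from typing import Iterable, Optional


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter kernel state of `pid` (Linux /proc only).

    'T' means stopped (e.g. by SIGSTOP); 'R' and 'S' mean running or
    sleeping. Returns None if the process no longer exists.
    """
    try:
        with open(f"/proc/{pid}/stat") as fh:
            data = fh.read()
    except FileNotFoundError:
        return None
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split after the last ')' to reach the state field.
    return data.rsplit(")", 1)[1].split()[0]


def is_stopped(pid: int) -> bool:
    """True if the process exists and is in the stopped ('T') state."""
    return proc_state(pid) == "T"


def report(pids: Iterable[int]) -> None:
    # One line per PID; the PIDs are hypothetical and must be gathered
    # on the execution host beforehand.
    for pid in pids:
        state = proc_state(pid)
        if state is None:
            label = "gone"
        elif state == "T":
            label = "stopped"
        else:
            label = "NOT stopped"
        print(f"{pid}: {state} ({label})")


if __name__ == "__main__":
    report(int(arg) for arg in sys.argv[1:])
```

Usage: `python check_stopped.py 12345 12346`. If qstat shows the job as "S" but its processes report `R` or `S` here, the suspend signal did not reach them.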

 
So the queue "wparallel3_lp" (low priority) is defined as a subordinate queue
of wparallel3.
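For reference, the rule that subordination is meant to enforce, as I understand queue_conf(5), is roughly this: the superordinate queue lists the low-priority queue in its subordinate_list (something like `subordinate_list wparallel3_lp`, optionally `wparallel3_lp=N`), and the subordinate instance on a host is suspended once the superordinate instance on that same host has N slots filled, or all of its slots if no N is given. Below is a minimal Python model of that rule only, not SGE's actual code; the 2-slot node and the threshold value are hypothetical.

```python
from typing import Optional


def subordinate_suspended(used_slots: int, total_slots: int,
                          threshold: Optional[int] = None) -> bool:
    """Simplified model of SGE queue subordination on one host.

    used_slots / total_slots describe the superordinate queue instance
    (e.g. wparallel3 on a given node). `threshold` mirrors an optional
    "queue=N" entry in subordinate_list; without it, the subordinate
    instance is suspended only when every slot is filled.
    """
    limit = total_slots if threshold is None else threshold
    return used_slots >= limit


if __name__ == "__main__":
    # Hypothetical 2-slot node with no threshold configured.
    print(subordinate_suspended(used_slots=1, total_slots=2))  # False
    print(subordinate_suspended(used_slots=2, total_slots=2))  # True
    # Same node with a hypothetical threshold of 1 slot.
    print(subordinate_suspended(used_slots=1, total_slots=2, threshold=1))  # True
```

In the listing above, this decision step appears to work (the low-priority job is marked "S"); what fails is the processes actually stopping.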
 
It seems to me that when I created the "_lp" queue and tested job suspension
under my account, it worked on the x86 architecture but not on the Opterons; now
it does not work even on the x86 machines.
 
I found information on the net that 6.0u4 has a bug whereby jobs are not
suspended after an sgemaster restart, but I have not restarted the master, only
the compute nodes.
 
If you have any questions, please let me know.
Any ideas are welcome.
 
best,
vic
P.S.
CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
wparallel1                        1.82     64      0     64      0      0
wparallel1_lp                     1.82     64      0     64     64      0
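To read this summary programmatically, here is a minimal Python sketch that parses `qstat -g c` output, assuming the six-column layout shown above (later SGE releases add columns). Per qstat(1), aoACDS counts slots in queue instances in states a, o, A, C, D or S, where S means suspended by subordination; on the summary above the sketch reports 0 of 64 such slots for wparallel1 and 64 of 64 for wparallel1_lp, which agrees with the scheduler considering the low-priority queue suspended.

```python
from typing import Dict

# Columns as printed by `qstat -g c` in the summary above (SGE 6.0u4);
# other releases may print more columns, so this layout is an assumption.
COLUMNS = ["CQLOAD", "USED", "AVAIL", "TOTAL", "aoACDS", "cdsuE"]

SAMPLE = """\
CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
wparallel1                        1.82     64      0     64      0      0
wparallel1_lp                     1.82     64      0     64     64      0
"""


def parse_cluster_summary(text: str) -> Dict[str, Dict[str, float]]:
    """Map each cluster queue name to its numeric columns."""
    rows: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows are a name plus one value per column; the header
        # ("CLUSTER QUEUE ..." splits into 8 tokens) and the dashed line are not.
        if len(fields) != len(COLUMNS) + 1:
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # skip rows with non-numeric values such as "-NA-"
        rows[fields[0]] = dict(zip(COLUMNS, values))
    return rows


if __name__ == "__main__":
    for name, cols in parse_cluster_summary(SAMPLE).items():
        print(f"{name}: {int(cols['aoACDS'])} of {int(cols['TOTAL'])} "
              f"slots in a/o/A/C/D/S states")
```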

 
 


