[GE users] job suspension

Viktor Oudovenko udo at physics.rutgers.edu
Thu May 22 16:04:11 BST 2008


Hi, Reuti,
 
It seems SGE does send something, since all processes of the suspended job
on the head machine now have status "T" instead of the original "S". Please
see the attached file. I simply used the "ps  -axuf" command to get it last night.
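
To check a listing like this without reading it by eye, here is a minimal sketch (not anything SGE ships). It assumes output from `ps -e -o pid,ppid,pgrp,stat,command`, i.e. the command suggested in the quoted reply below with a `stat` column added. The function names and the sample rows are hypothetical and only illustrate the format. It lists descendants of a given process that have left its process group, since a signal sent to that group would not reach them.

```python
from collections import defaultdict

# Hypothetical sample in the format of `ps -e -o pid,ppid,pgrp,stat,command`.
# The PIDs and commands are made up for illustration, not taken from the cluster.
SAMPLE = """\
  PID  PPID  PGRP STAT COMMAND
 4100     1  4100 Sl   sge_execd
 4211  4100  4211 T    sge_shepherd-185081 -bg
 4212  4211  4211 T    mpirun -np 64 ./a.out
 4300  4212  4300 R    ./a.out
"""


def parse_ps(text):
    """Turn the ps listing into a list of dicts, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # ignore malformed or truncated lines
        pid, ppid, pgrp, stat, command = parts
        rows.append({"pid": int(pid), "ppid": int(ppid), "pgrp": int(pgrp),
                     "stat": stat, "command": command})
    return rows


def descendants(rows, root_pid):
    """Return all processes below root_pid in the parent/child tree."""
    children = defaultdict(list)
    for row in rows:
        children[row["ppid"]].append(row)
    found, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, []):
            found.append(child)
            stack.append(child["pid"])
    return found


def outside_group(rows, root_pid):
    """Descendants of root_pid that are NOT in root_pid's process group."""
    by_pid = {row["pid"]: row for row in rows}
    root_pgrp = by_pid[root_pid]["pgrp"]
    return [r for r in descendants(rows, root_pid) if r["pgrp"] != root_pgrp]


if __name__ == "__main__":
    for row in outside_group(parse_ps(SAMPLE), 4211):
        print(row["pid"], row["pgrp"], row["stat"], row["command"])
```

In the made-up sample, the worker with PID 4300 sits in its own process group and is still in state "R" while its parents are in state "T", which is the pattern to look for in the real listing.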

Was it helpful?

Regards,
v

> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Thursday, May 22, 2008 4:15
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] job suspension
> 
> Hi,
> 
> On 22.05.2008 at 08:24, Viktor Oudovenko wrote:
> 
> > Hello to everybody,
> >
> > Any ideas why job suspension does not work?
> >
> > I have SGE 6.0u4 running on a dual Athlon server.
> > The job is parallel (tight integration).
> > The queue status correctly changes to "S", but the job continues to
> > run (so both jobs continue to run).
> > Please see below:
> >
> >  185157 5.00400 mpi_p1       user1        r     05/22/2008 02:06:11  wparallel1 at sub04n103              64
> >  185081 2.01388 mpi_p2       user2        S     05/21/2008 22:13:02  wparallel1_lp at sub04n103           64
> >
> > So the queue "wparallel3_lp" (low priority) is defined as a
> > subordinate queue of wparallel3.
> >
> > It seems to me that when I created the "_lp" queue and tested job
> > suspension under my account, it worked on the x86 architecture and did
> > not work on Opterons, but now it does not work even on x86 machines.
> >
> > I found information on the net that 6.0u4 has a bug where jobs are not
> > suspended after an sgemaster restart, but I have not restarted the
> > master, only the compute nodes.
> 
> SGE will send a SIGSTOP to the complete process group of the
> job. So please check whether it's in the correct group.
> 
> ps -e f -o pid,ppid,pgrp,command
> 
> (f without the -). - Reuti
> 
> > If you have any questions, please let me know.
> > Any ideas are welcome.
> >
> > best,
> > vic
> > p.s.
> > CLUSTER QUEUE    CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> > -------------------------------------------------------------------------------
> > wparallel1         1.82     64      0     64      0      0
> > wparallel1_lp      1.82     64      0     64     64      0
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 


    [ Part 2, Text/PLAIN (Name: "PS.txt") ~1,193 lines. ]
    [ Unable to print this part. ]


