[GE users] actively draining a queue to allow big parallel job ....

SLIM H.A. h.a.slim at durham.ac.uk
Thu Jul 12 17:35:43 BST 2007


Dear Reuti 

I read in the reply to a previous question that checkpointing with the
migr_command will not suspend a job:

> As you would like the normal jobs to be checkpointed instead 
> of suspended, you could set up a checkpointing environment in 
> SGE with the "application-level" interface. The "migr_command" 
> script you define in this setup (as it is aware which job it 
> belongs to) can easily write the necessary stop file. All 
> small jobs then have to request this checkpointing environment.
> 
> http://gridengine.sunsource.net/howto/checkpointing.html 
> section "The application-level interface".
> 
> Be aware that in this case SGE will neither kill the normal 
> job nor suspend it; that is now up to your script.

The last paragraph seems to contradict what the HowTo page says about
the application-level interface:

"... the "migr_command" procedure will be executed if you suspend the job
or the queue which the job is running in"

This suggests that the migr_command is executed _after_ the job is
suspended (by the user or by Grid Engine).
Is suspending always done by sending the STOP signal (SIGSTOP)?

Thanks

Henk
 

> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: 24 June 2007 16:20
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] actively draining a queue to allow 
> big parallel job ....
> 
> Lydia,
> 
> checkpointing a parallel application is always tricky, but if 
> your application support this, it's great.
> 
> But instead of writing a cron job, I would suggest of having 
> two parallel queues: one for the normal jobs normal.q, one 
> for the big ones big.q (which should start immediately). The 
> normal.q is subordinated to big.q, hence will be suspended if 
> the big job starts to run.
> 
> As you would like the normal jobs to be checkpointed instead 
> of suspended, you could set up a checkpointing environment in 
> SGE with the "application-level" interface. The "migr_command" 
> script you define in this setup (as it is aware which job it 
> belongs to) can easily write the necessary stop file. All 
> small jobs then have to request this checkpointing environment.
> 
> http://gridengine.sunsource.net/howto/checkpointing.html 
> section "The application-level interface".
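For illustration only, a minimal sketch of what such an
application-level migr_command script could look like in Python. It
assumes (these details are not from the HOWTO) that the checkpointing
environment is configured to pass the job id, e.g.
"migr_command /path/to/migr.py $job_id", and that the jobs watch a
shared directory for a stop file named after their job id:

    #!/usr/bin/env python
    # migr.py - illustrative sketch only, not the HOWTO's script.
    # Assumed to be called by SGE as "migr.py <job_id>" via the
    # checkpointing environment's migr_command.
    import os
    import sys

    STOP_DIR = "/cluster/shared/stopfiles"    # assumed shared directory

    def main():
        if len(sys.argv) < 2:
            sys.exit("usage: migr.py <job_id>")
        job_id = sys.argv[1]
        # The application is assumed to poll for a file named after its
        # SGE job id and to write its restart files and exit cleanly
        # when the file appears (application-level checkpointing).
        stop_file = os.path.join(STOP_DIR, "stop.%s" % job_id)
        open(stop_file, "w").close()

    if __name__ == "__main__":
        main()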
> 
> Be aware that in this case SGE will neither kill the normal 
> job nor suspend it; that is now up to your script. The 
> HowTo assumes the jobs run locally on a node, so to migrate, 
> all interim files must be copied to a common checkpoint 
> directory, and copied back from this shared location to a 
> different node's $TMPDIR when the job starts again. If your 
> application is using a shared CWD anyway, this might not be 
> necessary.
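As an illustration of that copy step, a small sketch assuming a shared
checkpoint area under /cluster/shared/checkpoints and a flat set of
restart files in $TMPDIR (both paths and the save/restore convention
are assumptions, not part of the HOWTO):

    #!/usr/bin/env python
    # Illustrative only: move interim files between the node-local
    # $TMPDIR and a shared checkpoint directory, keyed by the SGE JOB_ID.
    import os
    import shutil
    import sys

    SHARED = "/cluster/shared/checkpoints"    # assumed shared filesystem

    def save(job_id, tmpdir):
        dest = os.path.join(SHARED, job_id)
        if os.path.isdir(dest):
            shutil.rmtree(dest)               # drop the previous snapshot
        shutil.copytree(tmpdir, dest)

    def restore(job_id, tmpdir):
        src = os.path.join(SHARED, job_id)
        if not os.path.isdir(src):
            return                            # nothing checkpointed yet
        for name in os.listdir(src):          # assumes flat restart files
            shutil.copy2(os.path.join(src, name), tmpdir)

    if __name__ == "__main__":
        action = sys.argv[1]                  # "save" or "restore"
        job_id = os.environ["JOB_ID"]         # set by SGE
        tmpdir = os.environ["TMPDIR"]         # node-local scratch
        if action == "save":
            save(job_id, tmpdir)
        else:
            restore(job_id, tmpdir)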
> 
> If you have more than one core per node, this might lead to 
> the situation that too many jobs are stopped at first. After 
> the big job has started to run, some of these stopped smaller 
> jobs might start again in the cluster with a different 
> distribution scheme. This depends on the allocation rule of 
> the PE for the normal and big jobs. It might be good to always 
> have a fixed allocation rule like 2 or 4 (or at least $fill_up).
> 
> -- Reuti
> 
> 
> On 24.06.2007, at 16:27, Lydia Heck wrote:
> 
> > I am investigating a way to allow big parallel jobs (say 128 cpu
> > jobs) a sensible chance to get time on a cluster which only runs
> > parallel jobs.
> >
> > I would like to avoid having half the cluster empty while queues are
> > drained of existing smaller runs, and I would like to avoid killing
> > jobs in mid-flow, before they can sensibly stop, and losing hundreds
> > of hours of cpu time.
> >
> > What I would like to do is the following - and maybe somebody out
> > there has already done something similar. In order to understand the
> > story better, here is a short introduction to the type of codes
> > which are run:
> >
> > The codes "checkpoint" frequently by writing restart files, where
> > the frequency is determined by the user in an input file.
> >
> > The codes can be stopped by creating an empty file in a specific
> > directory, whose name is predefined either in the code or in an
> > input file.
> >
> > What I would like to do is to check periodically, using a cron job,
> > whether a "big" job has been submitted. If a big job has been
> > submitted, I assume that the parameter -pe big 128 has been set. I
> > then prepare the parallel environment to have 128 cpus, subtracting
> > that number of cpus from the "normal" parallel environment. So far
> > so good.
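For illustration, a rough sketch of such a cron check in Python,
assuming the big jobs request a PE called "big"; the qstat output
parsing is deliberately crude and the exact "Requested PE" line format
is an assumption:

    #!/usr/bin/env python
    # Illustrative cron-job sketch: detect a pending job that requested
    # the "big" parallel environment.  What to do once one is found
    # (resizing the PEs, writing stop files) is left as a placeholder.
    import subprocess

    def big_job_pending():
        # "qstat -s p -r" lists pending jobs with their requested
        # resources; the "Requested PE: big ..." line is an assumption
        # about the output format.
        out = subprocess.check_output(["qstat", "-s", "p", "-r"])
        return "Requested PE: big" in out.decode()

    if __name__ == "__main__":
        if big_job_pending():
            # placeholder: shrink the normal PE and/or write stop files
            # for enough running jobs to free at least 128 slots
            print("big job waiting - start draining")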
> >
> > Then I would like to "send" a signal to a set of running jobs, with
> > a total of >= 128 cpus, to write their restart files and terminate
> > sensibly.
> >
> > There is of course the option of finding out where the job was
> > started from, and telling the users to make sure that their program
> > tests for the "stop" file in the CWD. But that is somewhat fraught
> > with problems. One can easily envisage two jobs being started from
> > the same directory, and both jobs would see the stop file. So the
> > user would have to make sure that a unique stop file is looked for,
> > which could of course depend on the PID of the master process in an
> > MPI job. Again there is a problem: the PID is unique on one system,
> > but with hundreds of systems the same PID could occur for two
> > different jobs. So the user would have to test against the JOB_ID
> > from grid engine instead, if that is possible.
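A minimal sketch of that idea on the application side, assuming the
stop file is named after the SGE JOB_ID (the naming convention and the
commented routine names are hypothetical):

    # Illustrative sketch: poll for a stop file whose name contains the
    # SGE JOB_ID rather than a PID, so that two jobs started from the
    # same directory cannot pick up each other's stop file.
    import os

    def stop_requested():
        job_id = os.environ.get("JOB_ID", "nojob")   # exported by SGE
        return os.path.exists("stop.%s" % job_id)    # e.g. stop.123456

    # inside the code's main iteration loop:
    #   if stop_requested():
    #       write_restart_files()   # hypothetical restart routine
    #       sys.exit(0)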
> >
> > It would be neater if a signal-handling call could be introduced
> > into the codes as a matter of course. However, the signal would have
> > to be delivered somehow: a simple qdel would not do, as that kills
> > the job outright.
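For what it's worth, a sketch of the kind of handler that would have to
go into the codes. With SGE, jobs submitted with "qsub -notify" receive
SIGUSR1/SIGUSR2 shortly before being suspended or killed, which could
serve as the trigger; the signal choice and routine names below are
illustrative only:

    # Illustrative sketch: catch a "please stop soon" signal, write the
    # restart files, and exit cleanly.  write_restart() is a placeholder
    # for the code's own restart-file routine.
    import signal
    import sys
    import time

    shutdown_requested = False

    def request_shutdown(signum, frame):
        global shutdown_requested
        shutdown_requested = True

    signal.signal(signal.SIGUSR1, request_shutdown)  # before suspend with -notify
    signal.signal(signal.SIGUSR2, request_shutdown)  # before kill with -notify

    def write_restart():
        pass                         # placeholder

    while True:                      # stands in for the main work loop
        time.sleep(1)                # ... one unit of work ...
        if shutdown_requested:
            write_restart()
            sys.exit(0)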
> >
> > If anybody has thought about a scenario like this and is prepared
> > to share their solutions or attempts, I would be grateful to hear
> > from them.
> >
> >
> >
> > ------------------------------------------
> > Dr E L  Heck
> >
> > University of Durham
> > Institute for Computational Cosmology
> > Ogden Centre
> > Department of Physics
> > South Road
> >
> > DURHAM, DH1 3LE
> > United Kingdom
> >
> > e-mail: lydia.heck at durham.ac.uk
> >
> > Tel.: + 44 191 - 334 3628
> > Fax.: + 44 191 - 334 3645
> > ___________________________________________
> >
> > 

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



