[GE users] actively draining a queue to allow big parallel job ....

Lydia Heck lydia.heck at durham.ac.uk
Sun Jun 24 15:27:27 BST 2007

I am investigating a way to give big parallel jobs (say 128-cpu jobs)
a sensible chance to get time on a cluster which only runs parallel jobs.

I would like to avoid leaving half the cluster empty while queues are drained
of existing smaller runs, and I would like to avoid killing jobs mid-flow,
before they can stop sensibly, losing hundreds of hours of cpu time.

What I would like to do is the following - and maybe somebody out there has
already done something similar. To make the story easier to follow, here
is a short introduction to the type of codes that are run:

The codes "checkpoint" frequently by writing restart files, where
the frequency is determined by the user in an input file.

The codes can be stopped by creating an empty file in a specific directory,
whose name is predefined either in the code or in an input file.
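Inside the main loop of such a code, the stop-file mechanism might look roughly like this (a minimal sketch; the file name "STOP", the check interval and the `write_restart_files` routine are placeholders, not taken from any particular code):

```python
import os

def stop_requested(stopfile):
    """Return True once the (empty) stop file has appeared."""
    return os.path.exists(stopfile)

def run(steps, stopfile="STOP", check_every=100):
    """Toy main loop: checkpoint and stop as soon as the stop file is seen.

    Returns the step at which the run stopped.
    """
    for step in range(steps):
        # ... one iteration of the simulation would go here ...
        if step % check_every == 0 and stop_requested(stopfile):
            # write_restart_files()  # hypothetical checkpoint routine
            return step  # stopped early, after writing restart files
    return steps  # ran to completion
```

The point of the periodic check is that the code only ever stops at a step boundary, i.e. at a point where a consistent restart file can be written.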

What I would like to do is to check periodically, using a cron job, whether a
"big" job has been submitted. If a big job has been submitted, I assume that the
parameter -pe big 128 has been set. I then prepare the parallel environment
to have 128 cpus, subtracting that number of cpus from the "normal" parallel
environment. So far so good.
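On the detection side, the cron script could scan the pending-job list for requests on the "big" PE. The exact layout of `qstat -r` output differs between Grid Engine versions, so the parsing below is only a sketch against one assumed line format ("Requested PE: big 128"):

```python
def pending_big_slots(qstat_r_text, pe_name="big"):
    """Extract requested slot counts for a given PE from `qstat -r -s p`
    output.  Assumes request lines of the form 'Requested PE: big 128'."""
    slots = []
    for line in qstat_r_text.splitlines():
        parts = line.split()
        # expected token form: ['Requested', 'PE:', 'big', '128']
        if (len(parts) >= 4 and parts[0] == "Requested"
                and parts[1] == "PE:" and parts[2] == pe_name):
            slots.append(int(parts[3]))
    return slots
```

The cron job would feed it the captured output of `qstat -r -s p` and, on a non-empty result, start reshuffling slots between the PEs.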

Then I would like to "send" a signal to a set of running jobs, totalling
>= 128 cpus, telling them to write their restart files and terminate sensibly.

There is of course the option of finding out where a job was started from,
and telling the users to make sure that their program tests for the "stop" file
in that CWD. But that is somewhat fraught with problems. One can easily
envisage two jobs being started from the same directory, and both jobs
would see the stop file. So the user would have to make sure
that a unique stop file is looked for, which could of course depend on the PID
of the master process in an MPI job.
Again there is a problem: a PID is unique on one system, but with hundreds
of systems the same PID for two different jobs could happen. So the user
would have to test it against the JOB_ID from grid engine, if that is possible.
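Since Grid Engine exports the job number into the job's environment as JOB_ID, the unique stop-file name could indeed be derived from that rather than from the PID. A sketch (the "STOP." prefix is just an illustrative convention, and the PID fallback covers runs outside the batch system):

```python
import os

def stop_file_name(directory="."):
    """Per-job stop-file name: use Grid Engine's JOB_ID when available,
    fall back to the PID for runs outside the batch system."""
    tag = os.environ.get("JOB_ID") or str(os.getpid())
    return os.path.join(directory, "STOP." + tag)
```

Two jobs started from the same directory would then look for STOP.4711 and STOP.4712 respectively, and only the intended one stops.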

It would be neater if a signal-handling call could be introduced to the codes as
a matter of course. However, the signal would have to be transported: a simple
qdel would not be possible, as that kills the job outright.

If anybody has thought about a scenario like this and is prepared to share
their solutions or attempts, I would be grateful to hear from them.

Dr E L  Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

United Kingdom

e-mail: lydia.heck at durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645


More information about the gridengine-users mailing list