[GE users] How to configure SGE
reuti at staff.uni-marburg.de
Thu Aug 3 20:06:33 BST 2006
On 03.08.2006 at 17:18, Yoshio Tanaka wrote:
> Hello Reuti,
> Thanks for your comments. Let me give answers to your questions and
> reuti> > Would someone give advices and/or recommendations on how
> to configure
> reuti> > SGE to satisfy the following requirements?
> reuti> >
> reuti> > - Each user is able to execute at least 24 simultaneous
> jobs if node
> reuti> > is available.
> reuti> >
> reuti> > - Even if a user is executing 24 jobs, he/she is allowed
> to execute
> reuti> > more jobs if node is available.
> reuti> why just 24? Do you have enough nodes, that you could give
> each user
> reuti> 24 machines/slots as his primary machines/slots, and later
> on use a
> reuti> type of secondary queue for each one?
> Unfortunately, we only have 32 nodes shared by few users. Each user
> may submit several hundreds of short-term (about 10 minutes) jobs.
> Therefore, we cannot give each user 24 machines/slots as his primary
> reuti> > - If a user (user A) is executing more than 24 jobs and
> the other user
> reuti> > (user B) submit a new job, user A's excessive jobs will
> be killed
> reuti> > and user B's jobs will be activated. User A's killed
> jobs will be
> reuti> > re-submitted to the queue.
> reuti> To achieve this, you could combine a subordinate queue (to
> reuti> the jobs) with the checkpointing feature, where a suspend will
> reuti> reschedule a job. But it's not a good setup, if nodes are
> reuti> to users already as mentioned before.
> We actually considered using a subordinate queue, but we did not
> choose this option since
> - we would not like to provide two queues for users,
> - in our understanding, all jobs in a subordinate queue will be
> suspended if the number of jobs submitted to the primary queue
> exceeds the limit
> reuti> A simple fair-share setup isn't working for you - are the jobs
> reuti> running a long time?
> Since each job is short term, fair share may be a good choice.
> However, does fair-share support suspension and requeue?
Please have a look at the admin manual, page 133, for the setup. There
will be no suspension or rescheduling, but as the jobs are short, as
you said, user B's newly submitted jobs will start as soon as any of
user A's jobs end (even though user A submitted hundreds of jobs
before user B's). The idea is that all users end up with the same
number of jobs running in the cluster. You could force this to happen
only by rescheduling a job by hand with qmod -rj.
One additional hint: you could also limit the number of running jobs
per user in the cluster by setting maxujobs to 24 in "qconf -msconf"
as an additional limit, but then never more than 24 jobs will run at
the same time for each user (and nodes may sit idle).
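
To make the two points above concrete, here is a rough toy model in
Python. It is only a sketch and not how the SGE scheduler actually
computes its share policy: it simply hands each freed slot to the user
with the fewest running jobs. The 32 slots and 10-minute jobs come from
this thread; the job counts (300 and 100), the staggered end times, and
the names simulate and maxujobs (the latter merely mirroring the
scheduler setting) are assumptions made for illustration.

```python
from collections import Counter


def simulate(slots=32, job_minutes=10, a_jobs=300, b_jobs=100,
             horizon=30, maxujobs=None):
    """Toy model: return a list of (running_A, running_B), one per minute.

    User A already has jobs running when user B submits at minute 0.
    Every freed slot goes to the waiting user with the fewest running
    jobs. An optional per-user cap (maxujobs) limits running jobs.
    """
    # Assumption: A's running jobs end at staggered times 1..job_minutes.
    started = slots if maxujobs is None else min(slots, maxujobs)
    running = [(1 + i % job_minutes, "A") for i in range(started)]
    pending = {"A": a_jobs - started, "B": b_jobs}
    history = []
    for t in range(horizon + 1):
        # Jobs whose end time has been reached free their slots.
        running = [(end, user) for end, user in running if end > t]
        while len(running) < slots:
            counts = Counter(user for _, user in running)
            waiting = [u for u in pending
                       if pending[u] > 0
                       and (maxujobs is None or counts[u] < maxujobs)]
            if not waiting:
                break  # nobody is eligible, the slot stays idle
            # Pick the user with the fewest running jobs (ties: by name).
            user = min(waiting, key=lambda u: (counts[u], u))
            pending[user] -= 1
            running.append((t + job_minutes, user))
        counts = Counter(user for _, user in running)
        history.append((counts["A"], counts["B"]))
    return history


if __name__ == "__main__":
    shared = simulate()
    print("minute 0:", shared[0], "minute 5:", shared[5])
    capped = simulate(b_jobs=0, maxujobs=24)
    print("A alone with a cap of 24:", capped[0])
```

In this model nothing is suspended or rescheduled, yet user B reaches
an equal share within one job length, because user A's short jobs keep
finishing. With the cap set and only user A submitting, the remaining
slots stay unused, which is the idle-node drawback mentioned above.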
Cheers - Reuti