[GE users] How to configure SGE
yoshio.tanaka at aist.go.jp
Fri Aug 4 01:03:19 BST 2006
Thanks for the valuable comments. We will continue to investigate how
to configure SGE to satisfy our requirements.
Yoshio Tanaka (yoshio.tanaka at aist.go.jp)
From: Reuti <reuti at staff.uni-marburg.de>
Subject: Re: [GE users] How to configure SGE
Date: Thu, 3 Aug 2006 21:06:33 +0200
Message-ID: <56AFB051-C2F9-4C30-8E6C-CBB9EE959D03 at staff.uni-marburg.de>
reuti> On 03.08.2006 at 17:18, Yoshio Tanaka wrote:
reuti> > Hello Reuti,
reuti> > Thanks for your comments. Here are my answers to your questions and
reuti> > comments.
reuti> > reuti> > Could someone give advice and/or recommendations on how to
reuti> > reuti> > configure SGE to satisfy the following requirements?
reuti> > reuti> >
reuti> > reuti> > - Each user is able to execute at least 24 simultaneous jobs
reuti> > reuti> >   if nodes are available.
reuti> > reuti> >
reuti> > reuti> > - Even if a user is executing 24 jobs, he/she is allowed to
reuti> > reuti> >   execute more jobs if nodes are available.
reuti> > reuti>
reuti> > reuti> Why just 24? Do you have enough nodes that you could give each
reuti> > reuti> user 24 machines/slots as his primary ones, and later on use a
reuti> > reuti> kind of secondary queue for each user?
reuti> > Unfortunately, we only have 32 nodes shared by a few users. Each user
reuti> > may submit several hundred short (about 10-minute) jobs. Therefore,
reuti> > we cannot give each user 24 machines/slots as his primary
reuti> > machines/slots.
reuti> > reuti> > - If a user (user A) is executing more than 24 jobs and
reuti> > reuti> >   another user (user B) submits a new job, user A's excess
reuti> > reuti> >   jobs will be killed and user B's jobs will be started. User
reuti> > reuti> >   A's killed jobs will be resubmitted to the queue.
reuti> > reuti>
reuti> > reuti> To achieve this, you could combine a subordinate queue (to
reuti> > reuti> suspend the jobs) with the checkpointing feature, where a
reuti> > reuti> suspend will reschedule the job. But it's not a good setup if
reuti> > reuti> nodes are already hard-wired to users, as mentioned before.
reuti> > We actually considered using a subordinate queue, but did not choose
reuti> > this option because:
reuti> > - we would prefer not to provide two queues to users, and
reuti> > - as we understand it, all jobs in a subordinate queue are suspended
reuti> >   once the number of jobs in the primary queue exceeds the limit.
reuti> > reuti> Wouldn't a simple fair-share setup work for you? Are the jobs
reuti> > reuti> running for a long time?
reuti> > Since each job is short, fair share may be a good choice. However,
reuti> > does fair share support suspension and requeuing?
reuti> Please have a look at page 133 of the admin manual for the setup.
reuti> There will be no suspension or rescheduling, but since, as you said,
reuti> the jobs are short, user B's newly submitted jobs will start as soon
reuti> as any of user A's jobs end (even though user A submitted hundreds of
reuti> jobs before user B did). The idea is that all users have the same
reuti> number of jobs running in the cluster. The only way to force this
reuti> immediately is to reschedule a job by hand with qmod -rj.
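A toy Python model can illustrate Reuti's point: user B's jobs start as user A's jobs end, and the running counts level out without any suspension. This is not SGE's scheduler. The real fair-share policy is configured as described in the admin manual. The model assumes, as a simplification, that each slot freed by a finishing job goes to the waiting user with the fewest running jobs. The functions `dispatch_freed_slot` and `replay` are hypothetical names for this sketch. The 32-slot and 300/50-job numbers are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional


def dispatch_freed_slot(running: Counter, pending: Counter) -> Optional[str]:
    """Start one job for the waiting user with the fewest running jobs.

    Toy rule for illustration only (hypothetical); it is not SGE's
    actual fair-share algorithm.
    """
    waiting = [user for user, n in pending.items() if n > 0]
    if not waiting:
        return None  # nothing queued, the slot stays free
    # Fewest running jobs wins; ties are broken by name so runs are repeatable.
    user = min(waiting, key=lambda u: (running[u], u))
    pending[user] -= 1
    running[user] += 1
    return user


def replay(finished: Iterable[str], running: Counter, pending: Counter) -> Counter:
    """Each name in `finished` is the owner of a job that just ended."""
    for owner in finished:
        running[owner] -= 1
        dispatch_freed_slot(running, pending)
    return running


# User A filled all 32 slots and still has 300 jobs queued; user B then submits 50.
running = Counter({"A": 32})
pending = Counter({"A": 300, "B": 50})

# No job is suspended: B only gets slots as A's short jobs finish.
replay(["A"] * 20, running, pending)
print(running["A"], running["B"])  # 16 16
```

In this model the balancing happens only at job boundaries. That is why it works well for jobs of about 10 minutes and would react slowly if the jobs ran for a long time.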
reuti> One additional hint: you could also limit the number of running jobs
reuti> per user in the cluster by setting maxujobs to 24 in the scheduler
reuti> configuration ("qconf -msconf"), but then no user will ever have more
reuti> than 24 jobs running at the same time (and nodes may sit idle).
reuti> Cheers - Reuti
reuti> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
reuti> For additional commands, e-mail: users-help at gridengine.sunsource.net