[GE users] How to configure SGE

Yoshio Tanaka yoshio.tanaka at aist.go.jp
Fri Aug 4 01:03:19 BST 2006


Hi Reuti,

Thanks for the valuable comments.  We will continue to investigate how
to configure SGE to satisfy our requirements.

Best Regards,

--
Yoshio Tanaka (yoshio.tanaka at aist.go.jp)
http://ninf.apgrid.org/
http://www.apgridpma.org/


From: Reuti <reuti at staff.uni-marburg.de>
Subject: Re: [GE users] How to configure SGE
Date: Thu, 3 Aug 2006 21:06:33 +0200
Message-ID: <56AFB051-C2F9-4C30-8E6C-CBB9EE959D03 at staff.uni-marburg.de>

reuti> Hi,
reuti> 
reuti> On 03.08.2006 at 17:18, Yoshio Tanaka wrote:
reuti> 
reuti> >
reuti> > Hello Reuti,
reuti> >
reuti> > Thanks for your comments.  Let me give answers to your questions and
reuti> > comments.
reuti> >
reuti> > reuti> > Would someone give advice and/or recommendations on how to
reuti> > reuti> > configure SGE to satisfy the following requirements?
reuti> > reuti> >
reuti> > reuti> > - Each user is able to execute at least 24 simultaneous jobs
reuti> > reuti> >   if nodes are available.
reuti> > reuti> >
reuti> > reuti> > - Even if a user is executing 24 jobs, he/she is allowed to
reuti> > reuti> >   execute more jobs if nodes are available.
reuti> > reuti>
reuti> > reuti> Why just 24? Do you have enough nodes that you could give each
reuti> > reuti> user 24 machines/slots as his primary machines/slots, and later
reuti> > reuti> on use a type of secondary queue for each one?
reuti> >
reuti> > Unfortunately, we only have 32 nodes shared by a few users.  Each user
reuti> > may submit several hundreds of short-term (about 10 minutes) jobs.
reuti> > Therefore, we cannot give each user 24 machines/slots as his primary
reuti> > machines/slots.
reuti> 
reuti> okay.
reuti> 
reuti> > reuti> > - If a user (user A) is executing more than 24 jobs and
reuti> > reuti> >   another user (user B) submits a new job, user A's excess
reuti> > reuti> >   jobs will be killed and user B's jobs will be started.
reuti> > reuti> >   User A's killed jobs will be resubmitted to the queue.
reuti> > reuti>
reuti> > reuti> To achieve this, you could combine a subordinate queue (to
reuti> > reuti> suspend the jobs) with the checkpointing feature, where a
reuti> > reuti> suspend will reschedule a job. But it's not a good setup if
reuti> > reuti> nodes are hard-wired to users already, as mentioned before.
reuti> >
reuti> > We actually considered using a subordinate queue, but we did not
reuti> > choose this option since
reuti> > - we would not like to provide two queues for users,
reuti> > - in our understanding, all jobs in a subordinate queue will be
reuti> >   suspended if the number of jobs submitted to the primary queue
reuti> >   exceeds the limit
reuti> 
reuti> Correct.
reuti> 
reuti> > reuti> A simple fair-share setup isn't working for you - are the jobs
reuti> > reuti> running a long time?
reuti> >
reuti> > Since each job is short term, fair share may be a good choice.
reuti> > However, does fair-share support suspension and requeue?
reuti> 
reuti> Please have a look in the admin manual:
reuti> 
reuti> http://docs.sun.com/app/docs/doc/817-5677?a=load
reuti> 
reuti> page 133 for the setup. There will be no suspension or rescheduling,
reuti> but as the jobs are short, as you said, user B's newly submitted jobs
reuti> will run as soon as any of user A's jobs end (even though user A
reuti> submitted hundreds of jobs before user B's). The idea is that all users
reuti> have the same number of jobs running in the cluster. You could only
reuti> force this to happen by rescheduling a job by hand with qmod -rj.
reuti> 
reuti> One additional hint: you could also limit the number of running jobs
reuti> per user in the cluster by setting maxujobs in "qconf -msconf" to 24
reuti> as an additional limit, but then never more than 24 jobs will run at
reuti> the same time for each user (and nodes may be idle).
reuti> 
reuti> Cheers - Reuti
reuti> 
reuti> ---------------------------------------------------------------------
reuti> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
reuti> For additional commands, e-mail: users-help at gridengine.sunsource.net
reuti> 
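
For reference, the settings Reuti mentions could look roughly like the
excerpt below. This is a sketch, not a tested configuration. It assumes
that the "simple fair-share setup" means SGE's functional policy with
automatically created user objects; the values 10000 and 100 are
illustrative assumptions and do not come from this thread. Only maxujobs
and the value 24 are taken from the message above. Lines starting with
"#" are annotations and are not meant to be pasted into the editor.

```
# Scheduler configuration (edit with "qconf -msconf")
# Assumed value: gives the functional (fair-share) policy some weight.
weight_tickets_functional    10000
# Optional hard cap on running jobs per user; 0 means no limit.
maxujobs                     24

# Global cluster configuration (edit with "qconf -mconf")
# Assumption: let SGE create user objects automatically.
enforce_user                 auto
# Assumed value: the same functional share for every user.
auto_user_fshare             100
```

Note that with maxujobs set to 24, no user will run more than 24 jobs
even when nodes are idle, which conflicts with the second requirement.
Leaving maxujobs at 0 and relying on the share settings alone avoids
that, at the cost of waiting for running jobs to finish.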




