[GE users] How to configure SGE

Yoshio Tanaka yoshio.tanaka at aist.go.jp
Thu Aug 3 16:18:21 BST 2006


Hello Reuti,

Thanks for your comments.  Let me give answers to your questions and
comments.

reuti> > Would someone give advice and/or recommendations on how to configure
reuti> > SGE to meet the following requirements?
reuti> >
reuti> > - Each user is able to run at least 24 simultaneous jobs if a node
reuti> >   is available.
reuti> >
reuti> > - Even if a user is already running 24 jobs, he/she is allowed to run
reuti> >   more jobs if a node is available.
reuti> 
reuti> Why just 24? Do you have enough nodes that you could give each user
reuti> 24 machines/slots as his primary machines/slots, and then use a
reuti> kind of secondary queue for each one?

Unfortunately, we only have 32 nodes shared by a few users.  Each user
may submit several hundred short (about 10-minute) jobs.
Therefore, we cannot give each user 24 machines/slots as his primary
machines/slots.

reuti> > - If a user (user A) is running more than 24 jobs and another user
reuti> >   (user B) submits a new job, user A's excess jobs will be killed
reuti> >   and user B's jobs will be started.  User A's killed jobs will be
reuti> >   resubmitted to the queue.
reuti> 
reuti> To achieve this, you could combine a subordinate queue (to suspend
reuti> the jobs) with the checkpointing feature, where a suspend will
reuti> reschedule a job. But it's not a good setup if nodes are already
reuti> hard-wired to users, as mentioned before.

We did consider using a subordinate queue, but decided against it
because
- we would rather not provide two queues for users, and
- as we understand it, all jobs in a subordinate queue are
  suspended once the number of jobs submitted to the primary queue
  exceeds the limit.
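
The behaviour we are after (a soft per-user guarantee of 24 slots, idle
slots usable by anyone, and only the *excess* jobs requeued rather than
a whole queue suspended) can be modelled with a small simulation.  This
is a hypothetical Python sketch of the desired policy, not Grid Engine's
scheduler or configuration; the class and method names, the
one-slot-per-node assumption, and the choice to preempt the most
recently started job are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of the requested policy (hypothetical, not SGE itself)."""
    slots: int = 32        # assumption: 32 nodes, one job slot each
    guarantee: int = 24    # jobs per user that are never preempted
    running: dict = field(default_factory=lambda: defaultdict(list))
    pending: list = field(default_factory=list)  # (user, job_id), FIFO order

    def free_slots(self) -> int:
        return self.slots - sum(len(jobs) for jobs in self.running.values())

    def submit(self, user: str, job_id: str) -> None:
        self.pending.append((user, job_id))
        self.schedule()

    def finish(self, user: str, job_id: str) -> None:
        self.running[user].remove(job_id)
        self.schedule()

    def _preempt_one_excess_job(self):
        """Kill the newest job of the user furthest above the guarantee."""
        over = [(len(jobs) - self.guarantee, user)
                for user, jobs in self.running.items()
                if len(jobs) > self.guarantee]
        if not over:
            return None  # nobody is above the guarantee: nothing to reclaim
        _, victim = max(over)
        return (victim, self.running[victim].pop())

    def schedule(self) -> None:
        waiting, requeued = [], []
        for user, job_id in self.pending:
            # A user below the guarantee may reclaim a slot from excess jobs.
            if self.free_slots() == 0 and len(self.running[user]) < self.guarantee:
                victim = self._preempt_one_excess_job()
                if victim is not None:
                    requeued.append(victim)  # killed job goes back to the queue
            if self.free_slots() > 0:
                self.running[user].append(job_id)
            else:
                waiting.append((user, job_id))
        self.pending = waiting + requeued


if __name__ == "__main__":
    c = Cluster()
    for i in range(40):
        c.submit("A", f"a{i}")   # A fills all 32 nodes, 8 jobs wait
    c.submit("B", "b0")          # B is below 24, so one excess A job is requeued
    print(len(c.running["A"]), len(c.running["B"]), len(c.pending))  # 31 1 9
```

Preempting the newest excess job loses the least work, which suits
short (about 10-minute) jobs; note that with 32 nodes, user B can only
reclaim slots until user A is back down to 24.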

reuti> Wouldn't a simple fair-share setup work for you? Are the jobs
reuti> running for a long time?

Since each job is short (about 10 minutes), fair share may be a good
choice.  However, does fair share support suspension and requeuing?
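
To make the question concrete: in a simple share-based model, the
policy only decides which *pending* job starts next when a slot frees
up, by favouring the user with the least accumulated usage per unit of
share.  The hypothetical Python sketch below shows that ordering idea
only; it is a deliberately simplified model, not Grid Engine's
share-tree or functional ticket computation, and the function name and
the flat 10-minute charge per job are assumptions.  Whether Grid
Engine's own fair-share policies can also suspend or requeue running
jobs is exactly what is being asked here.

```python
from collections import defaultdict, deque


def fair_share_dispatch(pending, shares, job_minutes=10):
    """Return the order in which queued jobs would start, one slot at a time.

    pending: dict user -> list of job ids, in submission order.
    shares:  dict user -> relative share (e.g. equal for all users).
    Each started job charges job_minutes of usage to its owner
    (hypothetical flat charge; real usage would be measured).
    """
    queues = {user: deque(jobs) for user, jobs in pending.items()}
    usage = defaultdict(float)
    order = []
    while any(queues.values()):
        # Pick the user with the lowest usage per unit of share; ties by name.
        user = min((u for u, q in queues.items() if q),
                   key=lambda u: (usage[u] / shares[u], u))
        order.append(queues[user].popleft())
        usage[user] += job_minutes
    return order


if __name__ == "__main__":
    print(fair_share_dispatch({"A": ["a0", "a1", "a2", "a3"], "B": ["b0", "b1"]},
                              {"A": 1, "B": 1}))
    # ['a0', 'b0', 'a1', 'b1', 'a2', 'a3']
```

Nothing in this model touches jobs that are already running, so on its
own it interleaves users' queued jobs rather than killing excess ones.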

Thanks,

--
Yoshio Tanaka (yoshio.tanaka at aist.go.jp)
http://ninf.apgrid.org/
http://www.apgridpma.org/


reuti> 
reuti> -- Reuti
reuti> 
reuti> ---------------------------------------------------------------------
reuti> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
reuti> For additional commands, e-mail: users-help at gridengine.sunsource.net
reuti> 
