[GE users] contemporary of -l slots=#

Raphael Y. Rubin rafi at cs.caltech.edu
Wed Mar 30 22:47:14 BST 2005


On Wed, Mar 30, 2005 at 11:21:24PM +0200, Reuti wrote:
> Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
> 
> > Ignoring the argument of what's proper, is there an easy way to assign a
> > PE to every queue, or a list of queues?  And is there an easy way to
> > specify a complex value for each machine or queue?
> 
> qconf -aattr queue pe_list <myPE> <queue1> <queue2> ...

very nice
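
For the archive, with made-up names that would be something like:

  qconf -aattr queue pe_list mype all.q long.q
  qconf -aattr exechost complex_values virtual_free=4G node01

(the PE "mype", the queues, the host and the 4G are only placeholders; the
second line is my guess at the per-host complex value half of the question -
qconf -me node01 opens the same exec host entry in an editor instead).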

> > 
> > On Wed, Mar 30, 2005 at 10:30:13PM +0200, Reuti wrote:
> > > Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
> > > 
> > > > I guess I'm a little unclear as to what the "logic behind" is; I
> > > > wouldn't mind a short explanation.
> > > 
> > > To me it means "one slot = one job (task) to run".
> > > 
> > > > So are you saying the PE warning basically assumes the user intended to
> > > > use a PE and simply forgot?  And creating a PE to specify the number of
> > > 
> > > Yes, requesting more slots is for parallel jobs. If I got you correctly,
> > > you want to run a serial job, but exclusively on one machine, with no
> > > other job on it.
> > > 
> > > > slots to reserve is bad, at least if there's no other driving purpose?
> > > 
> > > I said: personal taste - it's not bad per se, but I wouldn't like it.
> > > You will request a PE for a serial job.
> > 
> > I don't like it either.  But it does seem the easiest way to give a
> > serial job a full machine of a given class.
> > 
> > As I said, sometimes our users just want to see what sort of performance
> > they can get.  This is not our typical usage, but it comes up often
> > enough.  And we don't want to have to pull a machine from the grid for
> > this sort of thing.
> > 
> > Using memory or other constraints for this also seems like a bit of a
> > lie.  It also has the downside of a lack of safety.  A user can submit
> > a job with -l vf=0, and it will freely run; moreover, it will even
> > target a machine running a job that requested vf=max, because the
> > scheduler still sees that host as having n-1 free slots.
> 
> Correct - there is no way to prevent this, but if I see such things in my 
> clusters, the users doing this will be disciplined.

A fair point.
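
To spell out the hole for anyone reading along (host and sizes invented,
assuming virtual_free was made consumable and set per exec host):

  qsub -l vf=8G exclusive_run.sh   # meant to claim all of node01's 8G
  qsub -l vf=0  sneaky_job.sh      # still free to land on node01

As far as I know, giving virtual_free a default value in the complex
configuration (qconf -mc) only charges jobs that omit the request entirely,
so an explicit vf=0 still slips through - which is exactly the problem.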

> > 
> > > > 
> > > > I'm not so sure "virtual_free" is a safe alternative.  It seems to vary
> > > > too wildly, even on unloaded systems.
> > > 
> > > When you set it for each exec host (qconf -me <host>), it will be
> > > exactly this value. And if you request exactly this value you get it.
> > 
> > I don't know for sure, but doesn't virtual_free include actual system
> > utilization and reserved resources?  It seems that overriding the
> > utilization reduces the utility of this variable in general.
> 
> Yes and no. According to:
> 
> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgId=24016
> 
> both are checked if you make virtual_free consumable. The calculated one is
> still virtual_total-virtual_used. In qconf -se <hostname> you will see two
> virtual_free values. And if you have a little bit of scratch space defined,
> virtual_total will be higher than the defined virtual_free. If
> virtual_total-virtual_used eats up too much without any running job,
> something is wrong anyway.
> 
> Another idea: limit access to the node for the duration of the testing by
> using an SGE user list on some nodes.

That's part of what we're trying to avoid.  Some of our jobs will run
for days or weeks (or worse).  While most run in minutes or hours, it's
just a pain to keep watching for when a node has finally cleared its
jobs before and after a special run.  And I don't want to have to be
involved each time a mortal user wants to do this sort of stuff.

Besides, that's part of why we are using a "scheduler".
> 
> CU - Reuti
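
P.S. For the record, the PE workaround discussed above would look roughly
like this (PE name, queue and slot count are made up):

  qconf -ap exclusive        # set slots, allocation_rule $pe_slots,
                             # control_slaves FALSE, job_is_first_task TRUE
  qconf -aattr queue pe_list exclusive all.q
  qsub -pe exclusive 2 serial_job.sh    # 2 = slots per host in all.q

allocation_rule $pe_slots keeps all requested slots on one host, so asking
for every slot of the queue instance effectively reserves the machine for a
single serial job.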

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



