[GE users] contemporary of -l slots=#

Reuti reuti at staff.uni-marburg.de
Wed Mar 30 22:21:24 BST 2005



Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:

> Ignoring the argument of what's proper, is there an easy way to assign a
> PE to every queue, or a list of queues?  And is there an easy way to
> specify a complex value to each machine or queue?

qconf -aattr queue pe_list <myPE> <queue1> <queue2> ...
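
For the second question, complex values can be attached per host or per queue
with a similar one-liner; a sketch (the host/queue names and the value are
only examples, untested):

qconf -mattr exechost complex_values virtual_free=3.5G node01
qconf -mattr queue complex_values virtual_free=3.5G big.q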

> 
> On Wed, Mar 30, 2005 at 10:30:13PM +0200, Reuti wrote:
> > Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
> > 
> > > I guess I'm a little unclear as to what the "logic behind" is; I
> > > wouldn't mind a short explanation.
> > 
> > To me it means "one slot = one job (task) to run".
> > 
> > > So are you saying the PE warning basically assumes the user intended to
> > > use a PE and simply forgot?  And creating a PE to specify the number of
> > 
> > Yes, requesting more slots is for parallel jobs. If I understood you
> > correctly, you want to run a serial job, but exclusively on one machine
> > with no other job on it.
> > 
> > > slots to reserve is bad, at least if there's no other driving purpose?
> > 
> > I said: personal taste - it's not bad per se, but I wouldn't like it. You
> > would be requesting a PE for a serial job.
> 
> I don't like it either.  But it does seem the easiest way to give a
> serial job a full machine of a given class.
> 
> As I said, sometimes our users just want to see what sort of performance
> they can get.  This is not our typical usage, but it comes up often
> enough.  And we don't want to have to pull a machine from the grid for
> this sort of thing.
> 
> Using memory or other constraints for this also seems like a bit of a
> lie.  And it also has the downside of a lack of safety.  A user can specify
> a job with -l vf=0, and it will freely run; moreover, it will even
> target a machine with a job using vf=max, because the scheduler will see
> it as having n-1 free slots.

Correct - there is no way to prevent this, but if I see such things in my 
clusters, the users doing this will be disciplined.

> 
> > > 
> > > I'm not so sure "virtual_free" is a safe alternative.  It seems to vary
> > > too wildly, even on unloaded systems.
> > 
> > When you set it for each exec host (qconf -me <host>), it will be exactly
> > this value. And if you request exactly this value, you get it.
> 
> I don't know for sure, but doesn't virtual_free include actual system
> utilization and reserved resources?  It seems that overriding the
> utilization reduces the utility of this variable in general.

Yes and no. According to:

http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgId=24016

both are checked if you make virtual_free consumable. The calculated one is 
still virtual_total - virtual_used. In qconf -se <hostname> you will see two 
virtual_free values. And if you have a little bit of scratch space defined, 
virtual_total will be higher than the defined virtual_free. If 
virtual_total - virtual_used eats up too much without any running job, 
something is wrong anyway.
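
To check both values on a node, and to get a node to yourself, something like 
this should work (host name, value, and script are only examples):

qconf -se node01 | grep virtual_free   # shows the consumable and the load value
qsub -l virtual_free=3.5G myjob.sh     # single job on the node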

Another idea: limit access to the node for the time of testing by using an 
SGE user list for some nodes.
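
For example (ACL and queue names are only placeholders):

qconf -au rafi benchmarkers                        # put the user into an ACL
qconf -mattr queue user_lists benchmarkers test.q  # only the ACL may use test.q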

CU - Reuti

> 
> >
> > > 
> > > Furthermore it doesn't seem to guarantee grid exclusivity (for the
> > > host), which is what our users occasionally need.
> > 
> > What do you mean by grid exclusivity? One job in the whole cluster?
> 
> One job gets a machine completely to itself (as far as the grid is
> concerned).
> > 
> > CU - Reuti
> > 
> > > 
> > > 
> > > I guess this gets back to the question of intent?  Is the slots thing
> > > specifically discouraged because there is an assumption that users
> > > shouldn't have the right to monopolize a machine?  Or is there
> > > something else going on?
> > > 
> > > Rafi
> > > 
> > > On Wed, Mar 30, 2005 at 09:53:37PM +0200, Reuti wrote:
> > > > Hi Raphael,
> > > > 
> > > > it's personal taste, but I wouldn't use either of the two options you
> > > > offered - both do not reflect the logic behind it. Although, what I
> > > > suggest is similar to the second:
> > > > 
> > > > - make the complex "virtual_free" consumable and requestable -
> > > >   default 1GB (or whatever you like; the line is sketched below)
> > > > 
> > > > - attach this to each node with a value of the built-in memory
> > > >   (complex_values virtual_free=3.5G)
> > > > 
> > > > - request 3.5G in your qsub command -> single job on the node
> > > > 
> > > > With 4GB built in, only 3.5GB are usable I think. Yes, it's nearly the
> > > > same as your vslots, but this can also be used for a real request of
> > > > just 2GB for larger jobs.
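> > > > 
> > > > The line in the complex configuration (qconf -mc in SGE 6) would then
> > > > look something like this (the 1G default is only an example):
> > > > 
> > > > virtual_free    vf    MEMORY    <=    YES    YES    1G    0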
> > > > 
> > > > Cheers - Reuti
> > > > 
> > > > PS: IMO it's good to disallow the request for slots, to remind users
> > > > to request a PE - maybe they forgot it by accident.
> > > > 
> > > > 
> > > > Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
> > > > 
> > > > > I would like to configure my grid so that mortal users can grab
> > > > > exclusive access to a machine, using the normal submission commands
> > > > > with little extra work.
> > > > > 
> > > > > Occasionally, a user wants to run a job exclusively for
> > > > > benchmarking, whether that's to test a class of machine or a
> > > > > specific job.
> > > > > 
> > > > > Also, some jobs we know will be resource hogs, and we'd like to
> > > > > annotate them to indicate they are the equivalent of two or more
> > > > > normal jobs.
> > > > > 
> > > > > And of course there are various other needs that arise, but the
> > > > > above two are the most common and important.  In the past we just
> > > > > used to specify "-l slots=n".  But as of sge 5.3 that was
> > > > > discouraged:
> > > > > 
> > > > > 	error: denied: use parallel environments instead of requesting slots explicitly
> > > > > 
> > > > > 		- from sge6
> > > > > 
> > > > > In sge 5.3, I had created a slots pe, after we first noticed the
> > > > > messages about -l slots.  Here is an updated version of that pe (in
> > > > > a form for sge 6).
> > > > > 
> > > > > pe_name           slots
> > > > > slots             999
> > > > > user_lists        NONE
> > > > > xuser_lists       NONE
> > > > > start_proc_args   /bin/true
> > > > > stop_proc_args    /bin/true
> > > > > allocation_rule   $pe_slots
> > > > > control_slaves    FALSE
> > > > > job_is_first_task TRUE
> > > > > urgency_slots     min
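> > > > > 
> > > > > To register it and then claim a whole node, something like this
> > > > > should work (file and script names are examples; the PE must also
> > > > > be in the queues' pe_list):
> > > > > 
> > > > > qconf -Ap slots.pe        # add the PE from a file with the above
> > > > > qsub -pe slots 2 job.sh   # $pe_slots puts both slots on one host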
> > > > > 
> > > > > Alternatively, one can use a consumable complex, as described in:
> > > > > 
> > > > > http://gridengine.sunsource.net/servlets/BrowseList?list=users&by=thread&from=2530
> > > > > 
> > > > > or more simply:
> > > > > #name     shortcut  type  relop  requestable  consumable  default  urgency
> > > > > #--------------------------------------------------------------------------
> > > > > vslots    vs        INT   <=     YES          YES         1        1000
> > > > > 
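> > > > > Attached per host and requested per job, this could look like (host
> > > > > name and counts are only examples):
> > > > > 
> > > > > qconf -mattr exechost complex_values vslots=4 node01
> > > > > qsub -l vs=2 job.sh       # counts as two normal jobs on that host
> > > > > 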
> > > > > Which is of course just the normal complex slots copied to a
> > > > > different name to get around the explicit "-l slots" block.  Somehow
> > > > > this seems wrong: a reincarnation of a technique deliberately killed
> > > > > for some reason unknown to me.
> > > > > 
> > > > > 
> > > > > 
> > > > > Which style is preferred and why?
> > > > > What are the ramifications?
> > > > > Are there any behavioral differences?
> > > > > 
> > > > > 
> > > > > As for the preferred option, any suggestions to improve the above
> > > > > configurations?
> > > > > Also, what's the best way to deploy, either globally or to a set of
> > > > > queues?
> > > > > 
> > > > > 
> > > > > I know that with sge 5.3, I was able to use:
> > > > > queue_list        all
> > > > > to deploy to my whole cell.
> > > > > 
> > > > > 
> > > > > 
> > > > > Also, on a not-complete tangent: does anyone have advice, or has
> > > > > anyone written guidelines, on optimizing the configuration of
> > > > > queues?  We are mostly using dual cpu xeons with hyperthreading and
> > > > > 4G of ram.
> > > > > 
> > > > > Our jobs are mostly java, c, and lisp, single threaded (except the
> > > > > jvm, which forks its own stuff).  Jobs mostly run in a few hundred
> > > > > MB or less, with an occasional memory hog which will eat a gig or
> > > > > two.
> > > > > 
> > > > > 
> > > > > Rafi Rubin
> > > > > California Institute of Technology
> > > > > 