[GE users] contemporary of -l slots=#

Raphael Y. Rubin rafi at cs.caltech.edu
Wed Mar 30 21:56:02 BST 2005


Setting aside the argument about what's proper: is there an easy way to assign a
PE to every queue, or to a list of queues?  And is there an easy way to
specify a complex value for each machine or queue?
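For what it's worth, qconf's modify-attribute mode can probably do both; a rough, untested sketch (the queue name "all.q" and host name "node01" are placeholders, and this assumes SGE 6):

```shell
# Attach the PE "slots" to a queue's pe_list (repeat per queue, or loop):
qconf -mattr queue pe_list "slots" all.q

# Set a complex value on one exec host:
qconf -mattr exechost complex_values "virtual_free=3.5G" node01

# Or on a single queue instance:
qconf -mattr queue complex_values "virtual_free=3.5G" all.q@node01
```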

On Wed, Mar 30, 2005 at 10:30:13PM +0200, Reuti wrote:
> Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
> 
> > I guess I'm a little unclear as to what the "logic behind" is, I
> > wouldn't mind a short explanation.
> 
> To me it means "one slot = one job (task) to run".
> 
> > So are you saying the PE warning basically assumes the user intended to
> > use a PE and simply forgot?  And creating a PE to specify the number of
> 
> Yes, requesting more slots is for parallel jobs. If I understood you correctly, 
> you want to run a serial job, but exclusively on one machine, with no other jobs on it.
> 
> > slots to reserve is bad, at least if there's no other driving purpose?
> 
> I said: personal taste - it's not bad per se, but I wouldn't like it. You would 
> be requesting a PE for a serial job.

I don't like it either.  But it does seem the easiest way to give a
serial job a full machine of a given class.

As I said, sometimes our users just want to see what sort of performance
they can get.  This is not our typical usage, but it comes up often
enough.  And we don't want to have to pull a machine from the grid for
this sort of thing.
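For reference, Reuti's consumable-virtual_free recipe (quoted further down) would look roughly like this in practice; the 3.5G figure, host name, and script name are examples, not a tested configuration:

```shell
# 1. In "qconf -mc", make virtual_free consumable and requestable, e.g.:
#    virtual_free  vf  MEMORY  <=  YES  YES  1G  0

# 2. Attach the machine's usable memory to each exec host:
qconf -mattr exechost complex_values "virtual_free=3.5G" node01

# 3. Requesting (nearly) all of it keeps other jobs off that host:
qsub -l virtual_free=3.5G benchmark.sh
```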

Using memory or other constraints for this also seems like a bit of a
lie.  It also has the downside of a lack of safety: a user can specify
a job with -l vf=0, and it will run freely; moreover, it will even
target a machine running a job that requested vf=max, because the
scheduler will still see that host as having n-1 free slots.

> > 
> > I'm not so sure "virtual_free" is a safe alternative.  It seems to vary
> > too wildly, even on unloaded systems.
> 
> When you set it for each exec host (qconf -me <host>), it will be exactly this 
> value. And if you request exactly this value you get it.

I don't know for sure, but doesn't virtual_free include actual system
utilization and reserved resources?  It seems that overriding the
utilization reduces the utility of this variable in general.

>
> > 
> > Furthermore, it doesn't seem to guarantee grid exclusivity (for the host),
> > which is what our users occasionally need.
> 
> What do you mean grid exclusivity? One job in the whole cluster?

One job gets a machine completely to itself (as far as the grid is
concerned).
> 
> CU - Reuti
> 
> > 
> > 
> > I guess this gets back to the question of intent?  Is the slots thing
> > specifically discouraged because there is an assumption that users
> > shouldn't have the right to monopolize a machine?  Or is there something
> > else going on?
> > 
> > Rafi
> > 
> > On Wed, Mar 30, 2005 at 09:53:37PM +0200, Reuti wrote:
> > > Hi Raphael,
> > > 
> > > it's personal taste, but I wouldn't use either of the two options you
> > > offered - neither reflects the underlying logic. Although what I suggest
> > > is similar to the second:
> > > 
> > > - make the complex "virtual_free" consumable and requestable - default
> > >   1GB (or what you like)
> > > 
> > > - attach this to each node with a value of the built-in memory
> > >   (complex_values   virtual_free=3.5GB)
> > > 
> > > - request 3.5GB in your qsub command -> single job on the node
> > > 
> > > 
> > > With 4GB built in, only 3.5GB are usable I think. Yes, it's nearly the
> > > same as your vslots, but this can also be used for a real request of just
> > > 2GB for larger jobs.
> > > 
> > > Cheers - Reuti
> > > 
> > > PS: IMO it's good to disallow the request for slots, to remind users to
> > > request a PE - maybe they forgot it by accident.
> > > 
> > > 
> > > Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
> > > 
> > > > I would like to configure my grid so that mortal users can grab
> > > > exclusive access to a machine, using the normal submission commands
> > > > with little extra work.
> > > > 
> > > > Occasionally, a user wants to run a job exclusively for benchmarking,
> > > > whether that's to test a class of machine or a specific job.
> > > > 
> > > > Also, some jobs we know will be resource hogs, and we'd like to
> > > > annotate them to indicate they are the equivalent of two or more
> > > > normal jobs.
> > > > 
> > > > And of course there are various other needs that arise, but the above
> > > > two are the most common and important.  In the past we just used to
> > > > specify "-l slots=n".  But as of sge 5.3 that was discouraged.
> > > > 
> > > > 	error: denied: use parallel environments instead of requesting slots explicitly
> > > > 
> > > > 		- from sge6
> > > > 
> > > > In sge 5.3, I had created a slots PE, after we first noticed the
> > > > messages about -l slots.  Here is an updated version of that PE (in a
> > > > form for sge 6).
> > > > 
> > > > pe_name           slots
> > > > slots             999
> > > > user_lists        NONE
> > > > xuser_lists       NONE
> > > > start_proc_args   /bin/true
> > > > stop_proc_args    /bin/true
> > > > allocation_rule   $pe_slots
> > > > control_slaves    FALSE
> > > > job_is_first_task TRUE
> > > > urgency_slots     min
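If one goes the PE route anyway, deploying and using the definition above might look like this (untested sketch; the file and queue names are placeholders, and in SGE 6 the PE-to-queue association is made via the queue's pe_list rather than a queue_list in the PE, as I understand it):

```shell
# Save the PE definition above to a file and register it:
qconf -Ap slots_pe.conf

# Attach it to a queue:
qconf -mattr queue pe_list "slots" all.q

# A serial job that reserves, say, all 4 slots of one node:
qsub -pe slots 4 benchmark.sh
```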
> > > > 
> > > > Alternatively, one can use a consumable complex, as described in:
> > > > 
> > > > http://gridengine.sunsource.net/servlets/BrowseList?list=users&by=thread&from=2530
> > > > 
> > > > or more simply:
> > > > #name               shortcut   type        relop requestable consumable default  urgency
> > > > #---------------------------------------------------------------------------------------
> > > > vslots              vs         INT         <=    YES         YES        1        1000
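With the vslots complex above attached per host, usage would presumably look like this (untested; the per-host capacity and script name are examples):

```shell
# Give each dual-CPU node a capacity of 2:
qconf -mattr exechost complex_values "vslots=2" node01

# A job that counts double, i.e. takes the whole node:
qsub -l vs=2 hog.sh
```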
> > > > 
> > > > Which is of course just the normal complex slots copied to a different 
> > > > name to get around the explicit "-l slots" block.  Somehow this seems 
> > > > wrong, a reincarnation of a technique deliberately killed for some
> > > > reason unknown to me.
> > > > 
> > > > 
> > > > 
> > > > Which style is preferred and why?
> > > > What are the ramifications?
> > > > Are there any behavioral differences?
> > > > 
> > > > 
> > > > As for the preferred option, any suggestions to improve the above
> > > > configurations?
> > > > Also, what's the best way to deploy, either globally or to a set of
> > > > queues?
> > > > 
> > > > 
> > > > I know that with sge 5.3, I was able to use:
> > > > queue_list        all
> > > > to deploy to my whole cell.
> > > > 
> > > > 
> > > > 
> > > > Also, on a not-quite tangent: does anyone have advice, or has anyone
> > > > written guidelines, on optimizing queue configuration?  We are mostly
> > > > using dual-CPU Xeons with hyperthreading and 4G of RAM.
> > > > 
> > > > Our jobs are mostly java, c, and lisp, single threaded (except the jvm,
> > > > which forks its own stuff).  Jobs mostly run in a few hundred MB or
> > > > less, with an occasional memory hog which will eat a gig or two.
> > > > 
> > > > 
> > > > Rafi Rubin
> > > > California Institute of Technology
> > > > 
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> 
> 
> 
> 


More information about the gridengine-users mailing list