[GE users] contemporary of -l slots=#

Charu Chaubal Charu.Chaubal at Sun.COM
Wed Mar 30 22:16:21 BST 2005


Hello Raphael,

Here is another email which describes different approaches to what you
are trying to do:

http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgId=3643

Raphael Y. Rubin wrote:
> Ignoring the argument of what's proper, is there an easy way to assign a
> PE to every queue, or a list of queues?  And is there an easy way to
> specify a complex value to each machine or queue?
> 

In SGE6, you assign PEs to queues individually, e.g.:

bash-2.05# qconf -sq all.q | grep pe_list
pe_list               make

Simply list every PE you want attached to the queue there.  You need to do
this on a per-cluster-queue basis.
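
For example, to attach a PE named "slots" alongside "make" without opening
an editor (a sketch; it assumes the PE already exists, e.g. created with
'qconf -ap slots'):

bash-2.05# qconf -mattr queue pe_list "make slots" all.q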

Same thing with complex values:

bash-2.05# qconf -sq all.q | grep complex_values
complex_values        NONE

bash-2.05# qconf -se dt218-62 | grep complex_values
complex_values        NONE

(To modify, just use the corresponding 'qconf -mX <name>' command, where
"X" = "q" for a queue or "e" for an exec host.)

Regards,
	Charu


> On Wed, Mar 30, 2005 at 10:30:13PM +0200, Reuti wrote:
> 
>>Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
>>
>>
>>>I guess I'm a little unclear as to what the "logic behind" is; I
>>>wouldn't mind a short explanation.
>>
>>To me it means "one slot = one job (task) to run".
>>
>>
>>>So are you saying the PE warning basically assumes the user intended to
>>>use a PE and simply forgot?  And creating a PE to specify the number of
>>
>>Yes, requesting more slots is for parallel jobs. If I got you correctly, you
>>want to run a serial job, but exclusively on one machine with no other job on it.
>>
>>
>>>slots to reserve is bad, at least if there's no other driving purpose?
>>
>>I said: personal taste - it's not bad per se, but I wouldn't like it. You
>>would be requesting a PE for a serial job.
> 
> 
> I don't like it either.  But it does seem the easiest way to give a
> serial job a full machine of a given class.
> 
> As I said, sometimes our users just want to see what sort of performance
> they can get.  This is not our typical usage, but it comes up often
> enough.  And we don't want to have to pull a machine from the grid for
> this sort of thing.
> 
> Using memory or other constraints for this also seems like a bit of a
> lie.  It also has the downside of a lack of safety.  A user can specify
> a job with -l vf=0, and it will freely run; moreover, it will even
> target a machine with a job using vf=max, because the scheduler will see
> it as having n-1 free slots.
> 
> 
>>>I'm not so sure "virtual_free" is a safe alternative.  It seems to vary
>>>too wildly, even on unloaded systems.
>>
>>When you set it for each exec host (qconf -me <host>), it will be exactly
>>this value. And if you request exactly this value, you get it.
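>>
>>For example (a sketch; the host name is a placeholder), the exec host
>>entry for a 4GB node would carry:
>>
>>   complex_values        virtual_free=3.5G
>>
>>and a job submitted with "-l virtual_free=3.5G" then consumes all of it.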
> 
> 
> I don't know for sure, but doesn't virtual_free include actual system
> utilization and reserved resources?  It seems that overriding the
> utilization reduces the utility of this variable in general.
> 
> 
>>>Furthermore, it doesn't seem to guarantee grid exclusivity (for the
>>>host), which is what our users occasionally need.
>>
>>What do you mean by grid exclusivity? One job in the whole cluster?
> 
> 
> One job gets a machine completely to itself (as far as the grid is
> concerned).
> 
>>CU - Reuti
>>
>>
>>>
>>>I guess this gets back to the question of intent.  Is the slots request
>>>specifically discouraged because there is an assumption that users
>>>shouldn't have the right to monopolize a machine?  Or is there something
>>>else going on?
>>>
>>>Rafi
>>>
>>>On Wed, Mar 30, 2005 at 09:53:37PM +0200, Reuti wrote:
>>>
>>>>Hi Raphael,
>>>>
>>>>it's personal taste, but I wouldn't use either of the two options you
>>>>offered - neither reflects the logic behind it. Although, what I suggest
>>>>is similar to the second:
>>>>
>>>>- make the complex "virtual_free" consumable and requestable - default
>>>>  1GB (or whatever you like)
>>>>
>>>>- attach this to each node with a value of the built-in memory
>>>>  (complex_values   virtual_free=3.5G)
>>>>
>>>>- request 3.5G in your qsub command -> single job on the node
>>>>
>>>>
>>>>With 4GB built in, only 3.5GB are usable, I think. Yes, it's nearly the
>>>>same as your vslots, but this can also be used for a real request of
>>>>just 2GB for larger jobs.
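>>>>
>>>>A sketch of the matching complex line (edited via qconf -mc; the default
>>>>value is just an example) and the request:
>>>>
>>>>#name          shortcut  type    relop  requestable  consumable  default  urgency
>>>>virtual_free   vf        MEMORY  <=     YES          YES         1G       0
>>>>
>>>>qsub -l vf=3.5G job.sh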
>>>>
>>>>Cheers - Reuti
>>>>
>>>>PS: IMO it's good to disallow the request for slots, to remind users to
>>>>request a PE - maybe they forgot it by accident.
>>>>
>>>>
>>>>Quoting "Raphael Y. Rubin" <rafi at cs.caltech.edu>:
>>>>
>>>>
>>>>>I would like to configure my grid so that mortal users can grab
>>>>>exclusive access to a machine, using the normal submission commands
>>>>>with little extra work.
>>>>>
>>>>>Occasionally, a user wants to run a job exclusively for benchmarking,
>>>>>whether that's to test a class of machine or a specific job.
>>>>>
>>>>>Also some jobs we know will be resource hogs, and we'd like to annotate
>>>>>them to indicate they are the equivalent of two or more normal jobs.
>>>>>
>>>>>And of course there are various other needs that arise, but the above
>>>>>two are the most common and important.  In the past we just used to
>>>>>specify "-l slots=n", but as of sge5.3 that was discouraged:
>>>>>
>>>>>	error: denied: use parallel environments instead of requesting slots explicitly
>>>>>
>>>>>		- from sge6
>>>>>
>>>>>In sge 5.3, I had created a slots pe, after we first noticed the
>>>>>messages about -l slots.  Here is an updated version of that pe (in a
>>>>>form for sge 6):
>>>>>
>>>>>pe_name           slots
>>>>>slots             999
>>>>>user_lists        NONE
>>>>>xuser_lists       NONE
>>>>>start_proc_args   /bin/true
>>>>>stop_proc_args    /bin/true
>>>>>allocation_rule   $pe_slots
>>>>>control_slaves    FALSE
>>>>>job_is_first_task TRUE
>>>>>urgency_slots     min
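>>>>>
>>>>>With this PE attached to a queue, a user could then grab a whole node
>>>>>with something like the following (a sketch, assuming 4 slots per host):
>>>>>
>>>>>	qsub -pe slots 4 benchmark.sh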
>>>>>
>>>>>Alternatively, one can use a consumable complex, as described in:
>>>>>
>>>>>http://gridengine.sunsource.net/servlets/BrowseList?list=users&by=thread&from=2530
>>>>>
>>>>>or more simply:
>>>>>#name     shortcut  type  relop  requestable  consumable  default  urgency
>>>>>#--------------------------------------------------------------------------
>>>>>vslots    vs        INT   <=     YES          YES         1        1000
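>>>>>
>>>>>With, e.g., complex_values vslots=4 attached to each host (a sketch
>>>>>value, one per CPU), a user would then request a full machine with:
>>>>>
>>>>>	qsub -l vs=4 job.sh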
>>>>>
>>>>>This vslots complex is of course just the normal slots complex copied
>>>>>to a different name to get around the explicit "-l slots" block.
>>>>>Somehow this seems wrong, a reincarnation of a technique deliberately
>>>>>killed for some reason unknown to me.
>>>>>
>>>>>
>>>>>
>>>>>Which style is preferred, and why?
>>>>>What are the ramifications?
>>>>>Are there any behavioral differences?
>>>>>
>>>>>
>>>>>As for the preferred option, any suggestions to improve the above
>>>>>configurations?
>>>>>Also, what's the best way to deploy, either globally or to a set of
>>>>>queues?
>>>
>>>>>
>>>>>I know with sge 5.3, I was able to use:
>>>>>queue_list        all
>>>>>to deploy to my whole cell.
>>>>>
>>>>>
>>>>>
>>>>>Also, on a not-complete tangent: does anyone have advice, or has anyone
>>>>>written guidelines, on optimizing the configuration of queues?  We are
>>>>>mostly using dual-CPU Xeons with hyperthreading and 4GB of RAM.
>>>>>
>>>>>Our jobs are mostly Java, C, and Lisp, single threaded (except the JVM,
>>>>>which forks its own stuff).  Jobs mostly run in a few hundred MB or
>>>>>less, with an occasional memory hog which will eat a gig or two.
>>>>>
>>>>>
>>>>>Rafi Rubin
>>>>>California Institute of Technology
>>>>>

-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



