[GE users] default "-l" attributes for each queue

reuti reuti at staff.uni-marburg.de
Thu Sep 17 12:03:20 BST 2009


Hi,

On 16.09.2009 at 13:07, txema_heredia wrote:

> Hi Reuti,
>
> thanks for your answer.
>
>> can you elaborate this in more detail?
>
>
> I'll do it.
>
> My users' jobs can be lumped into 3 groups: jobs that last less than  
> 1 hour, jobs that last less than 1 day, and jobs that last "a lot".
>
> I have 8 computing nodes (8 cores each), and currently all types of  
> jobs run on all nodes, which causes some "fairness" problems due to  
> the differences in job duration, even though we are using a  
> user-ticket policy.
>
> So I've decided to split my cluster into 3 blocks (with one queue  
> for each), each block dedicated to a specific kind of job. This way,  
> even if there are lots of "slow" jobs running, they won't affect the  
> "fast" ones (users tend to launch from 100 to 1,000 jobs at the same  
> time, but usually of only one kind, so there shouldn't be  
> interference between projects). But this causes a problem: if there  
> are no jobs of a given type, those nodes sit idle, even if there are  
> thousands of jobs waiting in the other queues.
>
>
> So, I decided to create a system where my jobs run "primarily" in  
> their specific block of nodes. If that block has all its slots  
> filled, the scheduler will place jobs on the other nodes if they  
> aren't busy, but only filling up to 6 of the 8 possible slots. This  
> way the system keeps at least 2 slots free for the preferred kind of  
> job on its "priority" hosts (and once those 2 slots are filled, the  
> queue's load_threshold prevents other "non-priority" jobs from being  
> scheduled).
>
>
> In order to do that I created several consumables which "count" how  
> many jobs of each kind are running on each host:
>
>
> num_jobs = total number of jobs running on the host (I want up to 8  
> jobs running on any host)
>
> fast_jobs = number of "fast" jobs running on the host (capacity 8 on  
> the "fast-priority" hosts); used as a load threshold to stop queues  
> when it reaches 2 or more.
>
> med_jobs = number of "medium" jobs running on the host (capacity 8  
> on the "medium-priority" hosts); used as a load threshold to stop  
> queues when it reaches 2 or more.
>
> slow_jobs = number of "slow" jobs running on the host (capacity 8 on  
> the "slow-priority" hosts); used as a load threshold to stop queues  
> when it reaches 2 or more.
>
> fast_med = the sum of "fast" and "medium" (not "slow") jobs running  
> on a host; used as a load threshold to stop queues when it reaches 6  
> or more on the "slow-priority" hosts.
>
> fast_slow = the sum of "fast" and "slow" (not "medium") jobs running  
> on a host; used as a load threshold to stop queues when it reaches 6  
> or more on the "medium-priority" hosts.
>
> med_slow = the sum of "medium" and "slow" (not "fast") jobs running  
> on a host; used as a load threshold to stop queues when it reaches 6  
> or more on the "fast-priority" hosts.

Mmh - if I understand you correctly, it should be possible to request  
just "slow", "medium" or "fast", each attached to every host with an  
arbitrarily high value like 9999.
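Such per-host consumables are declared once in the complex configuration and then attached to every execution host. A sketch of what that could look like (the shortcut names are placeholders, not values from this thread):

```
# Complex configuration (edited with "qconf -mc"):
#name    shortcut  type  relop  requestable  consumable  default  urgency
slow     slw       INT   <=     YES          YES         0        0
medium   med       INT   <=     YES          YES         0        0
fast     fst       INT   <=     YES          YES         0        0

# Per-host capacity (edited with "qconf -me <hostname>"), set arbitrarily
# high so that only the resource quota set does the real limiting:
complex_values  slow=9999,medium=9999,fast=9999
```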

Then you will need an RQS (resource quota set), where you can define  
the individual limits of "slow", ... for each queue. But I think it's  
also possible in your setup to stay with one queue, and limit the  
consumption for each hostgroup:

{
   ...
   ...
   limit hosts {@block1} to slow=10
   limit hosts {@block2} to slow=30
   limit hosts {@block3} to slow=20
}

and similar for medium and fast.
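A full resource quota set of that shape, as it could be loaded with "qconf -Arqs <file>" (the rule-set name and description here are placeholders; the limit values are the ones from the example above):

```
{
   name         slow_per_block
   description  "Cap the number of slow jobs in each host block"
   enabled      TRUE
   limit        hosts {@block1} to slow=10
   limit        hosts {@block2} to slow=30
   limit        hosts {@block3} to slow=20
}
```

Analogous rule sets for medium and fast would then complete the setup.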

-- Reuti


> So, in order to run my jobs I have to type this:
>
>
> fast jobs --> qsub -q fast -l num_jobs=1 -l fast_jobs=1 -l  
> fast_med=1 -l fast_slow=1 ...
>
> medium jobs --> qsub -q med -l num_jobs=1 -l med_jobs=1 -l  
> fast_med=1 -l med_slow=1 ...
>
> slow jobs --> qsub -q slow -l num_jobs=1 -l slow_jobs=1 -l  
> fast_slow=1 -l med_slow=1 ...
>
>
> If I type this manually it works and everything is OK, but those  
> parameters are mandatory for the system to work, and if you omit  
> one, it could end up blocking other people's jobs... plus I do  
> NOT trust my users (they are all biologists). The less they have to  
> do, the better for system stability.
>
>
>
>> The idea behind SGE is to select a queue for you according to the
>> given resource requests (in contrast to other queuing systems where
>> you submit into a queue).
>>
>> The resource requests will be used by SGE to schedule the job to a
>> queue which fulfills them - hence they must be known. Requesting a
>> queue and resources at the same time may be redundant. Do you want
>> certain values to be set in the job script, which the scripts
>> should then use?
>>
>> -- Reuti
>>
>> PS: Nevertheless, if I understand you correctly: give all queues all
>> parameters, and set the inapplicable ones to a high value (assuming
>> they are consumable), e.g.:
>>
>> q1: param1=3,param2=9,param3=27,param4=9999,param5=9999,param6=9999
>> q2: param1=9999,param2=9999,param3=9999,param4=16,param5=8,param6=4
>>
>> qsub -l param1=3,param2=9,param3=27,param4=16,param5=8,param6=4
>> should run in both queues.
>>
>
> Yes, this would work, but the problem is the same as before: it's  
> too complicated to trust my users to use it properly.
>
>>
>>> I suppose that there won't be any solution, but I have to try ;)
>>>
>>>
>>> PS: I'm using 6.1u4
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=217346
>>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=217471
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=217621

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


