[GE users] default "-l" attributes for each queue

reuti reuti at staff.uni-marburg.de
Wed Sep 16 17:15:51 BST 2009


On 16.09.2009 at 13:07, txema_heredia wrote:

> Hi Reuti,
>
> thanks for your answer.
>
>> can you elaborate this in more detail?
>
>
> I'll do it.
>
> My users' jobs can be lumped into 3 groups: jobs that last less than
> 1 hour, jobs that last less than 1 day, and jobs that last "a lot".
>
> I have 8 computing nodes (8-core each), and currently all types of  
> jobs are running on all nodes, causing some "fairness" problems,  
> even though we are using a user-ticket policy (due to the job  
> duration differences).
>
> So I've decided to split my cluster into 3 blocks, with one queue per
> block, each dedicated to a specific kind of job. This way, even when
> lots of "slow" jobs are running, they won't affect the "fast" ones.
> (Users tend to launch 100 to 1,000 jobs at a time, usually all of one
> kind, so projects shouldn't disturb each other.) But this causes a
> problem: if there are no jobs of a given type, those nodes sit idle,
> even if thousands of jobs are waiting in the other queues.
>
>
> So I decided to create a system where jobs run "primarily" on their
> own block of nodes. If that block has all its slots filled, jobs
> are scheduled to the other nodes if those aren't busy, but only
> filling up to 6 of the 8 possible slots. This way the system keeps
> at least 2 slots free for the preferred kind of jobs to run on their
> "priority" hosts (and once those 2 slots are filled, the queue's
> load_threshold prevents other "non-priority" jobs from being
> scheduled).
>
>
> In order to do that I created several consumables which "count" how  
> many jobs of each kind I have running in that host:
>
>
> num_jobs = number of total jobs running in the host (I want up to 8  
> jobs running in any host)
>
> fast_jobs = number of "fast" jobs running on the host (8 on the
> "fast-priority" hosts). Used as a load threshold to stop queues
> when this is 2 or more.
>
> med_jobs = number of "medium" jobs running on the host (8 on the
> "medium-priority" hosts). Used as a load threshold to stop queues
> when this is 2 or more.
>
> slow_jobs = number of "slow" jobs running on the host (8 on the
> "slow-priority" hosts). Used as a load threshold to stop queues
> when this is 2 or more.
>
> fast_med = the sum of "fast" and "medium" (not "slow") jobs running  
> on a host. Used for load threshold to stop queues when this is 6 or  
> more in the "slow-priority" hosts.
>
> fast_slow = the sum of "fast" and "slow" (not "medium") jobs  
> running on a host. Used for load threshold to stop queues when this  
> is 6 or more in the "medium-priority" hosts.
>
> med_slow = the sum of "medium" and "slow"  (not "fast") jobs  
> running on a host. Used for load threshold to stop queues when this  
> is 6 or more in the "fast-priority" hosts.
>
>
> So, in order to run my jobs I have to type this:
>
>
> fast jobs --> qsub -q fast -l num_jobs=1 -l fast_jobs=1 -l  
> fast_med=1 -l fast_slow=1 ...
>
> medium jobs --> qsub -q med -l num_jobs=1 -l med_jobs=1 -l  
> fast_med=1 -l med_slow=1 ...
>
> slow jobs --> qsub -q slow -l num_jobs=1 -l slow_jobs=1 -l  
> fast_slow=1 -l med_slow=1 ...
>
>
> If I type this manually it works and everything is OK, but those
> parameters are mandatory for the system to work, and if you omit
> one, it could block other people's jobs. Plus, I do NOT trust my
> users (they are all biologists). The less they have to do, the
> better for system stability.
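One way to keep users from typing the requests by hand, even on a release without JSV, is a small wrapper around qsub. The following is only a sketch under assumptions: the function names qsub_class and class_args and the QSUB override variable are made up for illustration, and the request sets are copied from the three qsub lines quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper, meant to be sourced from the users' shell profile.
# Usage: qsub_class fast|med|slow [further qsub options] job.sh

# Print the queue and consumable requests for one job class
# (taken from the three qsub command lines in the thread).
class_args() {
    case $1 in
        fast) printf '%s\n' "-q fast -l num_jobs=1 -l fast_jobs=1 -l fast_med=1 -l fast_slow=1" ;;
        med)  printf '%s\n' "-q med -l num_jobs=1 -l med_jobs=1 -l fast_med=1 -l med_slow=1" ;;
        slow) printf '%s\n' "-q slow -l num_jobs=1 -l slow_jobs=1 -l fast_slow=1 -l med_slow=1" ;;
        *)    echo "unknown job class '$1' (expected fast, med or slow)" >&2
              return 1 ;;
    esac
}

# Submit a job with the requests of the given class prepended.
# QSUB is an illustrative override: set QSUB=echo for a dry run.
qsub_class() {
    if [ "$#" -lt 1 ]; then
        echo "usage: qsub_class fast|med|slow [qsub options] job.sh" >&2
        return 1
    fi
    class=$1
    shift
    args=$(class_args "$class") || return 1
    # $args is left unquoted on purpose so it splits into separate options.
    ${QSUB:-qsub} $args "$@"
}
```

With QSUB=echo, `qsub_class fast job.sh` only prints the options it would pass. Note that a wrapper does not stop anyone from calling qsub directly, so it reduces typing but does not enforce the requests.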
>
>
>
>> The idea behind SGE is to select a queue for you according to the
>> given resource requests (in contrast to other queuing systems where
>> you submit into a queue).
>>
>> The resource requests will be used by SGE to schedule the job to a
>> queue which fulfills the request - hence they must be known.
>> Requesting a queue and resources at the same time may be redundant.
>> Do you want certain values to be set in the job script for the
>> script to use?
>>
>> -- Reuti
>>
>> PS: Nevertheless, if I understand you correctly: give all queues all
>> parameters, with the ones that don't apply set to a high value
>> (assuming they are consumable), e.g.:
>>
>> q1: param1=3,param2=9,param3=27,param4=9999,param5=9999,param6=9999
>> q2: param1=9999,param2=9999,param3=9999,param4=16,param5=8,param6=4
>>
>> qsub -l param1=3,param2=9,param3=27,param4=16,param5=8,param6=4
>> should run in both queues.
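The reasoning behind that example can be checked with a few lines of shell. This is a simplified model, not how the scheduler is implemented: it assumes a request fits a queue when every requested amount is less than or equal to the value configured for the same parameter, and it ignores amounts already in use by running jobs. The function name fits is made up for illustration.

```shell
#!/bin/sh
# fits REQUEST CAPACITY
# Both arguments are comma-separated name=value lists, as in the example.
# Returns 0 if every requested amount is <= the capacity of the same name.
fits() {
    request=$1
    capacity=$2
    old_ifs=$IFS
    IFS=,
    for item in $request; do
        name=${item%%=*}
        want=${item#*=}
        have=
        for cap in $capacity; do
            if [ "${cap%%=*}" = "$name" ]; then
                have=${cap#*=}
            fi
        done
        # A parameter the queue does not define cannot be satisfied.
        if [ -z "$have" ] || [ "$want" -gt "$have" ]; then
            IFS=$old_ifs
            return 1
        fi
    done
    IFS=$old_ifs
    return 0
}

q1="param1=3,param2=9,param3=27,param4=9999,param5=9999,param6=9999"
q2="param1=9999,param2=9999,param3=9999,param4=16,param5=8,param6=4"
req="param1=3,param2=9,param3=27,param4=16,param5=8,param6=4"

if fits "$req" "$q1"; then echo "request fits q1"; fi
if fits "$req" "$q2"; then echo "request fits q2"; fi
```

The high value 9999 acts as "effectively unlimited": the parameters that matter for the other queue never exclude this one.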
>>
>
> Yes, this would work, but the problem is the same as before: it's
> too complicated to trust my users to use it properly.

I will answer in detail later. But for this you could put in a JSV
(job submission verifier) in 6.2u3.

-- Reuti
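A JSV along these lines could add the consumable requests automatically, so that users only have to type -q. This is a minimal sketch, not tested against a live cluster. It assumes the shell helper library jsv_include.sh that ships with JSV-capable Grid Engine releases (the poster's 6.1u4 would need an upgrade), it assumes users name exactly one of the three queues with -q, and the helper name resources_for_queue is made up for illustration.

```shell
#!/bin/sh
# Names of the consumables a job in each queue must request
# (taken from the qsub command lines quoted earlier in the thread).
resources_for_queue() {
    case $1 in
        fast) echo "num_jobs fast_jobs fast_med fast_slow" ;;
        med)  echo "num_jobs med_jobs fast_med med_slow" ;;
        slow) echo "num_jobs slow_jobs fast_slow med_slow" ;;
        *)    return 1 ;;
    esac
}

# Called once when the JSV starts; nothing to set up here.
jsv_on_start() {
    return
}

# Called for every submitted job.
jsv_on_verify() {
    queue=$(jsv_get_param q_hard)
    if ! resources=$(resources_for_queue "$queue"); then
        jsv_reject "Please submit with -q fast, -q med or -q slow."
        return
    fi
    # Add "-l <name>=1" for each consumable of this job class.
    for res in $resources; do
        jsv_sub_add_param l_hard "$res" 1
    done
    jsv_correct "Added the resource requests for queue $queue."
    return
}

# Load the JSV helper functions and enter the protocol loop.
# Skipped when the library is absent, so the functions above
# can be sourced and inspected without a Grid Engine installation.
if [ -n "$SGE_ROOT" ] && [ -r "$SGE_ROOT/util/resources/jsv/jsv_include.sh" ]; then
    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
    jsv_main
fi
```

Such a script could first be tried per job with qsub's -jsv option before enabling it for everyone. Requests like -q fast@somehost or a list of queues would be rejected by this sketch and would need extra handling.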

>
>>
>>> I suppose that there won't be any solution, but I have to try ;)
>>>
>>>
>>> PS: I'm using 6.1u4
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?
>>> dsForumId=38&dsMessageId=217346
>>>
>>> To unsubscribe from this discussion, e-mail: [users-
>>> unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=217506

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


