[GE users] SGE Configuration

reuti reuti at staff.uni-marburg.de
Wed Feb 18 11:22:02 GMT 2009



On 18.02.2009 at 11:19, ms wrote:

> I've got several questions about transferring our business rules to the
> configuration of Grid Engine. I hope you can help me a little bit.
> Infrastructure:
> 12 nodes with 8 cores each (2 quad-core Xeons)
> 4 nodes with 16 GB, 8 nodes with 4 GB
> Business rules:
> a)
> Because Grid Engine will be used by only a few people, they can manage
> the overall queuing offline. So the best way will be the default way:
> every user submits their jobs to the default queue all.q, and
> Grid Engine starts these jobs in FIFO order.
> But sometimes users want to test their jobs, so those jobs have to start
> immediately, regardless of other jobs currently running.
> I think the best way to implement this rule is to set up a fast.q with
> all.q as its subordinate, isn't it?
> But how can I configure Grid Engine to suspend only one job (from all.q)
> for each job started from fast.q? Sometimes it starts 1 fast.q job and
> suspends all other jobs on the node, which is obviously not necessary.

This is how subordination works. You can specify the number of used
slots (in the superordinate queue) at which the subordination starts,
but you can't suspend only one job.

- Instead of subordination you could use a suspend threshold
(suspend_thresholds) in the queue that would otherwise be subordinated.
When the load is too high, it will suspend jobs one after the other.

- Use an urgency policy. This will bypass the usual FIFO, and you
just have to request a boolean complex in your job submission, hence
no extra queue: http://www.sun.com/blueprints/1005/819-4325.pdf page 8.
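
A minimal sketch of the suspend threshold variant, assuming the queue
is all.q and that np_load_avg is the load value to watch. The threshold
1.25 and the one-minute interval are illustrative values only, not
recommendations:

```shell
# Suspend jobs in all.q step by step while the normalized load is too high.
# (np_load_avg=1.25 is an assumed example value.)
qconf -mattr queue suspend_thresholds np_load_avg=1.25 all.q

# Number of jobs suspended per step.
qconf -mattr queue nsuspend 1 all.q

# Time between two suspend steps (assumed example value).
qconf -mattr queue suspend_interval 00:01:00 all.q

# Show the resulting settings of the queue.
qconf -sq all.q | grep suspend
```

The same attributes can also be edited interactively with
"qconf -mq all.q".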

> b)
> We want maximum utilization of our nodes, so generally each node should
> be filled up with 8 jobs; most of our jobs are non-parallel. Users should
> be able to attach memory usage information to their jobs. But this must
> be optional, because sometimes users don't know this information.

Over time they will get used to it, as the memory usage is reported for
each job by "qacct -j <job_id>" after the job has finished.

> I think that the mem_free value represents this rule.
> But if I start an array job with mem_free=4G, Grid Engine ignores the
> amount of memory and fills up the nodes with 8 jobs per node in the
> first scheduling run. I think that it only compares the currently free
> memory with the amount of memory the user has given for the array job,
> but doesn't decrement the value with each start.
> How can I handle this problem?

The usual way is to:

- make either h_vmem or virtual_free consumable and attach a default
request in the complex configuration

- attach the amount of installed physical RAM as its value in
"qconf -me <exechost>" or in an RQS, to limit the total consumption
per node

- request the correct value per job when it's higher than the default

- optionally, for h_vmem: set in the queue configuration the upper
limit which a single job may request.

The difference between h_vmem and virtual_free is that h_vmem is
enforced and the job is killed when it consumes more, while virtual_free
is only guidance for SGE and relies on the users behaving fairly.

You can check the current consumption with: qhost -F
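
The steps above could look roughly like this for h_vmem. This is a
sketch only: the host name node01, the script job.sh, the 1G default
and the 8G per-job limit are made-up placeholders, and 16G stands for
the RAM of one of the larger nodes:

```shell
# 1. Make h_vmem consumable with a default request. Run "qconf -mc" and
#    change the h_vmem line to something like (1G default is an assumption):
#
#    #name   shortcut  type    relop  requestable  consumable  default  urgency
#    h_vmem  h_vmem    MEMORY  <=     YES          YES         1G       0

# 2. Attach the installed RAM to each exec host (node01 is a placeholder).
qconf -mattr exechost complex_values h_vmem=16G node01

# 3. Request more than the default for jobs that need it,
#    here for an array job with 100 tasks.
qsub -t 1-100 -l h_vmem=4G job.sh

# 4. Optional: cap what a single job may request in a queue.
qconf -mattr queue h_vmem 8G all.q

# 5. Watch the remaining amount per host.
qhost -F h_vmem
```

With the consumable in place, each started task should reduce the
host's remaining h_vmem, so a 16G node would take at most four 4G tasks
instead of eight.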

> c)
> Sometimes jobs need a huge amount of memory, but only at the start. The
> later operations need only a small memory bandwidth, so the OS can swap
> out the main part. So we want to be able to say "start only X jobs on
> one node". Does Grid Engine support that kind of scheduling, and how can
> I implement that?

a) To start only x jobs on a node, you can limit the slots in the
queue definition or in an RQS (resource quota set) for certain queues.
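
As a sketch of the RQS variant, assuming the limit should apply to
all.q and that 4 jobs per node is the wanted number (the rule name, the
file name and the number are placeholders):

```shell
# Write the rule set to a file (file name is arbitrary).
cat > max_jobs_per_node.rqs <<'EOF'
{
   name         max_jobs_per_node
   description  "At most 4 all.q slots on each host"
   enabled      TRUE
   limit        queues all.q hosts {*} to slots=4
}
EOF

# Add the rule set from the file and show it.
qconf -Arqs max_jobs_per_node.rqs
qconf -srqs max_jobs_per_node
```

The "{*}" around the host wildcard is meant to make the limit count
per host rather than across all hosts together.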

But I think you mean it in a different way, like: start only one job  
and after 5 minutes the next one.

b) This can be done by job_load_adjustments in the scheduler
configuration (man sched_conf). This will put an artificial load (of
any complex of your choice) on the machine, which will vanish over
time.
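
A rough sketch of such a scheduler setting, assuming np_load_avg is the
complex used for the artificial load. The values shown are examples
only and would need tuning for jobs that are memory-hungry at startup:

```shell
# Show the current load adjustment settings of the scheduler.
qconf -ssconf | grep -E 'job_load_adjustments|load_adjustment_decay_time'

# To change them, run "qconf -msconf" and set for example:
#
#    job_load_adjustments        np_load_avg=0.50
#    load_adjustment_decay_time  0:7:30
#
# Each freshly started job then counts as an extra 0.50 of np_load_avg
# on its host for scheduling purposes, and this extra load fades out
# over 7 minutes 30 seconds.
```

For this to delay further jobs, the queue's load_thresholds should
refer to the same complex, so that the adjusted load can push the host
over the threshold for a while.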

> d)
> We've got 2 resources where we can acquire CPLEX licenses. A script for
> a load sensor that reports the current number of free licenses exists.
> Grid Engine runs the script on a node and reports the current value
> properly.
> But here the same problem as with the mem_free value occurs. If any
> CPLEX license is available, it will fill up the nodes and doesn't
> remember that it has already started jobs.

Why do you use a load sensor and not just a consumable complex, which  
will prevent this behavior? Is it FlexLM based?
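
A consumable for the licenses could be set up along these lines. This
is a sketch under assumptions: the complex name cplex, the count of 2
licenses and the script job.sh are placeholders, and it only works if
all license use goes through SGE:

```shell
# 1. Define the consumable. Run "qconf -mc" and add a line like:
#
#    #name  shortcut  type  relop  requestable  consumable  default  urgency
#    cplex  cplex     INT   <=     YES          YES         0        0

# 2. Attach the total number of licenses to the global host
#    (2 is a placeholder for the real license count).
qconf -mattr exechost complex_values cplex=2 global

# 3. Every job that needs a license requests one.
qsub -l cplex=1 job.sh

# 4. Check how many are still free from SGE's point of view.
qhost -F cplex
```

Unlike a load sensor value, the consumable is decremented by the
scheduler at each job start, so it cannot hand out more licenses than
were configured.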

-- Reuti

