[GE users] selection criteria for automatic job suspension
reuti at staff.uni-marburg.de
Thu Jan 14 12:56:17 GMT 2010
On 14.01.2010 at 13:32, massot wrote:
> On Thu, Jan 14, 2010 at 12:54:20PM +0100, reuti wrote:
>> On 11.01.2010 at 16:57, massot wrote:
>>> According to what I understand after reading the source code (the man
>>> page isn't fully informative), when suspend_thresholds is reached on a
>>> host, the job selected for suspension is the one that has been running
>>> for the shortest time, and there is no way to change that.
>>> Did I get that right? If so, I think it would be nice to have the
>>> option to tell the scheduler to select the job that has the highest
>>> load instead of the shortest run time.
>> There was a similar discussion about the slot-wise suspend on
>> subordination: which job to suspend? The best would of course be to
>> let the user decide whether it should be the one with the shortest or
>> the longest runtime (and maybe to correct the h_rt).
> In my case I had a computer with two jobs run by two different people
> who were not working together. So it wouldn't always be relevant to
> say that the "user should decide".
By user I meant us - the admins.
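To make the alternatives concrete, here is a minimal Python sketch of an
admin-selectable policy. It is not Grid Engine code: the Job record, its
"usage" field and the pick_* functions are hypothetical names used only
for illustration, and the figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    runtime_s: float  # seconds since the job started
    usage: float      # hypothetical per-job resource usage figure


def pick_shortest_runtime(jobs):
    """Suspend the job that has been running for the shortest time."""
    return min(jobs, key=lambda j: j.runtime_s)


def pick_longest_runtime(jobs):
    """Suspend the job that has been running for the longest time."""
    return max(jobs, key=lambda j: j.runtime_s)


def pick_highest_usage(jobs):
    """Suspend the job with the largest (hypothetical) usage figure."""
    return max(jobs, key=lambda j: j.usage)


# Made-up example: a young, well-behaved job and an older runaway job.
jobs = [Job(101, 600.0, 1.0), Job(102, 7200.0, 9.0)]
print(pick_shortest_runtime(jobs).job_id)
print(pick_highest_usage(jobs).job_id)
```

With these figures the shortest-runtime policy picks the well-behaved
job, while the usage-based policy picks the runaway one.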
>> What do you mean by "highest load"? Every running process which is
>> eligible to be executed generates a load of 1. Do you mean parallel
>> jobs on a node?
> I had a computer with a load of more than 10 although there were only 4
> cores and 2 non-parallel jobs. For some reason (the user told me it
> could have been because of huge memory usage) one of these was using
> far too many resources. After I killed it, the system load dropped to
> about 1.
Yes, when it starts to swap you will get a load which reflects the
paging processes. You could set up a hard memory limit which in total
sums up to the physically installed memory:
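As a rough illustration of the arithmetic, the following Python sketch
splits the installed memory evenly across the slots of a host. The 8 GiB
figure is an assumption (the memory size is not given in this thread),
per_slot_limit_mib is a hypothetical helper, and h_vmem is only named as
one possible hard limit to carry the result.

```python
def per_slot_limit_mib(physical_mib: int, slots: int) -> int:
    """Largest per-slot limit (MiB) whose total stays within physical memory."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    # Integer division rounds down, so slots * limit never exceeds the total.
    return physical_mib // slots


# Assumed figures: 4 slots (as on the host discussed here) and 8 GiB of RAM.
limit = per_slot_limit_mib(8 * 1024, 4)
print(f"h_vmem={limit}M")
```

An even split is the simplest choice; a site may prefer to reserve some
memory for the operating system before dividing the rest.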
>>> What often happens, I guess, is that suspend_thresholds is reached
>>> only when a job "goes mad", so it would make more sense to suspend
>>> that one rather than another one running normally for a longer time.
>> Did you define more slots than installed cores? Nowadays the load is
>> a little bit misleading, as uninterruptible kernel tasks also
>> increase the load, although they are only waiting for the disk or the
>> like (state "D"). Maybe the suspend_thresholds feature isn't suited to
>> modern Linux systems at all.
> There were only 4 slots for 4 cores.
> Well, I experienced a case where the load was absolutely relevant.
> Bernard Massot - Bureau D4 - Département de physique
> École Normale Supérieure
> 24 rue Lhomond - 75005 Paris
> Tél: +33 1 44 32 25 89
> To unsubscribe from this discussion, e-mail: [users-
> unsubscribe at gridengine.sunsource.net].