[GE users] selection criteria for automatic job suspension
bernard.massot at ens.fr
Thu Jan 14 12:32:19 GMT 2010
On Thu, Jan 14, 2010 at 12:54:20PM +0100, reuti wrote:
> On 11.01.2010 at 16:57, massot wrote:
> > According to what I understand after reading the source code (the man
> > page isn't fully informative), when suspend_thresholds is reached on a
> > host, the job selected for suspension is the one that has been running
> > for the shortest time, and there is no way to change this.
> > Did I get that right? If so, I think it would be nice to have the
> > option to tell the scheduler to select the job that has the highest
> > load instead of the one with the shortest run time.
> There was a similar discussion about the slot-wise suspend on
> subordination: which job to suspend? The best would of course be to
> let the user decide whether it should be the one with the shortest or
> the longest runtime (and maybe to correct the h_rt).
In my case I had a computer with two jobs run by two different people
who were not working together. So it wouldn't always be relevant to say
that the "user should decide".
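To make the two selection policies concrete, here is a minimal Python
sketch. It is not Grid Engine code: the Job class, its fields, both
function names and the sample numbers are hypothetical and only
illustrate the difference between the two choices.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record, not a Grid Engine data structure.
    job_id: int
    owner: str
    start_time: float  # seconds since the epoch
    load: float        # assumed per-job share of the host load


def pick_shortest_running(jobs):
    """Behaviour as I understand it: suspend the most recently started job."""
    return max(jobs, key=lambda j: j.start_time)


def pick_highest_load(jobs):
    """Suggested alternative: suspend the job contributing the most load."""
    return max(jobs, key=lambda j: j.load)


# Two unrelated jobs on one host (made-up values).
jobs = [
    Job(101, "alice", start_time=1000.0, load=9.0),  # older job, "gone mad"
    Job(102, "bob", start_time=5000.0, load=1.0),    # newer job, behaving
]

print(pick_shortest_running(jobs).job_id)  # the newer, well-behaved job
print(pick_highest_load(jobs).job_id)      # the job causing the load
```

With the first policy the well-behaved job is the one that gets
suspended, which is the situation I would like to be able to avoid.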
> What do you mean by "highest load"? Every running process which is
> eligible to be executed generates a load of 1. Do you mean parallel
> jobs on a node?
I had a computer with a load of more than 10 whereas there were only 4
cores and 2 non-parallel jobs. For some reason (the user told me it
could have been because of huge memory usage) one of these was consuming
far too many resources. After I killed it, system load lowered until
> > What often happens, I guess, is that suspend_thresholds is reached
> > only when a job "goes mad", so it would make more sense to suspend
> > that one rather than another one that has been running normally for
> > a longer time.
> Did you define more slots than installed cores? Nowadays the load is
> a little bit misleading, as uninterruptible kernel tasks will also
> increase the load, although they are waiting for the disk or the like
> (state "D"). Maybe the suspend_thresholds feature isn't suited for
> modern Linux systems at all.
There were only 4 slots for 4 cores.
Well, I experienced a case where load was absolutely relevant.
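For anyone who wants to check on their own host whether the load comes
from runnable processes or from ones stuck in state "D", here is a rough
Python sketch. It assumes a Linux /proc filesystem; the function names
are mine, and it counts processes rather than individual threads, so the
numbers are only an approximation of what the kernel folds into the
load average.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split after the last ')'.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state; returns an empty Counter without /proc."""
    counts = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                counts[parse_state(f.read())] += 1
        except (OSError, IndexError):
            continue  # process exited meanwhile, or file unreadable
    return counts


if __name__ == "__main__":
    counts = count_states()
    print("runnable (R):       ", counts["R"])
    print("uninterruptible (D):", counts["D"])
```

If most of the count is in "D", the load is dominated by processes
waiting on I/O; if it is in "R", they are really competing for the
cores.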
Bernard Massot - Bureau D4 - Département de physique
École Normale Supérieure
24 rue Lhomond - 75005 Paris
Tél: +33 1 44 32 25 89