[GE users] SGE Loading
sli1que at yahoo.com
Thu May 27 11:35:31 BST 2010
Thanks for the help. I was able to figure it out. The jobs were not parallel environment jobs. The previous admin was using sequence-number-based queue sorting.
After looking at the config I could see that 3 machines had a lower seq number, so all jobs were being dispatched among those three. After seeing that, I understood what was
happening. I then adjusted the seq numbers so that all of our 64 bit machines share one seq number and the 32 bit ones have a higher one. This way all of the 64 bit machines are used
first and, once they are all full, the 32 bit machines start running jobs.
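For anyone who finds this thread later, here is a rough Python sketch of why seq number sorting produced that packing. It is a toy model, not the real scheduler: the host names, the slot count of 4 and the `QueueInstance`/`dispatch` names are made up for illustration, and the tie-break on slot usage is only a stand-in for the load value that decides between queue instances with equal seq numbers.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    host: str        # hypothetical host name
    seq_no: int      # sequence number from the queue configuration
    slots: int = 4   # assumed slot count per host
    used: int = 0


def dispatch(jobs, instances):
    """Place single-slot jobs one at a time; return {host: jobs placed}."""
    for _ in range(jobs):
        free = [q for q in instances if q.used < q.slots]
        if not free:
            break  # cluster is full, remaining jobs would stay pending
        # Lowest seq_no wins; equal seq_no falls back to the least-used host.
        best = min(free, key=lambda q: (q.seq_no, q.used / q.slots))
        best.used += 1
    return {q.host: q.used for q in instances}


def cluster_before():
    # Three hosts carry a lower seq_no than the rest.
    seq = {"n1": 0, "n2": 0, "n3": 0, "n4": 10, "n5": 10, "n6": 10}
    return [QueueInstance(h, s) for h, s in seq.items()]


def cluster_after():
    # All 64 bit hosts share one seq_no, the 32 bit hosts get a higher one.
    seq = {"n1": 0, "n2": 0, "n3": 0, "n4": 0, "n5": 0, "n6": 0,
           "old32a": 10, "old32b": 10}
    return [QueueInstance(h, s) for h, s in seq.items()]


if __name__ == "__main__":
    print(dispatch(9, cluster_before()))   # only n1..n3 receive jobs
    print(dispatch(9, cluster_after()))    # jobs spread over n1..n6
    print(dispatch(26, cluster_after()))   # 32 bit hosts used once n1..n6 are full
```

With the "before" layout nothing reaches n4..n6 until the first three hosts run out of slots, which matches what I was seeing.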
From: craffi <dag at sonsorol.org>
To: users at gridengine.sunsource.net
Date: 05/27/2010 03:13 AM
Subject: Re: [GE users] SGE Loading
Behavior as you describe is unusual enough to imply that the previous
admin has made some configuration changes (I think).
By default the SGE scheduler will send your job to the "least busy"
system among the entire set of nodes that are able to satisfy all of the
job requirements. On clusters with identical hardware you would
generally see the job dispersal/scattering that you are seeking.
There are a few things that could be happening:
- You have nodes that are in unusual states and thus are not really able
to accept work
- someone has altered the load_adjust_threshold to be too low
- someone has switched from load based queue instance sorting to "seqno"
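To make the load-related case concrete, here is a small Python sketch of "least busy" dispatch within a single scheduling pass. It is a simplified model under stated assumptions, not the scheduler's actual code: the function name, host names, slot count and numbers are illustrative, and the `adjustment` argument stands in for the artificial load the scheduler adds to a host right after placing a job on it (I am assuming that is the setting meant by the load adjustment item above; in the scheduler config the related keys are `job_load_adjustments` and `load_adjustment_decay_time`).

```python
def dispatch_by_load(jobs, loads, slots, adjustment):
    """Place single-slot jobs on the least loaded host with a free slot.

    loads:      {host: load reported before the scheduling pass}
    slots:      assumed number of slots per host
    adjustment: artificial load added to a host for each job just placed
    Returns {host: jobs placed}.
    """
    effective = dict(loads)              # load as seen during this pass
    placed = {host: 0 for host in loads}
    for _ in range(jobs):
        free = [h for h in loads if placed[h] < slots]
        if not free:
            break
        host = min(free, key=lambda h: effective[h])
        placed[host] += 1
        # Reported load has not caught up yet, so bump it artificially.
        effective[host] += adjustment
    return placed


if __name__ == "__main__":
    # Ten identical idle hosts (hypothetical names), eight slots each.
    hosts = {f"node{i:02d}": 0.0 for i in range(1, 11)}
    print(dispatch_by_load(10, hosts, slots=8, adjustment=0.5))
    print(dispatch_by_load(10, hosts, slots=8, adjustment=0.0))
```

With a non-zero adjustment each of the ten hosts receives one job. With the adjustment at zero the first host still looks idle after every placement, so it keeps winning until its slots run out.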
If you posted the output of these commands the list might be able to
make concrete suggestions:
(1) show the current state of your cluster via "qstat -f"
(2) dump the qmaster config via "qconf -sconf"
(3) show the scheduler config via "qconf -ssconf"
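If you would rather check the relevant settings yourself before posting, a short Python sketch along these lines could pull them out of the "qconf -ssconf" output. The helper names are made up, the sample text is only an illustrative excerpt and not output from any real cluster, and the parser assumes the usual "key value" layout with long lines continued by a trailing backslash.

```python
import subprocess


def read_scheduler_config():
    """Return the text printed by 'qconf -ssconf' (needs an SGE host)."""
    result = subprocess.run(["qconf", "-ssconf"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_qconf(text):
    """Turn 'key   value' lines into a dict, joining continued lines."""
    conf = {}
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def dispatch_settings(conf):
    """Pick out the scheduler settings that influence job placement."""
    keys = ("queue_sort_method", "load_formula",
            "job_load_adjustments", "load_adjustment_decay_time")
    return {key: conf.get(key) for key in keys}


# Illustrative excerpt only, written by hand for this example.
SAMPLE = """\
algorithm                         default
schedule_interval                 0:0:15
queue_sort_method                 seqno
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
"""

if __name__ == "__main__":
    print(dispatch_settings(parse_qconf(SAMPLE)))
```

On a real submit or admin host, `parse_qconf(read_scheduler_config())` would be the call to try. A `queue_sort_method` of `seqno` rather than `load` is the first thing to look for.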
OOPS - forgot to ask a real question. Are these parallel jobs? If they
are parallel jobs and someone has configured the PE with an allocation
rule of $fill_up then you would see the job "packing" that you describe.
With parallel jobs you have a bit more control within the PE object as
to how jobs are scattered across machines.
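As a rough illustration of how the allocation rule changes placement, here is a Python sketch contrasting $fill_up with $round_robin for one parallel job. It is a simplification under stated assumptions: the `allocate` function and host names are hypothetical, and the dict order stands in for whatever host order the scheduler would actually use.

```python
def allocate(slots_needed, free_slots, rule):
    """Split one parallel job's slots across hosts.

    free_slots: {host: free slots}, in the order hosts are considered
    rule:       "$fill_up" or "$round_robin"
    Returns {host: slots granted}, or None if the job does not fit.
    """
    if rule not in ("$fill_up", "$round_robin"):
        raise ValueError(f"unsupported allocation rule: {rule}")
    if sum(free_slots.values()) < slots_needed:
        return None  # not enough free slots, the job would stay pending

    granted = {host: 0 for host in free_slots}
    remaining = slots_needed
    if rule == "$fill_up":
        # Take everything a host offers before moving to the next one.
        for host, free in free_slots.items():
            take = min(free, remaining)
            granted[host] = take
            remaining -= take
            if remaining == 0:
                break
    else:
        # Hand out one slot per host per round until the request is met.
        while remaining > 0:
            for host, free in free_slots.items():
                if remaining > 0 and granted[host] < free:
                    granted[host] += 1
                    remaining -= 1
    return granted


if __name__ == "__main__":
    free = {"a": 4, "b": 4, "c": 4}
    print(allocate(6, free, "$fill_up"))      # packs onto a, then b
    print(allocate(6, free, "$round_robin"))  # two slots on each host
```

This only applies to jobs submitted into a PE. Serial jobs are placed by the queue sorting described above.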
> I have an SGE setup whose administration I have taken over. Currently, I don't see jobs being evenly distributed within a specific cluster, i.e. cluster A has 10 identical computers and if I submit 10 jobs, I would expect each computer to get assigned 1 job. I actually get about 6 or 7 on one machine and the rest go to other machines. Can anyone help me understand what I need to do or where I need to look?
> Thanks in advance.