[GE users] Qsub strange behaviours

reuti reuti at staff.uni-marburg.de
Thu Jul 29 10:45:21 BST 2010


On 29.07.2010 at 09:50, spow_ wrote:

> Sorry for the late answer, I had other matters to deal with yesterday.
> > >> <snip>
> > > However, if a job is dispatched on node 2, it keeps running. So it looks like subordination only happens on one node (I can't test with more nodes, my test server only has 2 of them)
> > >>> - Eventually, I ran parallel jobs, with $round_robin allocation. If I submit a limited number of jobs, they get correctly dispatched.
> > >>> But if a few jobs are already running in the parallel queues,
> > >>>
> > >>
> > >> Why do you have many parallel queues? The idea behind SGE is to specify resource requests, and SGE will select an appropriate queue for you. It's not like Torque, where you submit into a queue.
> > >>
> > > Well, I have many parallel queues because parallel jobs are unequal in size and number.
> > > Therefore, a parallel queue spans from node 1 to x. This interval is further subdivided into parallel queues (e.g. 1*12, 2*6, 4*3 ...)
> >
> > You mean PEs with fixed allocation rules?
> I'm not 100% sure what a fixed allocation rule is. My best guess is that it means giving an integer as the allocation rule of the PE, representing the max number of allocated slots, in which case I don't use a fixed allocation rule. I use $round_robin (more on that below).

Yes, that is what I had in mind, as I had no clue what you meant by "subdivided in parallel queues".

> What I meant was that I have a big parallel queue spanning across all hosts, but used rarely.
> 2 parallel queues represent half of the big queue, used more often.
> Eventually, 4 other queues represent 1/4th of the big queue.
> It 'looks like' this (where '=' is a node) :
> ==============  P1
> =======  ======  P2 & P3
> === ===  === ===   P4 & P5 & P6 & P7

Why do you have so many queues? You could even get by with just one parallel queue.
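
To illustrate the point about requesting resources rather than choosing a queue, here is a minimal sketch in Python that only builds a qsub command line (it does not submit anything). The options -pe and -l are standard qsub options; the PE name "mpi", the script name "job.sh" and the helper build_qsub_args are hypothetical and not taken from this thread.

```python
def build_qsub_args(script, pe_name, slots, resources=None):
    """Build an argv list for a qsub call that requests a PE and resources.

    No -q option is added, so the scheduler is left free to pick any
    queue that has the PE attached and can satisfy the requests.
    """
    args = ["qsub", "-pe", pe_name, str(slots)]
    # Each resource request becomes its own "-l name=value" pair.
    for name, value in sorted((resources or {}).items()):
        args += ["-l", f"{name}={value}"]
    args.append(script)
    return args


# "mpi" and "job.sh" are placeholder names for illustration only.
print(" ".join(build_qsub_args("job.sh", "mpi", 8, {"h_rt": "01:00:00"})))
```

With a single parallel queue that has the PE attached, the slot count in the request is what distinguishes a small job from a large one, so separate queues per job size may not be needed.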

> There are also sequential (batch) subordinated queues running on these nodes, symmetrical to those above.
> Users will mostly use P4-7, but queues P1-P3 can be used in case there are bigger jobs needing faster compute time.
> My allocation rule is $round_robin, I think it's the best to use here because users will only use MPI for a while (until they change their code to allow OpenMP integration, and someone else will tweak what I did here to allow users to use OpenMP + MPI at the same time), and I base this assumption on this table :
> http://www.hpccommunity.org/f55/multi-core-strategies-mpi-openmp-702/
> They state that MPI runs much better when dispatched on many nodes, rather than on the same node, which actually is OpenMP's job.

I read it more as saying that it depends on the kind of application. One difference is that all OpenMP threads share the same memory, while MPI processes each use their own. Then it's a matter of communication: do you start the MPI tasks and just collect the results after hours of (local) computing, or is there heavy communication involved (in which case you also have to start thinking about using InfiniBand instead of Ethernet)? This is also stated on the page you mentioned, two paragraphs before the table: "The second assumption is that MPI programs must be spread across multiple nodes in order to run effectively. As Table One demonstrates, neither of the assumptions hold true 100% of the time." 1)

> So $round_robin seems to be the way to go.
> My actual problem with $round_robin is that if there are several parallel jobs running on the 2 hosts (my test farm only has 2 nodes), the most recent one (and any that follow) gets dispatched to only one host, whereas in terms of free slots it could be spread over both hosts (because job N goes to node 1 and job N+1 goes to node 2).
> I tried to change the scheduler configuration to not use load_avg to dispatch jobs, but it still has the same behaviour.

np_load_avg is the default. Is there already something running on these nodes - do you request any resource like memory?
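
As a reference point for what one would expect from $round_robin, here is a toy model in Python. It is only a sketch under simplifying assumptions: it hands out one slot per host per pass and ignores everything else the real scheduler takes into account (host ordering by load, resource requests, subordination). The function name round_robin and the host names are made up for illustration.

```python
def round_robin(free_slots, wanted):
    """Toy model: grant one slot per host per pass until `wanted` are granted.

    free_slots maps host name -> number of free slots.
    Returns a dict host -> granted slots, or None if the request
    cannot be satisfied with the free slots available.
    """
    if sum(free_slots.values()) < wanted:
        return None
    remaining = dict(free_slots)  # work on a copy, leave the input untouched
    granted = {}
    while wanted > 0:
        for host in remaining:
            if wanted == 0:
                break
            if remaining[host] > 0:
                remaining[host] -= 1
                granted[host] = granted.get(host, 0) + 1
                wanted -= 1
    return granted


# Both hosts have room: the job is spread over both.
print(round_robin({"node1": 4, "node2": 4}, 4))
# One host has no free slots left: the whole job lands on the other one.
print(round_robin({"node1": 4, "node2": 0}, 4))
```

In this simplified model a job only ends up on a single host when the other host has no usable slots left, which is why the questions above (what is already running, and which resources are requested) matter for explaining the observed placement.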

> Do I have to use complex(es) to make sure an MPI-parallel job always gets dispatched on 2+ hosts ?

-- Reuti

1) http://www.hpccommunity.org/f55/multi-core-strategies-mpi-openmp-702/

> Thanks,
> GQ

