[GE users] Qsub strange behaviours
miomax_ at hotmail.com
Tue Jul 27 13:13:49 BST 2010
reuti wrote:
the syntax needs to be revised:
qsub [ options ] [ command [ command_args ]]
i.e. options come first, then the command, and the arguments to the command/script last.
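A sketch of that ordering (the queue name and script name are placeholders, not from the original thread):

```shell
# Wrong: anything after the script name is passed to the script, not to qsub,
# so the -q request is silently ignored by the scheduler
qsub myjob.sh -q some.q

# Right: qsub options first, then the script, then the script's own arguments
qsub -q some.q myjob.sh --script-arg
```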
Yep, my bad.
This doesn't send the job where it is supposed to go; it picks a queue seemingly at random.
While reading the man pages, I found the -hard option, but it doesn't work either.
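For reference, -hard and -soft toggle whether the resource requests that follow them are mandatory or only preferences. A hedged sketch with placeholder queue and resource names:

```shell
# Requests after -hard must be satisfied for the job to be dispatched;
# requests after -soft are honoured only if possible.
# "big.q" and the h_vmem value are illustrative, not from the thread.
qsub -hard -q big.q -soft -l h_vmem=2G myjob.sh
```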
- Also, I use subordinate queues. The problem is that only part of the queue gets suspended! (E.g. my queue sub1 runs across 2 hosts, and only the sub1 instance on host1 gets suspended, whereas clearly the whole queue should be suspended.)
How is the subordination defined in the queue setup (`qconf -sq ...`)?
H2 is a parallel queue to which L2 is subordinate. I truncated the parts (none/infinity) of the following configuration:
qconf -sq H2
pe_list make mpi
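For context, a minimal sketch of what the relevant lines of `qconf -sq H2` might look like with L2 subordinated (everything except pe_list is an assumed value; the `L2=1` threshold syntax means L2 is suspended as soon as 1 slot in H2 is in use on that host):

```
qname            H2
seq_no           10
pe_list          make mpi
subordinate_list L2=1
```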
I have the exact same H1 and L1 queues (clones) and they have the same problem.
The L queues span 2 nodes. If a job is running on node 1 and L gets suspended, the L instance on node 1 is suspended. However, if a job is dispatched to node 2, it keeps running. So it looks like subordination only takes effect on one node (I can't test with more nodes; my test server only has 2 of them).
- Finally, I ran parallel jobs with $round_robin allocation. If I submit a limited number of jobs, they get correctly dispatched.
But if a few jobs are already running in the parallel queues,
Why do you have many parallel queues? The idea behind SGE is to specify resource requests, and SGE will select an appropriate queue for you. It's not like Torque, where you submit into a queue.
Well, I have many parallel queues because parallel jobs are unequal in size and number.
Therefore, a parallel queue spans from node 1 to x. This interval is further subdivided in parallel queues (e.g. 1*12, 2*6, 4*3 ...), with sequence numbers decreasing from node 1.
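One way to make the scheduler honour such a layout is sorting queues by sequence number; a sketch, assuming the queue names and seq_no values shown (they are illustrative, not from the thread):

```
# In the scheduler configuration (qconf -msconf), sort queues by sequence number:
queue_sort_method seqno

# Then give the subdivided parallel queues seq_no values arranged so the
# scheduler prefers the intended hosts first, e.g.:
#   pe12.q  (1 host  x 12 slots)
#   pe6.q   (2 hosts x  6 slots)
#   pe3.q   (4 hosts x  3 slots)
```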