IZ3276: qrsh -inherit should allow -q to select a queue out of the granted ones
Reported by: reuti
Owned by:
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3276]
Issue #: 3276
Platform: All
Reporter: reuti (reuti)
Component: gridengine
OS: All
Subcomponent: clients
Version: 6.2u5
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: roland (roland)
QA Contact: roland
URL:
* Summary: qrsh -inherit should allow -q to select a queue out of the granted ones
Status whiteboard:
Attachments:
Issue 3276 blocks:
Votes for issue 3276:

Opened: Tue Aug 10 04:07:00 -0700 2010
------------------------

Although it is often desired to get slots from only one queue for a parallel job, it is valid to attach the same PE to different queues and get slots from a mixture of queues.

When the job gets slots from a mixture of queues, the application has no means to direct `qrsh -inherit ...` to the correct queue. SGE will pick any of the granted ones on its own. If the parallel application then makes, e.g., two `qrsh -inherit ...` calls to the same machine and forks two processes in each of the two granted slots to get four in total, all processes may end up in the same queue with the same $TMPDIR.

    $ qsub -pe openmpi 5 -masterq all.q@pc15370 -q "*@pc15370" ./mymy.sh
    Your job 1900 ("mymy.sh") has been submitted
    $ cat mymy.sh.o1900
    pc15370 1 all.q@pc15370 UNDEFINED
    pc15370 2 extra.q@pc15370 UNDEFINED
    pc15370 2 extra1.q@pc15370 UNDEFINED
    TMPDIR=/tmp/1900.1.extra1.q   ==> here it might fork 2 processes
    TMPDIR=/tmp/1900.1.extra1.q   ==> here it might fork 2 processes
    TMPDIR=/tmp/1900.1.extra.q
    TMPDIR=/tmp/1900.1.extra.q
    TMPDIR=/tmp/1900.1.all.q

With the scripts mymy.sh:

    #!/bin/sh
    cat $PE_HOSTFILE
    . /usr/sge/default/common/settings.sh
    qrsh -inherit -V pc15370 ./dummy.sh &
    qrsh -inherit -V pc15370 ./dummy.sh &
    qrsh -inherit -V pc15370 ./dummy.sh &
    qrsh -inherit -V pc15370 ./dummy.sh &
    wait
    ./dummy.sh

and dummy.sh:

    #!/bin/sh
    env | grep TMPDIR
    sleep 30

When the application doesn't intend to fork, but starts exactly one process with each `qrsh -inherit ...`, all seems to be fine and SGE takes care to distribute them across the granted pool, although it can't be predicted which `qrsh -inherit ...` will end up in which of the granted queues.

------- Additional comments from reuti Tue Aug 10 04:08:13 -0700 2010 -------

Changed from Defect to Enhancement.

------- Additional comments from reuti Tue Aug 10 13:24:21 -0700 2010 -------

When getting slots from 2 nodes, the last paragraph above is false and it again does not work: all tasks may end up in one queue:

    pc15381 1 all.q@pc15381 UNDEFINED
    pc15370 1 all.q@pc15370 UNDEFINED
    pc15381 1 extra.q@pc15381 UNDEFINED
    pc15370 1 extra.q@pc15370 UNDEFINED
    TMPDIR=/tmp/1934.1.all.q
    TMPDIR=/tmp/1934.1.all.q
    TMPDIR=/tmp/1934.1.all.q
    TMPDIR=/tmp/1934.1.all.q

/tmp/1934.1.extra.q should show up twice. As this can't be controlled by the application (-q is not allowed for `qrsh -inherit ...`), SGE should handle it properly.
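
The information an application would need to pick a queue is already in $PE_HOSTFILE, as the job output in this report shows. The following is only a minimal sketch, assuming the four-column layout seen there (host, slot count, queue instance, processor range). The helper name `granted_queues` is hypothetical and not part of SGE. It prints one queue instance per granted slot on a given host, which is the value the requested `-q` option would take.

```shell
#!/bin/sh
# granted_queues HOST HOSTFILE
# Hypothetical helper (not part of SGE): prints the queue instance of every
# slot granted on HOST, one line per slot, from a file in $PE_HOSTFILE
# format: <host> <slots> <queue instance> <processor range>
granted_queues() {
    host=$1
    hostfile=$2
    awk -v h="$host" '$1 == h { for (i = 0; i < $2; i++) print $3 }' "$hostfile"
}

# Possible use inside a job script (left commented out on purpose):
# for q in $(granted_queues pc15370 "$PE_HOSTFILE"); do
#     # Form requested by this issue. qrsh -inherit does NOT accept -q
#     # in 6.2u5, so this only prints what the call would look like.
#     echo "qrsh -inherit -q $q -V pc15370 ./dummy.sh"
# done
```

With the hostfile of job 1900, `granted_queues pc15370 "$PE_HOSTFILE"` would list all.q@pc15370 once and extra.q@pc15370 and extra1.q@pc15370 twice each. Until `qrsh -inherit` accepts `-q`, this list can only be used for bookkeeping, not for placement.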