[GE issues] [Issue 3278] New - PE entry "job_is_first_task" lowers number of tasks on slave nodes

reuti reuti at staff.uni-marburg.de
Tue Aug 10 21:02:12 BST 2010


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3278
                 Issue #|3278
                 Summary|PE entry "job_is_first_task" lowers number of tasks on
                        | slave nodes
               Component|gridengine
                 Version|6.2u5
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|kernel
             Assigned to|andreas
             Reported by|reuti






------- Additional comments from reuti at sunsource.net Tue Aug 10 13:02:08 -0700 2010 -------
When a parallel job gets its slots from two (possibly more) queues, the PE entry "job_is_first_task" erroneously limits
the number of processes allowed on slave nodes. It should only reduce the number of slots on the master node of the
parallel job, i.e. decide whether one more or one fewer local `qrsh` is allowed there.

Observed behavior:

$ qsub -pe openmpi 4 job.sh

will get a PE_HOSTFILE:

pc15381 1 all.q@pc15381 UNDEFINED
pc15370 1 all.q@pc15370 UNDEFINED
pc15381 1 extra.q@pc15381 UNDEFINED
pc15370 1 extra.q@pc15370 UNDEFINED

i.e. the job script is running on pc15381, the master node. It should be able to make two `qrsh -inherit ...` calls to pc15370, but instead the output is:

error: executing task of job 1932 failed: execution daemon on host "pc15370" didn't accept task

Changing "job_is_first_task" in the PE to "false" solves the issue. But since the rejected calls target a slave node, this setting shouldn't have any influence.
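The per-host accounting the report expects can be sketched as a small Python model. This is a hypothetical illustration of the intended semantics, not Grid Engine's actual code: `parse_pe_hostfile` and `allowed_qrsh_tasks` are made-up names, and the sketch assumes that the first PE_HOSTFILE line names the master host and that each `qrsh -inherit` call consumes one granted slot on its target host.

```python
# Hypothetical model of the behavior the report expects; not Grid Engine source.

def parse_pe_hostfile(text):
    """Sum granted slots per host across all queue instances.

    Each line is: host slots queue processor_range.
    dict keeps insertion order, so the first key is the master host
    (assumption: the job script runs on the host of the first line).
    """
    totals = {}
    for line in text.strip().splitlines():
        host, slots, _queue, _procs = line.split()
        totals[host] = totals.get(host, 0) + int(slots)
    return totals


def allowed_qrsh_tasks(totals, job_is_first_task):
    """Return how many `qrsh -inherit` tasks each host should accept."""
    master = next(iter(totals))
    allowed = {}
    for host, slots in totals.items():
        if job_is_first_task and host == master:
            # The job script itself occupies one slot, but only on the master.
            allowed[host] = slots - 1
        else:
            # Slave hosts keep every slot granted to them in any queue.
            allowed[host] = slots
    return allowed


HOSTFILE = """\
pc15381 1 all.q@pc15381 UNDEFINED
pc15370 1 all.q@pc15370 UNDEFINED
pc15381 1 extra.q@pc15381 UNDEFINED
pc15370 1 extra.q@pc15370 UNDEFINED
"""

if __name__ == "__main__":
    totals = parse_pe_hostfile(HOSTFILE)
    print(allowed_qrsh_tasks(totals, job_is_first_task=True))
    # {'pc15381': 1, 'pc15370': 2}
    print(allowed_qrsh_tasks(totals, job_is_first_task=False))
    # {'pc15381': 2, 'pc15370': 2}
```

Under this model the setting only changes the master's count: with "job_is_first_task" TRUE or FALSE, pc15370 should accept two tasks, which is exactly the case that fails in the report.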

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=273596



