[GE users] SGE and physical machine slot allocation

lukacm at pdx.edu lukacm at pdx.edu
Wed Apr 19 22:30:28 BST 2006


Hello,

Yes, the job is running fine, but not the way SGE scheduled it on the physical
machines, i.e. the parallel slots.

The qsub command looks like this: qsub -pe mpich 4 mbsub.sh

Inside the script, the most important flags are:

#$ -v P4_RSHCOMMAND=ssh
#$ -v P4_GLOBMEMSIZE=10000000
#$ -v MPICH_PROCESS_GROUP=no
#$ -v CONV_RSH=ssh

I also set up the tight integration of MPICH and SGE using method number 2.
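
To compare what SGE granted with what actually runs, the job script could log
its own allocation. The following is only a minimal sketch, not part of my
script: it assumes that SGE exports PE_HOSTFILE inside a parallel job and that
each line of that file holds host, slot count, queue and processor range. The
function names slots_per_host and total_slots are made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the hosts and slot counts that SGE granted to this job.
# Assumes the usual PE_HOSTFILE layout: host, slots, queue, processor range.

slots_per_host() {
    # $1: path to a PE hostfile; prints "host slots" per line
    awk '{ print $1, $2 }' "$1"
}

total_slots() {
    # $1: path to a PE hostfile; prints the sum of the slot column
    awk '{ n += $2 } END { print n + 0 }' "$1"
}

# Inside a job, SGE sets PE_HOSTFILE; outside a job nothing is printed.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    slots_per_host "$PE_HOSTFILE"
    echo "total slots: $(total_slots "$PE_HOSTFILE")"
fi
```

The printed list ends up in the job's output file and can be checked against
the per-host process counts seen on the nodes.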

In general I would not mind this issue, but when I have to clean up a set of
zombies belonging to the same user and I do not know which processes are
zombies and which are not, it becomes a problem.
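
One possible way to narrow down the candidates, as a rough sketch only: it
assumes that processes started under SGE control have an sge_shepherd process
somewhere among their ancestors, so anything of that user without such an
ancestor was started outside SGE (for example over plain ssh). The function
name unmanaged_pids and the input layout are my own assumptions, and the
result is only a list of candidates, since an interactive login of the same
user would show up as well.

```shell
#!/bin/sh
# Sketch only: list PIDs owned by a user that have no sge_shepherd ancestor.
# Input on stdin: one process per line as "pid ppid user command-name".

unmanaged_pids() {
    # $1: user name to inspect
    awk -v who="$1" '
        { ppid[$1] = $2; user[$1] = $3; comm[$1] = $4 }
        END {
            for (p in ppid) {
                if (user[p] != who) continue
                managed = 0
                q = p
                # Walk up the parent chain; stop at unknown parents
                # or after a fixed number of hops.
                for (hops = 0; hops < 64 && (q in ppid); hops++) {
                    if (comm[q] ~ /^sge_shepherd/) { managed = 1; break }
                    if (ppid[q] == q) break
                    q = ppid[q]
                }
                if (!managed) print p
            }
        }' | sort -n
}

# Example use on a compute node (ps options as in procps on Linux):
# ps -e -o pid=,ppid=,user=,comm= | unmanaged_pids ruedas
```

The PIDs it prints still need a manual look before anything is killed.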

martin

Quoting Reuti <reuti at staff.uni-marburg.de>:

> Hi,
>
> On 19.04.2006 at 21:59, lukacm at pdx.edu wrote:
>
> > Hello all,
> >
> > a job run with SGE generates the following strangeness.
> >
> > ----------------------------------------------------------------------------
> > arc.q at compute-0-11.local       BIPC  2/2       1.00     lx26-amd64
> >    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     2
> > ----------------------------------------------------------------------------
> > arc.q at compute-0-12.local       BIPC  1/2       0.00     lx26-amd64
> >    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     1
> > ----------------------------------------------------------------------------
> >
> > The slots allocated by SGE do not correspond to the queues that are
> > shown by qstat. Is there a remedy to tightly integrate SGE with the
> > physical machines?
>
> This seems to be a problem not of SGE itself, but of the integration of
> your parallel job into SGE. So this job got three slots but, according to
> the load, is using only one slot. Is that what you mean?
>
> What is your defined queue, PE, the defined scripts for this PE and
> your qsub command?
>
> Or is your job running on nodes other than the intended ones?
>
> -- Reuti
>
>
> > thank you
> >
> >
> > martin
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net





