[GE users] setting up mpich2 pe + qrsh
jeroen.m.kleijer at philips.com
Thu Feb 10 14:52:55 GMT 2005
My nsswitch.conf has "services: files nis", so it should be able to
use the NIS services map, yet somehow it doesn't.
I liked the idea of the smpd daemon mode you described so that was the one
I wanted to go with (and still do).
I'll try the idea of calculating the SGE_PORTID in the jobscript as well.
(overlooked that one)
Thanks for all the help so far.
Met vriendelijke groeten / Kind regards
Philips Applied Technologies
Reuti <reuti at staff.uni-marburg.de>
2005-02-10 02:23 PM
Please respond to users
To: users at gridengine.sunsource.net
cc: (bcc: Jeroen M. Kleijer/EHV/CFT/PHILIPS)
Subject: Re: [GE users] setting up mpich2 pe + qrsh
Quoting jeroen.m.kleijer at philips.com:
> Though my NIS configuration is correct it wouldn't use the sge_qmaster
> setting provided via services (via NIS) so I had to edit settings.sh and
> unset SGE_QMASTER_PORT
> unset SGE_EXECD_PORT
> SGE_QMASTER_PORT=536 ; export SGE_QMASTER_PORT
> SGE_EXECD_PORT=537 ; export SGE_EXECD_PORT
Did you adjust nsswitch.conf so that NIS is also used to get the services
from the NIS server?
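One quick way to check this on a node is to ask the name service switch directly. This is only a sketch: it assumes getent is installed on the hosts and that the entries are named sge_qmaster and sge_execd; the helper name check_service is made up for illustration.

```shell
#!/bin/sh
# Sketch: look up the SGE ports through nsswitch (files, then nis).
# Assumption: the services are called sge_qmaster and sge_execd.
# check_service is a hypothetical helper, not part of SGE.
check_service() {
    if entry=$(getent services "$1" 2>/dev/null) && [ -n "$entry" ]; then
        echo "$1 resolves to: $entry"
    else
        echo "$1 not found via nsswitch"
        return 1
    fi
}

for svc in sge_qmaster sge_execd; do
    check_service "$svc" || true
done
```

If a name is reported as not found although it exists in the NIS map, the lookup order in nsswitch.conf or the NIS binding on that node would be the first things to look at.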
> The qrsh messages are gone now and I'm a bit further down the road but I
> do have one question left regarding your startmpi.sh script.
> In this script you generate a (random) port number for the smpd
> How do you notify the script which you submit (after SGE has started the
> pe through startmpi.sh) of the randomly generated port number?
> As far as I can tell this variable is not known outside of the
> script so when I do 'qsub <some script>'
> where <somescript> has the line: mpiexec -np $NPSLOTS -p $SGE_PORTID
> -machinefile $TMPDIR/machines cpi.
> This fails because SGE_PORTID is not known in this script but mpiexec
> needs to know at which port the smpd processes are running.
Well, first of all I wasn't sure what would happen when two users run
smpd on one node. Maybe a port is selected randomly by smpd on its own to
avoid conflicts. Therefore I stated that it's not a complete Howto, since
there are still some gaps in the MPICH2 documentation (the daemonless startup
isn't mentioned up to now at all).
But then the problem would be that one user can't have two jobs in two
different smpd rings on one node. Two smpds should run and listen on
different ports to avoid conflicts between the two jobs. So I got the idea to
calculate a port number from the job number you got. This has to be the same, of
course, in start_proc_args, the script which uses mpiexec and [...]
With SGE_PORTID=$((JOB_ID % 500 + 12000)) you can do it, as long as you
don't have a turnaround of more than 500 jobs. This can of course be adjusted
to a wider range. To be completely on the safe side, there would also be the
option to implement a test on all nodes beforehand, whether the port is free at all.
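As a minimal sketch of that calculation (the function name sge_portid is made up here, and the fallback to 0 is only there so the snippet also runs outside of a job, where SGE does not set JOB_ID):

```shell
#!/bin/sh
# Sketch: derive the smpd port from the SGE job number.
# sge_portid is a hypothetical helper name; the formula is the one given above.
sge_portid() {
    # $1 = job number, mapped into the range 12000..12499
    echo $(( $1 % 500 + 12000 ))
}

# JOB_ID is set by SGE inside a job; default to 0 when run by hand.
SGE_PORTID=$(sge_portid "${JOB_ID:-0}")
export SGE_PORTID
echo "smpd port for this job: $SGE_PORTID"

# The job script would then pass the same value on, as in the original question:
#   mpiexec -np $NPSLOTS -p $SGE_PORTID -machinefile $TMPDIR/machines cpi
```

Job numbers that differ by a multiple of 500 end up on the same port: jobs 1234 and 1734 both give 12234. That is why the range has to be wider than the number of jobs that can be in the system at the same time.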
So, put the calculation of the SGE_PORTID in the script like it's done in the
demo script I supplied, and it should work. If you don't like the daemons at
all, you may look at the daemonless startup:
CU - Reuti