[GE users] setting up mpich2 pe + qrsh
reuti at staff.uni-marburg.de
Thu Feb 10 13:23:49 GMT 2005
Quoting jeroen.m.kleijer at philips.com:
> Though my NIS configuration is correct it wouldn't use the sge_qmaster
> setting provided via services (via NIS) so I had to edit settings.sh and
> unset SGE_QMASTER_PORT
> unset SGE_EXECD_PORT
> SGE_QMASTER_PORT=536 ; export SGE_QMASTER_PORT
> SGE_EXECD_PORT=537 ; export SGE_EXECD_PORT
Did you adjust nsswitch.conf so that services are also looked up via NIS?
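A minimal sketch of such a check, assuming the usual nsswitch.conf layout
with a "services:" line listing the sources in order. The function name
services_uses_nis is made up for this example and is not part of SGE or NIS.

```shell
#!/bin/sh
# services_uses_nis FILE
# Hypothetical helper: succeeds if the "services:" line in FILE
# (normally /etc/nsswitch.conf) lists "nis" as one of its sources.
services_uses_nis() {
    awk '$1 == "services:" {
             for (i = 2; i <= NF; i++)
                 if ($i == "nis") found = 1
         }
         END { exit !found }' "$1"
}

# Only report on the real file if it exists on this machine.
if [ -r /etc/nsswitch.conf ]; then
    if services_uses_nis /etc/nsswitch.conf; then
        echo "services are looked up via NIS"
    else
        echo "services are NOT looked up via NIS"
    fi
fi
```

A line such as "services: files nis" passes this check. On glibc systems,
getent services sge_qmaster shows what the lookup actually returns.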
> The qrsh messages are gone now and I'm a bit further down the road but I
> do have one question left regarding your startmpi.sh script.
> In this script you generate a (random) port number for the smpd processes.
> How do you notify the script which you submit (after SGE has started the
> pe through startmpi.sh) of the randomly generated port number?
> As far as I can tell this variable is not known outside of the startmpi.sh
> script so when I do 'qsub <some script>'
> where <somescript> has the line: mpiexec -np $NPSLOTS -p $SGE_PORTID
> -machinefile $TMPDIR/machines cpi.
> This fails because SGE_PORTID is not known in this script but mpiexec
> needs to know at which port the smpd processes are running.
Well, first of all I wasn't sure what would happen when two users start an
smpd on one node. Maybe smpd selects a port randomly on its own to avoid
conflicts. That is why I stated that it's not a complete Howto, since there
are still some gaps in the MPICH2 documentation (the daemonless version isn't
mentioned at all so far).
But then the problem would be that one user can't have two jobs in two
different smpd rings on one node. Two smpds should run and listen on
different ports to avoid conflicts between the two jobs. So I got the idea to
calculate a port number from the job number. It has to be the same, of
course, in start_proc_args, in the script which uses mpiexec, and in
stop_proc_args. With SGE_PORTID=$((JOB_ID % 500 + 12000)) you can do it, as
long as the job numbers of jobs running at the same time don't span more than
500 (two job numbers that differ by a multiple of 500 get the same port).
This can of course be adjusted for a wider range. To be completely on the
safe side, you would also need a test on all nodes beforehand to check
whether the port is free at all.
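As a rough sketch of that calculation: the function name sge_portid is only
for illustration, and the fallback of 0 for JOB_ID is an assumption so that
the snippet also runs outside of an SGE job.

```shell
#!/bin/sh
# sge_portid JOB_ID
# Hypothetical helper: maps a job number to a port in 12000..12499,
# using the formula JOB_ID % 500 + 12000 from the text.
sge_portid() {
    echo $(( $1 % 500 + 12000 ))
}

# Inside an SGE job JOB_ID is set by SGE; default to 0 elsewhere (assumption).
JOB_ID=${JOB_ID:-0}

# The same lines must appear in start_proc_args, in the job script that
# calls mpiexec, and in stop_proc_args, so all three agree on the port.
SGE_PORTID=$(sge_portid "$JOB_ID")
export SGE_PORTID
echo "smpd port for job $JOB_ID: $SGE_PORTID"
```

Job 1234, for example, maps to port 12234. The job script can then pass
the value to mpiexec with -p $SGE_PORTID, as in the command quoted above.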
So put the calculation of SGE_PORTID in your script, as it's done in the
demo script I supplied, and it should work. If you don't like the daemons at
all, you may look at the daemonless startup:
CU - Reuti