[GE users] setting up mpich2 pe + qrsh

jeroen.m.kleijer at philips.com
Thu Feb 10 14:52:55 GMT 2005


Hi Reuti,

My nsswitch.conf has "services: files nis", so it should be able to use the 
NIS services map, yet somehow it doesn't.

I liked the idea of the smpd daemon mode you described, so that was the one 
I wanted to go with (and still do).
I'll also try calculating SGE_PORTID in the job script (I had overlooked 
that one).

Thanks for all the help so far.

Met vriendelijke groeten / Kind regards

Jeroen Kleijer
Unix Systeembeheer (Unix System Administration)
Philips Applied Technologies

Reuti <reuti at staff.uni-marburg.de>
2005-02-10 02:23 PM
Please respond to users
 
        To:     users at gridengine.sunsource.net
        cc:     (bcc: Jeroen M. Kleijer/EHV/CFT/PHILIPS)
        Subject:        Re: [GE users] setting up mpich2 pe + qrsh
        Classification: 

Hi there,

Quoting jeroen.m.kleijer at philips.com:

<snip>
> Though my NIS configuration is correct it wouldn't use the sge_qmaster 
> setting provided via services (via NIS) so I had to edit settings.sh and 
> adjust:
> unset SGE_QMASTER_PORT
> unset SGE_EXECD_PORT
> to
> SGE_QMASTER_PORT=536 ; export SGE_QMASTER_PORT
> SGE_EXECD_PORT=537 ; export SGE_EXECD_PORT

Did you adjust nsswitch.conf so that NIS is also used to look up services 
from the NIS server?
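A quick way to see whether the services lookup really goes through NIS, and to pin the ports only when it does not, might look like the bash sketch below. It assumes that `getent` on the node honours the "services:" line of nsswitch.conf (as glibc's does) and that the NIS map uses the names sge_qmaster and sge_execd, which are the service names SGE looks up; the helper name `sge_port` is hypothetical, and 536/537 are simply the values from the quoted settings.sh.

```bash
#!/bin/bash
# Hypothetical helper: resolve an SGE service port through NSS, which honours
# the "services:" line of /etc/nsswitch.conf (e.g. "services: files nis").
sge_port() {
    local service=$1 fallback=$2 entry portproto
    # A successful lookup prints e.g. "sge_qmaster   536/tcp [aliases]".
    if entry=$(getent services "$service" 2>/dev/null) && [ -n "$entry" ]; then
        read -r _ portproto _ <<< "$entry"   # second field, e.g. "536/tcp"
        echo "${portproto%%/*}"              # strip the protocol -> "536"
    else
        echo "$fallback"
    fi
}

# Which sources does NSS consult for services on this host?
grep '^services:' /etc/nsswitch.conf 2>/dev/null || true
echo "sge_qmaster -> $(sge_port sge_qmaster unresolved)"
echo "sge_execd   -> $(sge_port sge_execd unresolved)"

# Pin the ports only when the lookup fails (values from the quoted
# settings.sh), instead of hard-coding them unconditionally.
if ! getent services sge_qmaster >/dev/null 2>&1; then
    SGE_QMASTER_PORT=536; export SGE_QMASTER_PORT
fi
if ! getent services sge_execd >/dev/null 2>&1; then
    SGE_EXECD_PORT=537; export SGE_EXECD_PORT
fi
```

If the two echo lines print "unresolved" although nsswitch.conf lists nis, the problem is on the NSS/NIS side (for example the services map on the NIS server) rather than in SGE. Note that `local` and `<<<` are bash features, so this would need adapting before being pasted into a file sourced by a plain sh.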
 
> The qrsh messages are gone now and I'm a bit further down the road, but I 
> do have one question left regarding your startmpi.sh script.
> In this script you generate a (random) port number for the smpd processes.
> How do you notify the script which you submit (after SGE has started the 
> pe through startmpi.sh) of the randomly generated port number?
> As far as I can tell this variable is not known outside of the startmpi.sh 
> script, so when I do 'qsub <some script>', where <some script> has the line
>
>   mpiexec -np $NPSLOTS -p $SGE_PORTID -machinefile $TMPDIR/machines cpi
>
> this fails because SGE_PORTID is not known in this script, but mpiexec 
> needs to know at which port the smpd processes are running.

Well, first of all, I wasn't sure what would happen when two users start an 
smpd on the same node. Maybe smpd selects a port randomly on its own to 
avoid conflicts. That's why I said it's not a complete Howto: there are 
still some gaps in the MPICH2 documentation (the daemonless version isn't 
mentioned at all so far).

But then the problem would be that one user can't have two jobs in two 
different smpd rings on one node. Two smpds should run and listen on 
different ports to avoid conflicts between the two jobs. So I got the idea 
to calculate a port number from the job number you got. Of course, this has 
to be the same in start_proc_args, in the script which uses mpiexec, and in 
stop_proc_args.
With SGE_PORTID=$((JOB_ID % 500 + 12000)) you can do it, as long as the job 
IDs of concurrently running jobs span fewer than 500 numbers (two jobs whose 
IDs differ by a multiple of 500 get the same port). This can of course be 
adjusted for a wider range. To be completely on the safe side, you would 
also need to test beforehand on all nodes whether the port is free at all.
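The port calculation and the per-node check described above could be put into one small file that start_proc_args, the job script and stop_proc_args all source. This is a bash sketch, not the demo script from the thread: the function names are hypothetical, 12000 and 500 are the values from the message, the machine file is assumed to contain "host" or "host:n" lines as in $TMPDIR/machines, and "free" is approximated as "nothing accepts a TCP connection on that port", using bash's /dev/tcp redirection and coreutils `timeout`.

```bash
#!/bin/bash
# Hypothetical helper file, meant to be sourced by start_proc_args, the job
# script and stop_proc_args so that all three derive the same port.
PORT_BASE=${PORT_BASE:-12000}   # first port of the range (value from the thread)
PORT_RANGE=${PORT_RANGE:-500}   # widen if job IDs of running jobs span >= 500

# Map a job ID onto PORT_BASE .. PORT_BASE+PORT_RANGE-1.
sge_portid() {
    echo $(( $1 % PORT_RANGE + PORT_BASE ))
}

# Succeeds if something on host $1 accepts a TCP connection on port $2.
port_in_use() {
    # bash-only: redirecting to /dev/tcp/HOST/PORT attempts a TCP connect;
    # timeout(1) stops an unreachable host from stalling the check.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Checks each distinct host of a machine file ("host" or "host:n" lines);
# fails as soon as the port is taken on one of them.
ports_free_on_nodes() {
    local machinefile=$1 port=$2 host
    for host in $(cut -d: -f1 "$machinefile" | sort -u); do
        if port_in_use "$host" "$port"; then
            echo "port $port already in use on $host" >&2
            return 1
        fi
    done
    return 0
}
```

In start_proc_args this might be used as `SGE_PORTID=$(sge_portid "$JOB_ID")` followed by `ports_free_on_nodes "$TMPDIR/machines" "$SGE_PORTID" || exit 1` before the smpds are started; the job script only needs the first of these before calling `mpiexec ... -p $SGE_PORTID ...`, and stop_proc_args computes the same value to shut the ring down. The check is only a snapshot: a port that is bound but not yet listening, or taken right after the test, is not detected.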

So, put the calculation of SGE_PORTID in the job script, as is done in the 
demo script I supplied, and it should work. If you don't like the daemons 
at all, you may look at the daemonless startup:

http://gridengine.sunsource.net/servlets/ReadMsg?msgId=23231&listName=users


CU - Reuti
