[GE users] setting up mpich2 pe + qrsh

jeroen.m.kleijer at philips.com
Thu Feb 10 14:52:55 GMT 2005

Hi Reuti,

My nsswitch.conf has "services: files nis", so it should be able to use
the NIS services map, yet somehow it doesn't.

I liked the idea of the smpd daemon mode you described, so that was the one
I wanted to go with (and still do).
I'll also try calculating SGE_PORTID in the job script
(I overlooked that one).

Thanks for all the help so far.

Met vriendelijke groeten / Kind regards

Jeroen Kleijer
Unix Systeembeheer
Philips Applied Technologies

Reuti <reuti at staff.uni-marburg.de>
2005-02-10 02:23 PM
Please respond to users
        To:     users at gridengine.sunsource.net
        cc:     (bcc: Jeroen M. Kleijer/EHV/CFT/PHILIPS)
        Subject:        Re: [GE users] setting up mpich2 pe + qrsh

Hi there,

Quoting jeroen.m.kleijer at philips.com:

> Though my NIS configuration is correct, it wouldn't use the sge_qmaster
> setting provided via services (via NIS), so I had to edit settings.sh and
> adjust:
> to

Did you adjust nsswitch.conf so that NIS is also used to get the services
from the NIS server?
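To see whether the services lookup really goes through NIS, one can compare what the resolver returns with what the NIS map itself contains. This is a minimal sketch, assuming getent(1) and ypcat(1) are installed and the NIS map is the usual services.byname; port_field is a hypothetical helper that only extracts the port number from a services-style line.

```shell
#!/bin/sh
# Sketch only: port_field is a hypothetical helper name.
# Reads services-style lines ("name  port/proto  [aliases]") on stdin,
# as printed by getent or ypcat, and prints the port of the first line.
port_field() {
    awk 'NR == 1 { split($2, p, "/"); print p[1] }'
}
```

Then compare getent services sge_qmaster | port_field (goes through nsswitch.conf) with ypcat services.byname | grep '^sge_qmaster[[:space:]]' | port_field (reads the NIS map directly). If only the second one prints a port, the services lookup on that host is not reaching NIS.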
> The qrsh messages are gone now and I'm a bit further down the road, but I
> do have one question left regarding your startmpi.sh script.
> In this script you generate a (random) port number for the smpd.
> How do you notify the script which you submit (after SGE has started the
> pe through startmpi.sh) of the randomly generated port number?
> As far as I can tell this variable is not known outside of the startmpi.sh
> script, so when I do 'qsub <some script>'
> where <some script> has the line: mpiexec -np $NPSLOTS -p $SGE_PORTID
> -machinefile $TMPDIR/machines cpi
> this fails, because SGE_PORTID is not known in this script, but mpiexec
> needs to know on which port the smpd processes are running.

Well, first of all, I wasn't sure what would happen when two users start an
smpd on the same node. Maybe smpd selects a port randomly on its own to
avoid conflicts. That's why I said it's not a complete Howto: there are
still some gaps in the MPICH2 documentation (the daemonless startup isn't
mentioned at all so far).

But then the problem would be that one user can't have two jobs in two
different smpd rings on one node. Two smpds should run and listen on
different ports to avoid conflicts between the two jobs. So I got the idea
to calculate a port number from the job number you got. Of course, this has
to be the same in start_proc_args, in the script which uses mpiexec, and in
stop_proc_args. With SGE_PORTID=$((JOB_ID % 500 + 12000)) you can do it, as
long as no two jobs whose job IDs differ by a multiple of 500 run on the
same node at the same time, which holds if fewer than 500 jobs are
submitted during any job's lifetime. This range can of course be widened.
To be completely on the safe side, you could also implement a test on all
nodes beforehand to check whether the port is free at all.
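The port calculation and the "is the port free" test can be sketched as two small POSIX sh functions. This is a minimal sketch under stated assumptions: the helper names smpd_port and port_in_use are hypothetical, the base port 12000 and range 500 are the values from this mail, and port_in_use assumes it is fed "netstat -an" output from the node being checked. How that output is collected from each slave node depends on the PE setup and is left out.

```shell
#!/bin/sh
# Sketch only: smpd_port and port_in_use are hypothetical helper names.

# Derive a per-job smpd port from SGE's JOB_ID, i.e.
# SGE_PORTID=$((JOB_ID % 500 + 12000)). Base port and range can be
# passed as optional 2nd/3rd arguments to widen the range.
smpd_port() {
    job_id=$1
    base=${2:-12000}
    range=${3:-500}
    echo $(( job_id % range + base ))
}

# Exit 0 if the "netstat -an" output on stdin shows a socket LISTENing
# on the given port, nonzero otherwise. The [.:] before the port matches
# both "addr:port" (Linux) and "addr.port" (Solaris) notation.
port_in_use() {
    grep -Eq "[.:]$1[[:space:]].*LISTEN"
}
```

In start_proc_args, the job script and stop_proc_args alike, one would set SGE_PORTID=$(smpd_port "$JOB_ID") before using it, so all three agree on the port; on a node, netstat -an | port_in_use "$SGE_PORTID" detects a clash before smpd is started there. Note that smpd_port 1 and smpd_port 501 both give 12001, which is exactly the wrap-around case that the range of 500 has to cover.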

So, put the calculation of SGE_PORTID into the job script as it's done in
the demo script I supplied, and it should work. If you don't like the
daemons at all, you may look at the daemonless startup:


CU - Reuti
