[GE users] setting up mpich2 pe + qrsh

Reuti reuti at staff.uni-marburg.de
Thu Feb 10 16:51:01 GMT 2005



Jeroen,

the scripts were written for bash. Are you using ksh? Changing 
shell_start_mode for the queues to unix_behavior (so that the #! line of the 
script is honored) might resolve this issue and explain the typos you found.

Quoting jeroen.m.kleijer at philips.com:

> Hi Reuti,
> 
> Examining the startmpi.sh script, I noticed that you want to use 
> qrsh to start smpd on every node from the $pe_hostfile.
> It took me a while to figure out why my setup wouldn't work but when you 
> use qrsh, it doesn't allow you to select a host to start your command so 
> the command:
> qrsh -V -inherit $node "command"
> gives the error message
> 'ksh: $node not found' because apparently we're trying to start the 
> command $node on a (interactive) host selected by SGE instead of "command" 
> on $node.

If you specify -inherit, a hostname is allowed. Please have a look at the 
-inherit option on the qsub man page. So $node is not being expanded as it 
should be. Do you really see the literal $ in the output? Can you also echo it 
and check the .pe and .po files?
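
As a rough sketch of the -inherit form discussed above (this is not the actual startmpi.sh; the helper names pe_hosts and start_on_all_hosts are made up for illustration, and it assumes the usual PE hostfile layout with the host name in the first column):

```shell
# Print the unique host names from an SGE PE hostfile.
# Assumption: each line looks like "host slots queue processor-range".
pe_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

# Run the given command on every granted host from inside a running
# parallel job. With -inherit, the first non-option argument is the
# host name, so no queue has to be specified.
start_on_all_hosts() {
    hostfile=$1
    shift
    for node in $(pe_hosts "$hostfile"); do
        echo "starting on $node" >&2
        qrsh -V -inherit "$node" "$@"
    done
}
```

If the echo prints a real host name but qrsh still complains about $node, the variable is most likely not being expanded by the shell that actually runs the script.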

> A possibility would be to do:
> qrsh -V -inherit -q batch.q@$node "command"
> but this would mean that I would have to open the batch.q queue for 
> interactive sessions, something I'm not looking forward to.

At this point you already have the queue, since your job is already running. 
There is no need to specify it.

Cheers - Reuti



> Is there a possibility to do a qrsh command directly to a specified node? 
> (and thereby defeating the purpose of SGE scheduling, I know)
> Or do I still have to do a "regular" rsh command, also something I'm not 
> looking forward to.
> 
> Met vriendelijke groeten / Kind regards
> 
> Jeroen Kleijer
> Unix Systeembeheer
> Philips Applied Technologies
> 
> 
> 
> 
> 
> 
> 
> 
> 
> jeroen.m.kleijer+FromInterNet at philips.com
> 2005-02-10 03:52 PM
> Please respond to users
>  
>         To:     users at gridengine.sunsource.net
>         cc:     (bcc: Jeroen M. Kleijer/EHV/CFT/PHILIPS)
>         Subject:        Re: [GE users] setting up mpich2 pe + qrsh
>         Classification: 
> 
> 
> 
> 
> 
> Hi Reuti, 
> 
> My nsswitch.conf has "services: files nis", so it should be able to 
> use the NIS map, yet somehow it doesn't. 
> 
> I liked the idea of the smpd daemon mode you described so that was the one 
> I wanted to go with (and still do). 
> I'll try the idea of calculating the SGE_PORTID in the jobscript as well. 
> (overlooked that one) 
> 
> Thanks for all the help so far. 
> 
> Met vriendelijke groeten / Kind regards
> 
> Jeroen Kleijer
> Unix Systeembeheer
> Philips Applied Technologies 
> 
> 
> 
> 
> 
> 
> 
> 
> Reuti <reuti at staff.uni-marburg.de> 
> 2005-02-10 02:23 PM 
> Please respond to users 
>         
>         To:        users at gridengine.sunsource.net 
>         cc:        (bcc: Jeroen M. Kleijer/EHV/CFT/PHILIPS) 
>         Subject:        Re: [GE users] setting up mpich2 pe + qrsh 
>         Classification:         
> 
> 
> 
> 
> Hi there,
> 
> Quoting jeroen.m.kleijer at philips.com:
> 
> <snip>
> > Though my NIS configuration is correct it wouldn't use the sge_qmaster 
> > setting provided via services (via NIS) so I had to edit settings.sh and 
> > adjust:
> > unset SGE_QMASTER_PORT
> > unset SGE_EXECD_PORT
> > to
> > SGE_QMASTER_PORT=536 ; export SGE_QMASTER_PORT
> > SGE_EXECD_PORT=537 ; export SGE_EXECD_PORT
> 
> Did you adjust nsswitch.conf so that NIS is also used to get services 
> from the NIS server?
> 
> > The qrsh messages are gone now and I'm a bit further down the road but I 
> > do have one question left regarding your startmpi.sh script.
> > In this script you generate a (random) port number for the smpd processes.
> > How do you notify the script which you submit (after SGE has started the 
> > pe through startmpi.sh) of the randomly generated port number?
> > As far as I can tell this variable is not known outside of the startmpi.sh 
> > script so when I do 'qsub <some script>'
> > where <somescript> has the line: mpiexec -np $NPSLOTS -p $SGE_PORTID 
> > -machinefile $TMPDIR/machines cpi.
> > This fails because SGE_PORTID is not known in this script but mpiexec 
> > needs to know at which port the smpd processes are running.
> 
> Well, first of all I wasn't sure what would happen when two users start an 
> smpd on one node. Maybe a port is selected randomly by smpd on its own to 
> avoid conflicts. Therefore I stated that it's not a complete Howto, since 
> there are still some gaps in the MPICH2 documentation (the daemonless version 
> isn't mentioned at all up to now).
> 
> But then the problem would be that one user can't have two jobs in two 
> different smpd rings on one node. Two smpds should run and listen on 
> different ports to avoid conflicts between the two jobs. So I got the idea to 
> calculate a port number from the job number you got. This has to be the same, 
> of course, in start_proc_args, the script which uses mpiexec, and 
> stop_proc_args. 
> With SGE_PORTID=$((JOB_ID % 500 + 12000)) you can do it, as long as the job 
> numbers of jobs running at the same time span fewer than 500. This can of 
> course be adjusted for a wider range. To be completely on the safe side, you 
> would also need to test on all nodes beforehand whether the port is free at 
> all.
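
A minimal sketch of that port calculation, assuming bash or another shell with $(( )) arithmetic. The function name sge_portid is only illustrative, and the base port 12000 and range 500 are the values quoted above:

```shell
# Derive the smpd port from the SGE job number.
# The same calculation has to appear in start_proc_args, in the job
# script that calls mpiexec, and in stop_proc_args, so that all three
# agree on the port.
sge_portid() {
    echo $(( $1 % 500 + 12000 ))
}

# SGE sets JOB_ID inside a job; the fallback to 0 is only there so the
# sketch also runs outside of a job.
SGE_PORTID=$(sge_portid "${JOB_ID:-0}")
export SGE_PORTID
```

For example, job 1234 would map to port 12234. Two jobs whose numbers differ by a multiple of 500 would collide, which is why the range may need widening on a busy cluster. The job script then hands $SGE_PORTID to mpiexec with the -p option, as in the mpiexec line quoted earlier in the thread.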
> 
> So, put the calculation of the SGE_PORTID in the script as it's done in the 
> demo script I supplied, and it should work. If you don't like the daemons at 
> all, you may look at the daemonless startup:
> 
> http://gridengine.sunsource.net/servlets/ReadMsg?msgId=23231&listName=users
> 
> 
> CU - Reuti
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
> 


