[GE users] setting up mpich2 pe + qrsh

Reuti reuti at staff.uni-marburg.de
Thu Feb 10 11:09:38 GMT 2005



Hi,

The -inherit option tells qrsh that it is running inside an already set-up job
environment. It can't be used from the command line to start a job, so the
error you get when running it by hand is expected.
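
To illustrate the point about the job environment, a start script could guard
the call roughly as sketched below. This assumes SGE exports JOB_ID inside a
running job (the job id that qrsh -inherit reads from the environment);
run_task and the QRSH override are hypothetical names used only for this
example, not part of SGE or of the original startmpich2.sh.

```shell
#!/bin/sh
# Sketch only: refuse to call "qrsh -inherit" outside a job environment.
# run_task is a hypothetical helper; QRSH is a hypothetical override hook
# that lets the function be tried out without a real qrsh binary.
run_task() {
    node=$1
    shift
    # Assumption: JOB_ID is set by SGE for processes started inside a job.
    if [ -z "${JOB_ID:-}" ]; then
        echo "JOB_ID is not set: qrsh -inherit needs a running job" >&2
        return 1
    fi
    qrsh_bin=${QRSH:-"$SGE_ROOT/bin/$ARC/qrsh"}
    # Start the given command as a task of the current job on $node.
    "$qrsh_bin" -V -inherit "$node" "$@"
}
```

Called by hand from a login shell, run_task node1 hostname would stop at the
JOB_ID check instead of contacting the qmaster.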

Do the output files created by your job, i.e. the pe... and po..., list the
hosts for the parallel job properly (via the cat command in the start
procedure)? Are these the correct hostnames?
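
A minimal sketch of such a host check, assuming the start procedure can see the
PE host file as $PE_HOSTFILE and that each line begins with the hostname
(followed by slot count, queue and processor range); list_pe_hosts is a
hypothetical helper name, not something from the original script.

```shell
#!/bin/sh
# Sketch only: print the hostname column of a PE host file.
# Assumed line format: "<hostname> <slots> <queue> <processor range>".
list_pe_hosts() {
    # Skip empty lines and print the first whitespace-separated field.
    awk 'NF > 0 { print $1 }' "$1"
}

# Inside a parallel job, list the granted hosts so they can be compared
# with what ends up in the job's output files.
if [ -n "${PE_HOSTFILE:-}" ]; then
    list_pe_hosts "$PE_HOSTFILE"
fi
```

If the names printed here differ from what the nodes call themselves, that
would be the first thing to look at.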

CU - Reuti

Quoting Jeroen Kleijer <jeroen.kleijer at xs4all.nl>:

> 
> Hi
> 
> I'm running SGE6.0u3.
> This is my first attempt at setting up a parallel environment (PE), so I
> don't have any other parallel applications running with SGE.
> 
> qrsh itself works properly: run by hand it gives me a remote shell, but
> running it by hand the same way as in the script gives me an error about
> the JOBID not being set.
> 
> Kind regards,
> 
> Jeroen Kleijer
> 
> On Thu, Feb 10, 2005 at 12:04:07AM +0100, Reuti wrote:
> > Hi,
> > 
> > which SGE version are you using? When you run other parallel applications,
> > is qrsh working as it should?
> > 
> > CU - Reuti
> > 
> > Quoting Jeroen Kleijer <jeroen.kleijer at xs4all.nl>:
> > 
> > > 
> > > Hi all,
> > > 
> > > I'm setting up an MPICH2 parallel environment with tight integration
> > > according to the hints given by Reuti in post:
> > > http://gridengine.sunsource.net/servlets/ReadMsg?msgId=2291&listName=users
> > > 
> > > I compiled mpich2 (with the PGI compiler suite), created a parallel
> > > environment mpich2 which in turn runs the script startmpich2.sh as done
> > > by Reuti. (it had some minor errors in it but these were easily fixed)
> > > 
> > > The problem I'm running into at the moment is that I want to use the
> > > smpd solution provided in the post, and thus the startmpich2.sh script
> > > needs to do a qrsh to every machine in the $machines file and start an
> > > smpd daemon.
> > > 
> > > With every qrsh I run from startmpich2.sh I get the following error:
> > > 
> > > error: getting configuration: unable to send message to qmaster using
> > > port 0 on host "<qmastername>": no valid port number
> > > error:
> > > Cannot get configuration from qmaster
> > > 
> > > The qrsh command in the script looks like this:
> > > $SGE_ROOT/bin/$ARC/qrsh -V -inherit $node "/cadappl/mpich2/1.0/bin/smpd
> > > -s -port $SGE_PORTID"
> > > 
> > > It doesn't really matter what command I use instead of smpd, I've tried
> > > doing a simple mkdir /tmp/$SGE_PORTID and it gave me the same error
> > > message.
> > > 
> > > Has anyone seen this message before?
> > > 
> > > Jeroen Kleijer
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > > 
> > 
> > 
> > 
> > 
> 
> 






