[GE users] How to find out why the SGE job is not terminating.
reuti at staff.uni-marburg.de
Fri Aug 11 22:40:27 BST 2006
Am 11.08.2006 um 00:18 schrieb Amit H Kumar:
> Reuti <reuti at staff.uni-marburg.de> wrote on 08/10/2006 05:24:08 PM:
>>>> what is HPC_unsetmpi.csh doing? I found a similar procedure to set
>>>> these during the startup:
>>> Hi Reuti,
>>> Well what i posted here was basically a cut paste from the link
>>> I did this to make it look simple. Yes it does set and unset.
>>> The set procedure basically boots the MPD ring. The unset procedure
>>> shuts down the MPD ring,
>>> and then uses another script to clean up any temporary files created.
>> For a tightly integrated job, SGE will clean up the temporary files.
> No I don't have a tight integration on these machines. But the
> setup_scripts basically uses the $PE_HOSTFILE
> and creates a machinfile to be used by MPDBOOT & MPIEXEC.
The necessary files can be created more easily in the startmpi.sh
script. Right at its beginning is the subroutine
PeHostfile2MachineFile, which can be modified to satisfy the needs of
the mpd startup method.
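Such a modification might look like the following sketch. It assumes the usual four-column $PE_HOSTFILE format (host, slots, queue, processor range) and mpdboot's "host:ncpus" machinefile syntax; the demo hostnames are made up:

```shell
#!/bin/sh
# Sketch of an adapted PeHostfile2MachineFile in startmpi.sh:
# turn each "host nslots queue processors" line of $PE_HOSTFILE
# into a "host:nslots" line, which is what mpdboot expects.
PeHostfile2MachineFile()
{
    cat "$1" | while read host nslots queue rest; do
        echo "$host:$nslots"
    done
}

# demo with a fake PE_HOSTFILE (in a real PE, SGE provides the path)
demo=/tmp/pe_hostfile.$$
cat > "$demo" <<EOF
node01 2 all.q@node01 UNDEFINED
node02 3 all.q@node02 UNDEFINED
EOF
PeHostfile2MachineFile "$demo"
rm -f "$demo"
```

In a real startmpi.sh the function's output would be redirected to the machinefile that mpdboot is later pointed at.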
>>> Now I am starting to think. The cleanup script, regardless
>>> whether it is
>>> missing or not, is on a temporary user home directory,
>>> which is NFS auto-mounted. And if I remember right, the first time I
>>> ran this job it was hanging in there, but every run after that
>>> didn't have
>>> any problems. So do you think this could be the problem?
>> Possibly, but:
>> are you requesting an empty PE for this then? I never tried to use it
>> (the mpd startup method), as there is no Tight Integration possible.
>> And two mpds on one node (this could happen, if you have two
>> different jobs there) will also not be easy to implement I think.
> I don't understand what an empty PE means?
> I request a PE for the mpich2 jobs (e.g. %> qsub -pe mpich2 5
> And then, since we want to clean up some of the non-SGE temporary
> files, we
> use a script to do so.
I was just wondering whether you are requesting any PE at all (the #
$ -pe line was not present in the jobscript), or just have /bin/true
as start/stop scripts there.
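For reference, requesting the PE in the jobscript itself might look like this minimal sketch; the PE name matches the qconf -sp output quoted further down, while the program name and slot count are purely illustrative ($TMPDIR/machines is the machinefile the standard startmpi.sh creates):

```shell
#!/bin/sh
# minimal jobscript sketch (illustrative, not a tested recipe)
#$ -pe mpich2 5        # request the PE so startmpi.sh/stopmpi.sh run
#$ -notify             # ask SGE for warning signals before suspend/kill
#$ -cwd

mpiexec -machinefile $TMPDIR/machines -n $NSLOTS ./my_mpi_program
```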
> The cleanup script is called within the sge_submit_script.sh, after
> mpich2 job is finished.
But in case of a job abort it's never called, unless you set up
proper handling of the warning signals via the -notify option to
qsub. A better place to put it is stopmpi.sh.
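The -notify route can be sketched like this: with "qsub -notify", SGE sends SIGUSR2 shortly before killing an aborted job, so the jobscript can trap it and run the cleanup. The cleanup body here is a stand-in for whatever HPC_unsetmpi.csh does:

```shell
#!/bin/sh
# Hedged sketch of signal-based cleanup in a jobscript submitted
# with "qsub -notify". The cleanup body is illustrative only.
cleanup()
{
    # stand-in for: shutting down the mpd ring, removing temp files
    echo "cleaning up"
}
trap cleanup USR2    # SGE delivers SIGUSR2 before SIGKILL on abort

# simulate an abort for demonstration purposes:
kill -USR2 $$
```

Putting the same steps into stopmpi.sh instead avoids relying on the signal arriving at all, which is why that is the more robust place.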
> Now I am a little confused:
> I understand that mpiexec will run the jobs depending on the nodes
> specified in the -machinefile.
> But now what happens to this cleanup script?
> Does it run only on the submit_host, because I am not using it with
> mpiexec???
It will run only on the master node of your parallel job, which is in
fact one of your compute nodes.
> MY PE CONF:
> qconf -sp mpich2
> pe_name mpich2
> slots 64
> user_lists NONE
> xuser_lists NONE
> start_proc_args /opt/gridengine/mpi/startmpi.sh -catch_rsh
> stop_proc_args /opt/gridengine/mpi/stopmpi.sh
> allocation_rule $fill_up
> control_slaves TRUE
> job_is_first_task FALSE
> urgency_slots min
>> Did you try the options mentioned in the Howto:
> We are using MPD based startup:
> If I understand right, a tight integration for MPD-based startup means
> parsing the $PE_HOSTFILE to prepare a machinefile to boot the
> MPD rings,
> and then using the
> same nodes to run your jobs via MPIEXEC. In addition, have the PE
> set up to
> run startmpi.sh and stopmpi.sh as above.
As the mpds fork into daemon land, there is no easy tight integration.
AFAIR there was an additional problem: you need one mpd per job on
a node where two mpd jobs are running at the same time - one for
each ring. Otherwise stopping the first job might shut down the
mpd on nodes where it is still needed for a second job. So you must
calculate a random port number for each job, and start the ring by
qrsh'ing an "mpd" directly on each node, providing this port number,
instead of using mpdboot. Also, in your mpirun/mpiexec command you
must use this port number to reach the correct mpd on all the nodes,
in case more than one is running there.
Some ideas for this can be taken from the mentioned Howto and the
daemon-based smpd startup.
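A dry-run sketch of that per-job ring idea, assuming MPICH2 1.x's mpd, where the master mpd can report its chosen port via --echo (a line like "mpd_port=NNNN") and slaves join it with -h/-p; the hostnames and file names here are illustrative, and the commands are only echoed rather than executed:

```shell
#!/bin/sh
# Dry-run sketch of one mpd ring per job (MPICH2 1.x mpd).
# Each ring gets its own port, so two jobs sharing a node don't collide.

# 1. on the master node of the job (in a real job: run this, not echo it):
echo "mpd --daemon --echo > /tmp/mpd_port.\$JOB_ID"

# 2. parse the port the master mpd reported; --echo prints "mpd_port=NNNN"
echo "mpd_port=34567" > /tmp/mpd_port.demo    # simulated --echo output
PORT=$(sed -n 's/^mpd_port=//p' /tmp/mpd_port.demo)

# 3. let every other node of the job join exactly this ring via qrsh:
for host in node02 node03; do
    echo "qrsh -inherit $host mpd --daemon -h \$(hostname) -p $PORT"
done
rm -f /tmp/mpd_port.demo
```

The same $PORT would then have to be passed to mpirun/mpiexec so it talks to this job's ring and not to another job's mpd on the same node.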
>> I can't say much, as I don't know what you are doing in your start/
>> stop scripts. But even if you want to stay with the mpd startup
>> method, I would suggest putting these scripts in the start/stop
>> procedures of the PE, instead of putting them in an end-user script.
> In short, Should I make these setup or cleanup scripts part of
> and stopmpi.sh ?
Yes, so a "wrong" user script won't leave the nodes in a bad state.
The start/stop scripts can't be changed by the user, and are the best
place to put critical processing steps.
> Maybe a stupid question: does stopmpi.sh run on all nodes?
No, only on the master node of the parallel job.
> The more I try to understand, the more lost I am.
I would really suggest having a look at the Howto and using the
daemonless startup method as a starting point.
Then you can extend this to the daemon-based smpd startup.