[GE users] How to find out why the SGE job is not terminating.

Reuti reuti at staff.uni-marburg.de
Fri Aug 11 22:40:27 BST 2006


Hi again,

Am 11.08.2006 um 00:18 schrieb Amit H Kumar:

>
>
> Reuti <reuti at staff.uni-marburg.de> wrote on 08/10/2006 05:24:08 PM:
>
>>>>>
>>>>
>>>> what is HPC_unsetmpi.csh doing? I found a similar procedure to set
>>>> these during the startup:
>>> Hi Reuti,
>>>
>>> Well, what I posted here was basically a cut-and-paste from the link
>>> below. I did this to make it look simple. Yes, it does set and unset.
>>>
>>> The set procedure basically boots the MPD ring. The unset procedure
>>> exits the MPD ring, and then uses another script to clean up any
>>> temporary files created.
>>
>> For a tightly integrated job, SGE will clean up the temporary files.
>
> No, I don't have a tight integration on these machines. But the
> setup_scripts basically use the $PE_HOSTFILE and create a machinefile
> to be used by MPDBOOT & MPIEXEC.
>

The necessary files can be created more easily in the startmpi.sh script.
Right at the beginning is the subroutine PeHostfile2MachineFile, which
can be modified to suit the needs of the mpd startup method.
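
To give a rough idea of how that could look (untested, and assuming your
mpdboot accepts "hostname:ncpus" entries in its machinefile - please check
this for your MPICH2 version), the subroutine could be reshaped like:

PeHostfile2MachineFile()
{
   # each line of the $pe_hostfile looks like "host slots queue range";
   # emit one "host:slots" line per host for mpdboot
   cat $1 | while read line; do
      host=`echo $line | cut -f1 -d" "`
      nslots=`echo $line | cut -f2 -d" "`
      echo "$host:$nslots"
   done
}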

>>
>>>
>>> Now I am starting to think. The cleanup script, whether it is
>>> missing or not, is in a temporary user home directory, which is
>>> NFS auto-mounted. And if I remember right, the first time I ran
>>> this job it was hanging in there, but every run after that didn't
>>> have any problems. So do you think this could be the problem?
>>
>> Possibly, but:
>>
>> are you requesting an empty PE for this then? I never tried to use it
>> (the mpd startup method), as there is no Tight Integration possible.
>> And two mpds on one node (this could happen, if you have two
>> different jobs there) will also not be easy to implement I think.
>>
> Okay.
> I don't understand what an empty PE means?
> I request a PE for the mpich2 jobs as (e.g. %> qsub -pe mpich2 5
> ./sge_submit_script.sh)
> And then, since we want to clean up some of the non-SGE temporary
> files, we use a script to do so.

I was just wondering whether you are requesting any PE at all (the
#$ -pe line was not present in the job script), or whether you just have
/bin/true as the start/stop scripts there.
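
Just for illustration, a job script that requests the PE inside the script
itself could start like this (the PE name and slot count are taken from
your example, $TMPDIR/machines is where the standard startmpi.sh writes
its machine file, and my_mpi_program is only a placeholder):

#!/bin/sh
#$ -pe mpich2 5
#$ -notify
#$ -cwd
# your own mpdboot/cleanup steps would go around this; the exact mpiexec
# invocation depends on how the mpd ring was started
mpiexec -machinefile $TMPDIR/machines -n $NSLOTS ./my_mpi_program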

> The cleanup script is called within the sge_submit_script.sh, after
> the mpich2 job is finished.
>

But in case of a job abort it's never called, unless you set up proper
handling of the warning signals via the -notify option to qsub. A better
place to put it is in stopmpi.sh.
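
If you nevertheless keep the cleanup in the job script and submit with
-notify, SGE will deliver SIGUSR1 before a suspend and SIGUSR2 before a
kill, so the script can trap these and still clean up on an abort. A
minimal sketch (assuming sge_submit_script.sh is a Bourne shell script
and HPC_unsetmpi.csh sits in the job's working directory):

# run the cleanup and bail out when SGE announces a suspend or kill
trap 'csh ./HPC_unsetmpi.csh; exit 1' USR1 USR2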

>
>
> Now I am a little confused:
> I understand that mpiexec will run the jobs on the nodes selected
> and specified in the -machinefile.
> But now what happens to this cleanup script within the
> sge_submit_script.sh:
>      Does it run only on the submit host, because I am not using it
> via mpiexec?

It will only run on the master node of your parallel job, which is in
fact one of your compute nodes (not the submit host).

>
>
> MY PE CONF:
> ============
> qconf -sp mpich2
> pe_name           mpich2
> slots             64
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args    /opt/gridengine/mpi/stopmpi.sh
> allocation_rule   $fill_up
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
>
>
>> Did you try the mentioned options in the Howto:
>>
>> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>
> We are using the MPD based startup:
> If I understand right, a tight integration for the MPD based startup
> method involves parsing the $PE_HOSTFILE to prepare a machinefile to
> boot the MPD rings, and then using the same nodes to run your jobs via
> MPIEXEC. In addition, the PE is set up to run startmpi.sh and
> stopmpi.sh as above.
>

As the mpds fork into daemon land, there is no easy tight integration.
AFAIR there was an additional problem: you need one mpd per job on a
node where two mpd jobs are running at the same time - one for each
ring. Otherwise stopping the first job might shut down the mpd on nodes
where it is still needed by a second job. So you must calculate a
job-specific port number for each job, and start the ring by qrsh'ing
an "mpd" directly on each node and providing this port number, instead
of using mpdboot. Also, in your mpirun/mpiexec command you must use
this port number to reach the correct mpd on all the nodes, in case
more than one is running there.

Some ideas for this can be taken from the mentioned Howto and its
daemon-based smpd startup.
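
Just to sketch the idea (untested; the mpd options --daemon, --listenport,
--host and --port are quoted from memory of MPICH2's mpd and should be
checked against "mpd --help", and the port formula is arbitrary):

# derive a job-unique port from the SGE job id
MPD_PORT=`expr $JOB_ID % 1000 + 20000`
MASTER=`hostname`

# one mpd on the master node of the parallel job, listening on that port
mpd --daemon --listenport=$MPD_PORT

# one mpd per slave node, joining the master's ring via qrsh -inherit
# (needs control_slaves TRUE, as in your PE; matching $MASTER against
# the $PE_HOSTFILE entries may need adjusting for FQDNs)
for host in `cut -f1 -d" " $PE_HOSTFILE | grep -v $MASTER`; do
    qrsh -inherit $host mpd --daemon --host=$MASTER --port=$MPD_PORT
done

# the later mpirun/mpiexec call must then be directed at this particular
# ring, so it doesn't pick up an mpd belonging to a different job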

>
>>
>> I can't say much, as I don't know what you are doing in your start/
>> stop scripts. But also, if you want to stay with the mpd startup
>> method, I would suggest putting these scripts in the start/stop
>> procedures of the PE, instead of putting them in an end-user script.
>
> In short, should I make these setup or cleanup scripts part of
> startmpi.sh and stopmpi.sh?

Yes, so a "wrong" user script won't leave the nodes in a bad state. The
start/stop scripts can't be changed by the user, and they are the best
place to put critical processing steps.
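
For example (just a sketch - the location of the cleanup script is an
assumption about your setup, and whether csh is the right interpreter
depends on the script itself), at the end of stopmpi.sh one could add:

# call the user's cleanup after stopmpi.sh has done its own work
CLEANUP=$HOME/HPC_unsetmpi.csh
if [ -f "$CLEANUP" ]; then
    csh "$CLEANUP"
fi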

> Maybe a stupid question: does stopmpi.sh run on all nodes?

No, only on the master node of the parallel job.

> The more I try to understand, the more lost I am.

I would really suggest having a look at the Howto and using the
daemonless startup method as a starting point:

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

Then you can extend this to the daemon-based smpd startup.

-- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



