[GE users] Intel MPI 3.1 tight integration

Reuti reuti at staff.uni-marburg.de
Tue Nov 4 17:48:23 GMT 2008


Here, for the impatient, is how to get the integration working with
several daemons, even more than one per user:

export MPD_CON_EXT="sge_$JOB_ID.$TASK_ID"

before calling mpdboot. The qrsh call will also need this variable, so
the qrsh wrapper must be given the -V switch to pass it on to the
slave nodes.
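
A rough sketch of how this fits into the job script (the hostfile name
and the wrapper name are only placeholders here, see the scripts quoted
below):

#!/bin/sh
# one console socket per job (and per array task), so several mpd rings
# of the same user can coexist on a node
export MPD_CON_EXT="sge_$JOB_ID.$TASK_ID"
mpdboot -n `sort -u $TMP/mpd.hosts | wc -l` -f $TMP/mpd.hosts -r qrsh-wrapper.sh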

I don't know yet whether this also applies to Intel MPI. I'll
investigate further and update the Howto once I arrive at a clean
solution.

-- Reuti


Am 04.11.2008 um 17:19 schrieb Reuti:

> Hi Daniel,
>
> Am 03.11.2008 um 15:42 schrieb Daniel Templeton:
>
>> The mpd daemons do daemonize.  That means that the qrsh -inherit  
>> returns before the actual work gets done, but shouldn't SGE pick  
>> up the usage anyway using the GID?
>
> you are right: the additional GID is indeed still attached to the mpd,
> and when ENABLE_ADDGRP_KILL is set, the daemon is also gone after a job.
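>
> (For reference, this is an entry in execd_params of the cluster
> configuration, set e.g. via qconf -mconf:
>
>    execd_params   ENABLE_ADDGRP_KILL=TRUE
>
> so that every process carrying the job's additional group id is killed
> at the end of the job.)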
>
> But there are some limitations:
>
> a) besides ru_wallclock, all ru_* entries in the accounting records  
> are missing, as the process was detached from the shepherd.
>
> b) with two jobs of the same user on a node (one mpd.py started by
> mpdboot per job), the second job will kill the first job's mpd.py
> instances but leave its binaries running.
>
> There is only one entry in /tmp per user. In LAM/MPI, for example, they
> added special strings containing "sge" and the $JOB_ID to get dedicated
> directories per job, once LAM discovers that it is running under SGE
> and needs a daemon per job.
>
> c) As a result of b), the first started job can only run one
> mpirun/mpiexec, as its mpd.py is gone afterwards. Further tasks would
> be created as children of the second job's mpd.py, giving wrong
> accounting. Furthermore, the ending second job will also remove the
> tasks of any later step of the first job.
>
> What would be necessary is a dedicated port per mpd.py, so that each
> job connects to the right mpd.py, and an mpd.py that does not fork
> into daemon land.
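>
> (As an illustration of the clash, and an assumption about mpd's naming
> on my side: both jobs of the same user would try to use the same console
> socket, something like /tmp/mpd2.console_<user>, so the second mpdboot
> tears down the first job's ring. A per-job suffix in this name would
> avoid the clash.)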
>
> For me, these are still too many limitations to include this startup
> method in the Howto. I can try to ask the MPICH(2) team and Intel
> whether they could supply any solution.
>
> -- Reuti
>
>
>>   I readily admit that SGE PEs are not my strong suit.
>>
>> There is a switch to make the mpd daemons not daemonize, but then you
>> have to do some dancing around how to let mpdboot run multiple
>> qrsh -inherit calls in the background and still be able to read the
>> first line of output from them (the port number) without having the
>> buffering get in the way.
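>>
>> A rough sketch of that dance (purely hypothetical; the actual mpd
>> options are omitted and the port file name is made up):
>>
>> qrsh -inherit $host mpd ... 2>&1 | (
>>   read port                  # first line printed by mpd is its port
>>   echo "$host $port" >> $TMP/mpd.ports
>>   cat > /dev/null            # keep the pipe open while mpd runs
>> ) &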
>>
>> Daniel
>>
>> Reuti wrote:
>>> Am 03.11.2008 um 14:54 schrieb Daniel Templeton:
>>>
>>>> Actually, I've done a tight integration, and it's pretty easy. The
>>>> mpdboot command takes a -r parameter that gives the name of the
>>>> "rsh" to execute. Just create a script that strips out the -x and
>>>> -n from the arguments and runs qrsh -inherit instead of rsh, and
>>>> pass that script to mpdboot with -r. (You may also want to
>>>> short-circuit the Python version check...) You'll also need a PE
>>>> starter that creates an appropriate machines file.
>>>
>>> In contrast to MPICH(2), the mpd daemons no longer fork into daemon
>>> land? Besides this, I found the creation of more and more process
>>> groups by the Python script in MPICH(2) to be the handicap.
>>>
>>> Is it also working with two jobs of the same user on a node?
>>>
>>> No shutdown necessary?
>>>
>>> -- Reuti
>>>
>>>
>>>> My scripts below should work with Intel MPI 3.1 or 3.2.
>>>>
>>>> Daniel
>>>>
>>>> % cat startpe.sh
>>>> #!/bin/sh
>>>>
>>>> # Build a machines file from the PE hostfile: one line per granted
>>>> # slot, short hostname only.
>>>> hfile=$TMP/mpd.hosts
>>>> touch $hfile
>>>>
>>>> cat $PE_HOSTFILE | while read line; do
>>>>  host=`echo $line | cut -d' ' -f1 | cut -d'.' -f1`
>>>>  cores=`echo $line | cut -d' ' -f2`
>>>>
>>>>  # repeat the hostname once per slot granted on this host
>>>>  while [ $cores -gt 0 ]; do
>>>>    echo $host >> $hfile
>>>>    cores=`expr $cores - 1`
>>>>  done
>>>> done
>>>>
>>>> exit 0
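>>>>
>>>> (A hypothetical PE definition this start script could be hooked
>>>> into; the PE name, path and values are assumptions, adjust them to
>>>> your site:)
>>>>
>>>> pe_name            impi
>>>> slots              999
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>>>> start_proc_args    /path/to/startpe.sh
>>>> stop_proc_args     /bin/true
>>>> allocation_rule    $fill_up
>>>> control_slaves     TRUE
>>>> job_is_first_task  FALSE
>>>> urgency_slots      min
>>>>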
>>>> % cat qrsh-inherit.pl
>>>> #!/usr/bin/perl
>>>>
>>>> # Short-circuit the Python version check (just answer with a version)
>>>> if (grep /^\s*-x\s*$/, @ARGV) {
>>>>  print "2.4\n";
>>>>  exit 0;
>>>> }
>>>>
>>>> # Strip out the rsh-specific -n and -x options
>>>> @ARGV = grep !/^\s*-[nx]\s*$/, @ARGV;
>>>>
>>>> # Run the remote command under SGE's control instead of plain rsh
>>>> exec "qrsh", "-inherit", @ARGV;
>>>>
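>>>> (And for completeness, a hypothetical job script tying the pieces
>>>> together; the PE name, the paths, the use of -machinefile and the
>>>> extra boot file are my assumptions:)
>>>>
>>>> #!/bin/sh
>>>> #$ -pe impi 8
>>>> # one mpd per granted host, started through the qrsh wrapper
>>>> sort -u $TMP/mpd.hosts > $TMP/mpd.boot
>>>> mpdboot -n `cat $TMP/mpd.boot | wc -l` -f $TMP/mpd.boot \
>>>>         -r /path/to/qrsh-inherit.pl
>>>> # one line per slot in the machines file written by startpe.sh
>>>> mpiexec -machinefile $TMP/mpd.hosts -n $NSLOTS ./my_mpi_program
>>>> # shut the ring down; whether this is needed with the integration
>>>> # above was left open in the thread
>>>> mpdallexit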
>>>>
>>>> Daniel De Marco wrote:
>>>>> Hi,
>>>>>
>>>>> I'm trying to integrate Intel MPI with gridengine. From what I
>>>>> found on the list archives it seems tight integration is
>>>>> impossible. What about loose integration, did anyone try it?
>>>>> Any comments/pointers?
>>>>>
>>>>> Thanks, Daniel.
>>>>>
>>>>
>>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



