[GE users] Sunsource issue (was: Intel MPI 3.1 / MPICH2 tight integration)

reuti reuti at staff.uni-marburg.de
Sat Nov 8 16:03:48 GMT 2008


Hi,

my attachment was removed, and so was the thread (I had simply replied
to one of my original mails, which included the thread of this
discussion). As compensation, my email was obviously replicated: I
always send plain-text emails, and never the same content twice in
one mail.

-- Reuti


On 08.11.2008 at 16:40, reuti wrote:

> NOTE: The following attachments:
>
> mpich2-60.tgz[application/octet-stream;
> 	x-unix-mode=0644;
> 	name=mpich2-60.tgz]
>
>  have been removed from this message because they are not allowed  
> for this discussion.
>
> Hi all,
>
> please find attached an mpich2.tgz, which is an update of the package
> provided at sunsource in the MPICH2 Howto. It now also contains the
> mpd method, and hopefully it is understandable on its own until I
> update the webpage.
>
> I don't know whether it will fit 1:1 for Intel MPI, as I don't use
> it.
>
> Technical background: mpdboot can't be used, as it wouldn't allow
> forking the qrsh to the slave nodes; it needs the communication back
> via stdout before it can start up. So I launch the mpds directly.
> This is similar to the daemon-based smpd startup. All mpds are still
> attached to the shepherd. I tested it in 6.2 with the new builtin and
> the traditional qrsh. With the traditional qrsh, the MPI program
> sometimes isn't killed (although all the Python scripts are) when you
> issue a qdel, even though the additional group id is clearly
> attached. I don't know why, as a simple kill afterwards works
> instantly.
>
> Be sure to set "execd_params     ENABLE_ADDGRP_KILL=TRUE", as the
> processes still jump out of the process tree and can only be
> killed via the additional group id.
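
A quick way to check this setting could look like the sketch below. The
helper name has_addgrp_kill is hypothetical and not part of the mpich2
package; it only assumes that "qconf -sconf" prints the configuration
with one "execd_params ..." line, as it usually does.

```shell
# Hypothetical helper (not part of the mpich2 package): reads
# `qconf -sconf`-style output on stdin and succeeds only if the
# execd_params line contains ENABLE_ADDGRP_KILL=TRUE (case-insensitive).
has_addgrp_kill() {
    grep -Eiq '^execd_params[[:space:]].*ENABLE_ADDGRP_KILL=TRUE'
}

# Typical use on a host with the Grid Engine tools installed:
#   qconf -sconf | has_addgrp_kill || echo "ENABLE_ADDGRP_KILL is not set"
# To change the setting, edit the configuration in your editor with:
#   qconf -mconf
```

The function only inspects text, so it can be tried on a saved copy of
the configuration before touching a live cluster.
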
>
> -- Reuti
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88348
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88349

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


