[GE users] Intel MPI 3.1 / MPICH2 tight integration

reuti reuti at staff.uni-marburg.de
Sat Nov 8 15:40:56 GMT 2008


NOTE: The following attachments:

mpich2-60.tgz[application/octet-stream;
	x-unix-mode=0644;
	name=mpich2-60.tgz] 

 have been removed from this message because they are not allowed for this discussion.

Hi all,

please find attached an mpich2.tgz, which is an update of the package
provided at sunsource in the MPICH2 Howto. It now also contains the
mpd method, and hopefully it is understandable on its own until I
update the webpage.

I don't know whether it will fit 1:1 for Intel MPI, as I don't use it.

Technical background: mpdboot can't be used, as it wouldn't allow
forking the qrsh to the slave nodes; it needs the communication back
via stdout before it can start up. So I launch the mpds directly.
This is similar to the daemon-based smpd startup. All mpds are still
attached to the shepherd. I tested it in 6.2 with both the new builtin
and the traditional qrsh. With the traditional qrsh, the MPI program
is sometimes not killed when you issue a qdel (although all Python
scripts are), even though the additional group id is clearly attached.
I don't know why, as a simple kill afterwards works instantly.
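
As a rough illustration of the direct startup described above (this is
not the script from the attached package), the sketch below reads the
host list in $PE_HOSTFILE format and builds one mpd command per host:
a local mpd on the first host, and one "qrsh -inherit" call per slave
host. The function names, the fixed port, and the choice of the mpd
options --listenport, --host, --port and --ncpus are assumptions made
for this example; please check them against "mpd --help" of your
MPICH2 version. The commands are only built, not executed.

```python
def parse_pe_hostfile(text):
    """Return [(hostname, slots), ...] from the content of $PE_HOSTFILE.

    Each line is expected to look like: "node01 4 all.q@node01 UNDEFINED".
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            hosts.append((fields[0], int(fields[1])))
    return hosts


def build_mpd_commands(hosts, port):
    """Build argv lists: a local master mpd, then one qrsh call per slave.

    Assumption: the master mpd listens on a fixed port, so the slaves do
    not have to read it back from the master's stdout.
    """
    master, master_slots = hosts[0]
    commands = [["mpd", "--listenport=%d" % port, "--ncpus=%d" % master_slots]]
    for host, slots in hosts[1:]:
        # qrsh -inherit starts the mpd under SGE control on the slave node.
        commands.append(["qrsh", "-inherit", "-V", host, "mpd",
                         "--host=%s" % master, "--port=%d" % port,
                         "--ncpus=%d" % slots])
    return commands
```

Inside a job script, the input would be the content of the file named
by $PE_HOSTFILE, and each returned list could be handed to
subprocess.Popen.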

Be sure to set "execd_params     ENABLE_ADDGRP_KILL=TRUE", as the
processes still jump out of the process tree and can only be killed
via the additional group id.
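
To check whether the setting is active, something like the following
sketch could be used on the text printed by "qconf -sconf". The
function name is made up for this example, and it assumes that
several execd_params entries are separated by commas or whitespace
and that the value is compared case-insensitively.

```python
import re


def addgrp_kill_enabled(conf_text):
    """Return True if execd_params sets ENABLE_ADDGRP_KILL to TRUE."""
    # Join lines that qconf wrapped with a trailing backslash.
    joined = re.sub(r"\\\n\s*", "", conf_text)
    for line in joined.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "execd_params":
            # Assumption: entries are separated by commas or whitespace.
            for param in re.split(r"[,\s]+", fields[1].strip()):
                name, _, value = param.partition("=")
                if name.upper() == "ENABLE_ADDGRP_KILL":
                    return value.upper() == "TRUE"
    return False
```

The text can be obtained e.g. with
subprocess.check_output(["qconf", "-sconf"]) and decoded before it is
passed to the function.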

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88348




