[GE users] Is there a way to run prolog scripts (as root) in all machines involved in the Parallel task?

reuti reuti at staff.uni-marburg.de
Thu Mar 11 18:43:15 GMT 2010


On 11.03.2010 at 19:18, goncalo wrote:

> Hi Reuti
>
>> why do you want to do so? Open MPI is tightly integrated into SGE
>> as long as you compile it --with-sge in the configure step. SGE
>> would disallow a `qrsh -inherit ...` to a machine not in the
>> granted list of machines. Whether it uses -builtin-, rsh, ssh or
>> tight-ssh as startup method doesn't matter.
>
> I'm using the SGE tight integration:
>
>     [root@hpc001 ~]# /usr/mpi/gcc/openmpi-1.2.8/bin/ompi_info | grep gridengine
>                      MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
>                      MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
>
> However, in the link
>
>     http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>
> it is stated:
>
>     "Specifically, if you execute an mpirun command in a SGE job,  
> it will automatically use the SGE mechanisms to launch and kill  
> processes."
>
> What do they mean by "SGE mechanisms to launch and kill
> processes"? Do I have to have qrsh working? I've seen that qrsh
> can work either with rsh (which is very insecure and I do not
> like it)

Depends; often the nodes are on a private subnet which is not
reachable from the outside anyway, so rsh could be used. As it's only
used between nodes, it doesn't matter. As `qrsh -inherit ...` will
start its own rshd/sshd for each slave task, it's also possible to
disable the system rshd/sshd in /etc/xinetd.d completely. Or (what I
prefer) limit it to admin staff. The sshd definition for qrsh_daemon
will then need an additional parameter which points to a different
sshd_config to allow this for users.
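
As a rough sketch of what that could look like (the parameter names
rsh_command and rsh_daemon are the ones I would expect in the global
configuration, and the path /etc/ssh/sshd_config_sge is purely
hypothetical; check sge_conf(5) for your version before changing
anything):

```shell
# Show the current remote-startup settings of the global configuration.
qconf -sconf | grep -E '^(rsh|rlogin|qlogin)_(command|daemon)'

# Hypothetical values one might then set with `qconf -mconf`.
# "-i" makes sshd serve a single connection (inetd style); "-f" points it
# at a separate config file that permits ordinary users, while the system
# sshd stays restricted to admin staff.
#   rsh_command   /usr/bin/ssh
#   rsh_daemon    /usr/sbin/sshd -i -f /etc/ssh/sshd_config_sge
```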


> , or with ssh (but the accounting is not done correctly

Correct, unless you compile SGE on your own (you don't use that build
itself afterwards, only the sshd that gets built along with it).


> ). How do you guys deal with this issue?
>
>> In the prolog you can use a loop across the list of machines to do  
>> it on your own though.
>
> I thought about that, but that would mean that I would have to allow
> root ssh to all hosts, and this is something I wouldn't like...

You can run a second sshd all the time which a) allows login only for
root, b) uses a different port, c) allows login only with ssh keys.
I use this approach to shut down some machines when apcupsd detects
that the UPSes are running out of battery (with 4 UPSes supplied by 4
different mains feeding different parts of the cluster, the network
option apcupsd offers would mean putting the logic of "when" to shut
down "what" on each node).

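A minimal sketch of how the prolog loop and the second sshd could fit
together. Everything specific here is an assumption for illustration:
port 2222, the key path, the pid file and the function names are
hypothetical. It also assumes the prolog is configured with a "root@"
prefix so that it runs as root, and that $PE_HOSTFILE (one line per
host: hostname, slots, queue, processor range) is set in its
environment.

```shell
#!/bin/sh
# Sketch only: port, key path and pid file are hypothetical values.
ADMIN_SSH_PORT=${ADMIN_SSH_PORT:-2222}
ADMIN_SSH_KEY=${ADMIN_SSH_KEY:-/root/.ssh/id_prolog}

# Print a command line for the second sshd: root only, separate port,
# key-based login only. "without-password" is spelled
# "prohibit-password" in newer OpenSSH releases.
admin_sshd_cmd() {
    echo "/usr/sbin/sshd -p $ADMIN_SSH_PORT" \
         "-o AllowUsers=root" \
         "-o PermitRootLogin=without-password" \
         "-o PasswordAuthentication=no" \
         "-o ChallengeResponseAuthentication=no" \
         "-o PidFile=/var/run/sshd_admin.pid"
}

# Print the unique host names found in a $PE_HOSTFILE-style file.
list_pe_hosts() {
    awk 'NF > 0 { print $1 }' "$1" | sort -u
}

# Run the given command as root on every host listed in the hostfile.
# Returns non-zero if the command failed on at least one host.
run_on_pe_hosts() {
    hostfile=$1
    shift
    rc=0
    for host in $(list_pe_hosts "$hostfile"); do
        ssh -p "$ADMIN_SSH_PORT" -i "$ADMIN_SSH_KEY" -o BatchMode=yes \
            "root@$host" "$@" || rc=1
    done
    return $rc
}
```

In the prolog itself one would then call something like
`run_on_pe_hosts "$PE_HOSTFILE" /usr/local/sbin/prepare_node.sh`,
where the script name is again only a placeholder.
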
-- Reuti


>
> Cheers
> Goncalo

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248053
