[GE users] SGE admin issue

reuti reuti at staff.uni-marburg.de
Sat Nov 14 18:38:28 GMT 2009


On 13.11.2009 at 21:38, fgarret wrote:

> Thanks for your quick help.
>
> Yes, I've compiled OpenMPI with SGE support.
> Any idea of what may be the problem?

a) Which version of OpenMPI?

b) What does the process tree look like on a slave node of the  
parallel job:

ps -e f

(f without a leading dash).
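
A minimal sketch of how that listing could be narrowed down, assuming the usual process names of an SGE/OpenMPI setup (sge_execd, sge_shepherd, qrsh_starter, orted, mpirun, plus sshd/rshd); the helper name filter_sge_tree is made up for illustration. If orted shows up below sge_shepherd, the slave tasks were presumably started under SGE's control; if it hangs below sshd or rshd instead, they probably were not.

```shell
# Hypothetical helper: keep only the lines of a process listing that are
# likely to matter for the SGE integration. The process names are
# assumptions about a typical installation; adjust them as needed.
filter_sge_tree() {
    grep -E 'sge_execd|sge_shepherd|qrsh_starter|sshd|rshd|orted|mpirun'
}

# Intended use on a slave node while the parallel job is running
# (the grep process itself may also show up in the output):
#   ps -e f | filter_sge_tree
```

The filter only trims the output; the indentation printed by "ps -e f" still shows which process is the parent of which.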

-- Reuti


> thanks,
> FG
>
> PS - Haven't tried the other issues...
>
>
>
> reuti wrote:
>> Hi,
>>
>> On 06.11.2009 at 17:25, fgarret wrote:
>>
>>> I've just installed a cluster with 7 execution nodes (56 cores) +
>>> an extra node as master. This node
>>> runs sge_master, has the shared HDD and is the only one with a
>>> direct connection with the Internet.
>>> All the others only have connection to the master node. The cluster
>>> is working pretty ok but I'm
>>> having some difficulties with some issues:
>>>
>>> - Sending mail
>>> 	I've managed to install sendmail on the master node and tested it
>>> OK. However, the "-m be -M
>>> user at host" doesn't work. Who sends the mail on job start/end? The
>>> master node? submission node?
>>> execution node? If it is the execution node that sends the emails,
>>> is there any way to have the
>>> master/submission node send them instead?
>>
>> The exec host sends the job-related emails. Some admin emails are also
>> generated on the master node. So you need to use the master node as a
>> relay, and maybe change the sender address (which is root at node01 or
>> similar) to that of the master node, as many email servers refuse to
>> accept emails with an unresolvable sender address.
>>
>> Pitfall: root (which is the sender of the emails) won't be
>> masqueraded by default; there is a default rule which you must
>> comment out:
>>
>> dnl EXPOSED_USER(`root')dnl
>> define(`SMART_HOST',`smtp:myheadnode.ub.edu')dnl
>> MASQUERADE_AS(`myheadnode.ub.edu')dnl
>>
>> Any reason why you use sendmail? Nowadays it's often replaced with
>> postfix or exim.
>>
>>
>>> - MPI
>>> 	I've installed OpenMPI and it is also working OK. The only thing
>>> is that jobs are not removed from
>>> the queue when they finish. They just stay there forever and the
>>> only way to remove them is for
>>> the root user to run "qdel -f". Any way to fix this?
>>>
>>
>> Did you compile OpenMPI with SGE support?
>>
>>
>>> - Reserving nodes
>>> 	When I want to run some job with threads it will occupy one slot
>>> but will be in fact using more
>>> processors. Any way to reserve slots?
>>
>> You will need to create a PE with "allocation_rule $pe_slots", named
>> e.g. "smp", which you then request with "qsub -pe smp 4 ...". For
>> convenience, use this in the job script:
>>
>> export OMP_NUM_THREADS=$NSLOTS
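
A minimal sketch of what such a PE could look like, assuming the field layout printed by "qconf -sp" in SGE 6.x. The helper name write_smp_pe, the file name smp.pe, the queue name all.q and the program name are made up for illustration; slots is set to 56 only because that is the core count mentioned for this cluster.

```shell
# Hypothetical helper: write a PE definition for "smp" to the given file.
# The quoted heredoc keeps $pe_slots literal instead of expanding it.
write_smp_pe() {
    cat > "$1" <<'EOF'
pe_name            smp
slots              56
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}

# As an SGE admin one would then register the PE and attach it to a queue
# (all.q is an assumption, use your own queue name):
#   write_smp_pe smp.pe
#   qconf -Ap smp.pe
#   qconf -aattr queue pe_list smp all.q
#
# Job script side, submitted with "qsub -pe smp 4 job.sh":
#   export OMP_NUM_THREADS=$NSLOTS
#   ./my_threaded_program    # hypothetical binary
```

With $pe_slots all requested slots are taken from a single host, which is what a threaded program needs.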
>>
>> -- Reuti
>>
>>
>>> thanks in adv,
>>> FG
>>>
>>> -- 
>>> Filipe G. Vieira
>>> Departament de Genetica
>>> Universitat de Barcelona
>>> Av. Diagonal, 645
>>> 08028 Barcelona
>>> SPAIN
>>> Phone: +34 934 035 306
>>> Fax: +34 934 034 420
>>> fgarret at ub.edu
>>> http://www.ub.edu/molevol/
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225402
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226878

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


