[GE users] SGE admin issue

fgarret fgarret at ub.edu
Thu Nov 19 17:34:28 GMT 2009


a)
OpenMPI 1.2.8

b)
see attached.

reuti wrote:
> On 13.11.2009 at 21:38, fgarret wrote:
> 
>> Thanks for your quick help.
>>
>> Yes, I've compiled OpenMPI with SGE support.
>> Any idea of what may be the problem?
> 
> a) Which version of OpenMPI?
> 
> b) What does the process tree look like on a slave node of the
> parallel job:
> 
> ps -e f
> 
> (f without the leading -).
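
A minimal sketch of how that listing could be narrowed to the relevant lines. The function name filter_job_tree is hypothetical, and the process names in the pattern are assumptions about a typical SGE plus OpenMPI setup (not all of them appear on every installation). The point is only to see whether orted hangs below sge_shepherd or below sshd.

```shell
# Hypothetical helper: keep only the lines of a process listing that
# mention SGE daemons, OpenMPI's orted, or sshd. Reads the listing on stdin.
filter_job_tree() {
    grep -E 'sge_execd|sge_shepherd|qrsh_starter|orted|sshd'
}

# Typical use on a slave node while the parallel job is running
# (the BSD-style "f" draws the process forest):
#   ps -e f | filter_job_tree
```

If orted shows up as a child of sge_shepherd, the slave tasks were most likely started under SGE's control; if it sits below sshd, they were started outside of it and SGE cannot track them.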
> 
> -- Reuti
> 
> 
>> thanks,
>> FG
>>
>> PS - Haven't tried the other issues...
>>
>>
>>
>> reuti wrote:
>>> Hi,
>>>
>>> On 06.11.2009 at 17:25, fgarret wrote:
>>>
>>>> I've just installed a cluster with 7 execution nodes (56 cores) plus
>>>> an extra node as master. The master runs sge_qmaster, has the shared
>>>> HDD, and is the only node with a direct connection to the Internet.
>>>> All the others are connected only to the master node. The cluster
>>>> is working reasonably well, but I'm having difficulties with a few
>>>> issues:
>>>>
>>>> - Sending mail
>>>>     I've managed to install sendmail on the master node and tested it
>>>> OK. However, the "-m be -M user@host" option doesn't work. Who sends
>>>> the mail on job start/end: the master node, the submission node, or
>>>> the execution node? If it is the execution node that sends the
>>>> emails, is there any way to have the master/submission node send
>>>> them instead?
>>> The exec host will send the emails (the ones for the jobs). Some admin
>>> emails are also generated on the master node. So you need to use the
>>> master node as a relay, and maybe change the name of the sender
>>> (which is root@node01 or similar) to that of the master node, as
>>> many email servers refuse to accept emails with an unresolvable
>>> sender address.
>>>
>>> Pitfall: root (which is the sender of the emails) won't be
>>> masqueraded by default; there is a default rule which you must
>>> comment out.
>>>
>>> dnl EXPOSED_USER(`root')dnl
>>> define(`SMART_HOST',`smtp:myheadnode.ub.edu')dnl
>>> MASQUERADE_AS(`myheadnode.ub.edu')dnl
>>>
>>> Any reason why you use sendmail? Nowadays it's often replaced with
>>> postfix or exim.
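
Once the relay is in place, a small test job can confirm that the start/end mails actually arrive. This is only a sketch: the job name mailtest and the address user@example.com are placeholders, and JOB_HOST is just a local variable used for the log line.

```shell
#!/bin/sh
# Mail at the beginning (b) and end (e) of the job, sent to the
# address given with -M (placeholder address, replace it).
#$ -N mailtest
#$ -m be
#$ -M user@example.com
#$ -cwd

# Record which execution host ran the job; that host is the one
# that sends the start/end mails.
JOB_HOST=$(uname -n)
echo "mail test job ran on $JOB_HOST"
```

Submitting it with plain "qsub" and then checking the mail log on the execution host and on the master should show whether the message was relayed or rejected.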
>>>
>>>
>>>> - MPI
>>>>     I've installed OpenMPI and it is also working OK. The only problem
>>>> is that jobs are not removed from the queue when they finish. They
>>>> just stay there indefinitely, and the only way to remove them is for
>>>> the root user to run "qdel -f". Any way to fix this?
>>>>
>>> Did you compile OpenMPI with SGE support?
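
One way to check this, sketched under the assumption that the ompi_info found first in PATH belongs to the OpenMPI installation used by the job: a build with SGE support lists a "gridengine" component. The helper name has_gridengine is hypothetical.

```shell
# Hypothetical helper: succeeds if the text on stdin mentions a
# gridengine component, as ompi_info does for a build with SGE support.
has_gridengine() {
    grep -q 'gridengine'
}

# Typical use on the node where the job is started:
#   ompi_info | has_gridengine && echo "gridengine component found"
```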
>>>
>>>
>>>> - Reserving nodes
>>>>     When I want to run a job with threads, it will occupy one slot
>>>> but will in fact be using more processors. Any way to reserve slots?
>>> You will need to create a PE with "allocation_rule $pe_slots", named
>>> for example "smp", which you then request with "qsub -pe smp 4 ...".
>>> For convenience, use this in the job script:
>>>
>>> export OMP_NUM_THREADS=$NSLOTS
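
Put together, a job script along these lines might look like the following sketch. The PE name "smp" and the slot count 4 are taken from the suggestion above; the fallback to 1 when NSLOTS is unset is an added assumption so that the script also runs by hand outside SGE, and the program name is a placeholder.

```shell
#!/bin/sh
# Request 4 slots on a single host from the "smp" PE
# (the PE must already exist and be attached to a queue).
#$ -pe smp 4
#$ -cwd

# SGE sets NSLOTS to the number of granted slots; default to 1
# when the script is run outside the scheduler (assumption).
NSLOTS="${NSLOTS:-1}"
export OMP_NUM_THREADS="$NSLOTS"

echo "Using $OMP_NUM_THREADS thread(s)"
# ./my_threaded_program   # placeholder for the real application
```

Because the thread count follows NSLOTS, changing the number after "-pe smp" is enough to resize the job; nothing else in the script needs editing.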
>>>
>>> -- Reuti
>>>
>>>
>>>> Thanks in advance,
>>>> FG
>>>>
>>>> -- 
>>>> Filipe G. Vieira
>>>> Departament de Genetica
>>>> Universitat de Barcelona
>>>> Av. Diagonal, 645
>>>> 08028 Barcelona
>>>> SPAIN
>>>> Phone: +34 934 035 306
>>>> Fax: +34 934 034 420
>>>> fgarret at ub.edu
>>>> http://www.ub.edu/molevol/
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225402
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
> 
> 

-- 
Filipe G. Vieira
Departament de Genetica
Universitat de Barcelona
Av. Diagonal, 645
08028 Barcelona
SPAIN
Phone: +34 934 035 306
Fax: +34 934 034 420
fgarret at ub.edu
http://www.ub.edu/molevol/

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=228024

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

    [ Part 2, "ps.slave"  Text/PLAIN (Name: "ps.slave") ~5 KB. ]
    [ Unable to print this part. ]


