[GE users] mpich2_smpd - sge - solaris

Reuti reuti at staff.uni-marburg.de
Wed Sep 3 16:29:27 BST 2008


Hi,

On 03.09.2008 at 17:20, Yann JOBIC wrote:

> Reuti wrote:
>> Hi,
>>
>> On 03.09.2008 at 14:03, Yann JOBIC wrote:
>>
>>> I used Reuti's excellent howto for a tight integration of MPICH2
>>> and SGE:
>>> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>>>
>>> I'm using Solaris 10 on x86 and SPARC. The jobs are correctly
>>> spawned on 4 nodes:
>>>
>>> On one of the nodes:
>>> Sara06-jobic% ptree 21175
>>> 455   /opt/sge/bin/sol-amd64/sge_execd
>>>  21169 sge_shepherd-2576 -bg
>>>    21170 /opt/sge/utilbin/sol-amd64/rshd -l
>>>      21171 /opt/sge/utilbin/sol-amd64/qrsh_starter /opt/sge/Huit/spool/Sara06/active_jobs/
>>>        21172 tcsh -c /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>>          21173 /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>>            21174 /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>>              21175 /home/jobic/sge/exemple/./hello
>>>
>>> However, when the job has finished, smpd is still running:
>>> Sara06-jobic% ptree 21171
>>> 455   /opt/sge/bin/sol-amd64/sge_execd
>>>  21169 sge_shepherd-2576 -bg
>>>    21170 /opt/sge/utilbin/sol-amd64/rshd -l
>>>      21171 /opt/sge/utilbin/sol-amd64/qrsh_starter /opt/sge/Huit/spool/Sara06/active_jobs/
>>>        21172 tcsh -c /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>>          21173 /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>>
>>> I can delete them with qdel:
>>>  2576  4 mpich2_test               jobic      09/03/2008 13:38:00  Huit       (stalled)
>>> It just takes some time.
>>
>> Do you mean the job script is halted in some way? When it finishes,
>> SGE should call the stop_proc_args defined in the PE to shut down
>> the daemons. Did you also define stop_proc_args as outlined in the
>> howto?
>>
>> -- Reuti
>>
> Thanks for the fast answer.
>
> This is my PE definition:
> homard-jobic% qconf -sp mpich2_smpd
> pe_name           mpich2_smpd
> slots             56
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /opt/sge/mpich2_smpd/startmpich2.sh -catch_rsh $pe_hostfile \
>                  /opt/lib/mpich2
> stop_proc_args    /opt/sge/mpich2_smpd/stopmpich2.sh -catch_rsh /opt/lib/mpich2
> allocation_rule   $round_robin
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
> I found this line in the error file:
> /opt/sge/mpich2_smpd/stopmpich2.sh: line 126: tac: command not found

tac just lists a file in reverse order (cat <-> tac). It is used
because the daemon on the master node of the parallel job should be
the last one to be shut down.

You could install it from http://directory.fsf.org/project/textutils/
I hope it compiles on Solaris.

There is a nice howto about these tools at IBM:
http://www.ibm.com/developerworks/edu/l-dw-linux-gnutex-i.html
(registration is required, but free).
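If building textutils on Solaris turns out to be a hassle, a small awk-based stand-in for tac may be enough. This is only a sketch: the function name reverse_lines and the file name hosts.txt are hypothetical, and it assumes stopmpich2.sh uses tac solely to print a file in reverse order. Check line 126 of your copy of the script before swapping it in.

```shell
#!/bin/sh
# Hypothetical stand-in for GNU tac (not part of the original stopmpich2.sh):
# print the lines of a file in reverse order, so the last line (e.g. the
# last host in a host list) comes out first and the first line comes out last.
# Uses only awk, which ships with Solaris 10 (nawk may be the safer choice
# there than the old /usr/bin/awk).
reverse_lines() {
    awk '{ line[NR] = $0 }
         END { for (i = NR; i >= 1; i--) print line[i] }' "$1"
}
```

Usage: `reverse_lines hosts.txt` prints the last host first. Buffering the whole file in an awk array is fine for a host list of a few dozen lines and avoids depending on non-portable options such as `tail -r`.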

-- Reuti


> That is probably the cause. How can I fix it?
>
> Many thanks,
>
> Yann
>
>
> -- 
> ___________________________
>
> Yann JOBIC
> HPC engineer
> Polytech Marseille DME
> IUSTI-CNRS UMR 6595
> Technopole de Chateau Gombert
> 5 rue Enrico Fermi
> 13453 Marseille cedex 13
> Tel : (33) 4 91 10 69 39
>  ou  (33) 4 91 10 69 43
> Fax : (33) 4 91 10 69 69
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

