[GE users] mpich2_smpd - sge - solaris

Yann JOBIC jobic at polytech.univ-mrs.fr
Wed Sep 3 16:20:18 BST 2008


Reuti wrote:
> Hi,
>
> On 03.09.2008 at 14:03, Yann JOBIC wrote:
>
>> I used Reuti's great howto for a tight integration of MPICH2 and SGE:
>> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html 
>>
>>
>> I'm using Solaris 10, on x86 and SPARC. The job is correctly spawned on 
>> 4 nodes.
>>
>> Here is the process tree on one node:
>> Sara06-jobic% ptree 21175
>> 455   /opt/sge/bin/sol-amd64/sge_execd
>>  21169 sge_shepherd-2576 -bg
>>    21170 /opt/sge/utilbin/sol-amd64/rshd -l
>>      21171 /opt/sge/utilbin/sol-amd64/qrsh_starter /opt/sge/Huit/spool/Sara06/active_jobs/
>>        21172 tcsh -c /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>          21173 /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>            21174 /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>              21175 /home/jobic/sge/exemple/./hello
>>
>> However, when the job is finished, smpd is still running:
>> Sara06-jobic% ptree 21171
>> 455   /opt/sge/bin/sol-amd64/sge_execd
>>  21169 sge_shepherd-2576 -bg
>>    21170 /opt/sge/utilbin/sol-amd64/rshd -l
>>      21171 /opt/sge/utilbin/sol-amd64/qrsh_starter /opt/sge/Huit/spool/Sara06/active_jobs/
>>        21172 tcsh -c /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>          21173 /opt/lib/mpich2/bin/smpd -port 22576 -d 0
>>
>> With qdel, I can delete them:
>>  2576  4 mpich2_test               jobic      09/03/2008 13:38:00 Huit       (stalled)
>> It just takes some time.
>
> Do you mean the job script is halted in some way? When it finishes, SGE 
> should call the stop_proc_args defined in the PE to shut down the 
> daemons. Did you also define stop_proc_args as outlined in the howto?
>
> -- Reuti
>
Thanks for the fast answer.

I defined this for the PE:
homard-jobic% qconf -sp mpich2_smpd
pe_name           mpich2_smpd
slots             56
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/sge/mpich2_smpd/startmpich2.sh -catch_rsh $pe_hostfile \
                  /opt/lib/mpich2
stop_proc_args    /opt/sge/mpich2_smpd/stopmpich2.sh -catch_rsh /opt/lib/mpich2
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

I found this line in the error file:
/opt/sge/mpich2_smpd/stopmpich2.sh: line 126: tac: command not found

This is probably the cause. How can I fix it?
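
One possible workaround, as a rough sketch only: tac is a GNU coreutils command and is not part of a stock Solaris 10 install, so the script could fall back to something portable when tac is missing. The function name reverse_lines below is hypothetical and is not part of stopmpich2.sh. The sketch assumes only that the script needs the lines of its input in reverse order, which is what tac does.

```shell
#!/bin/sh
# Hypothetical helper (not part of the howto scripts): print the input
# lines in reverse order, as GNU tac does.
reverse_lines() {
    if type tac >/dev/null 2>&1; then
        # GNU tac is installed, use it directly.
        tac "$@"
    else
        # Portable fallback: buffer every line in awk, then print them
        # from the last one to the first.
        awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }' "$@"
    fi
}
```

Assuming this is defined near the top of stopmpich2.sh, the tac call on line 126 could be replaced by reverse_lines. The Solaris tail command also has a -r option that prints lines in reverse, and installing GNU coreutils and putting tac on the PATH seen by the PE scripts would be another option.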

Many thanks,

Yann


-- 
___________________________

Yann JOBIC
HPC engineer
Polytech Marseille DME
IUSTI-CNRS UMR 6595
Technopole de Chateau Gombert
5 rue Enrico Fermi
13453 Marseille cedex 13
Tel : (33) 4 91 10 69 39
  or  (33) 4 91 10 69 43
Fax : (33) 4 91 10 69 69 
