[GE users] Startup times and other issues with 6.0u3
Brian R Smith
brian at cypher.acomp.usf.edu
Sat Mar 19 01:27:59 GMT 2005
First, sorry for getting so excited... this problem has been bugging me
all day. I am having some problems with MM5 with regard to deleting
the processes since shutting off "control slaves". Most of my other MPI
jobs are running much better. So, shutting off control_slaves disables
To answer your questions:
1) I have the precompiled binaries. That is what I have always used on
all of our other clusters.
2) Here are my settings:
hostlist gbn001 gbn002 gbn003 gbn004 gbn005 gbn006 gbn007
gbn009 gbn010 gbn011 gbn012 gbn013 gbn014 gbn015
gbn017 gbn018 gbn019 gbn020 gbn021 gbn022 gbn023
gbn025 gbn026 gbn027 gbn028 gbn029 gbn030 gbn031
gbn033 gbn034 gbn035 gbn036 gbn037 gbn038 gbn039
qtype BATCH INTERACTIVE
pe_list make mpich mpich-vasp
start_proc_args /usr/local/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
3) As for the processes, on the primary execution node, I see
sge_shepherd-96 -bg, my job script, the mpirun command and the slew of
rsh calls that go with it. On all other slave nodes, I see only in.rshd
and two copies of the mpi binary that I originally started with mpirun.
Hope this helps.
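
For reference, here is a rough sketch of what I understand the start_proc_args script to do with $pe_hostfile: it turns the slot allocation into a machine list for mpirun, one host name per granted slot. This is only an illustration in Python, not the actual startmpi.sh code. It assumes the usual pe_hostfile layout of "host slots queue processor-range" per line; the function name expand_pe_hostfile, the queue name all.q and the sample lines are made up for the example.

```python
def expand_pe_hostfile(text):
    """Return one host name per granted slot, in file order.

    Assumes each line looks like: host slots queue processor_range
    (hypothetical helper, not part of SGE).
    """
    machines = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host = fields[0].split(".")[0]  # drop any domain suffix
        slots = int(fields[1])          # number of slots granted on this host
        machines.extend([host] * slots)
    return machines


if __name__ == "__main__":
    # Hypothetical sample; real contents come from $pe_hostfile at job start.
    sample = (
        "gbn001 2 all.q@gbn001 UNDEFINED\n"
        "gbn002 2 all.q@gbn002 UNDEFINED\n"
    )
    print("\n".join(expand_pe_hostfile(sample)))
```

Comparing a list like this against the rsh calls seen on the primary node is one way to check that every slot is being started where the scheduler put it.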
Ron Chen wrote:
>--- Brian R Smith <brian at cypher.acomp.usf.edu> wrote:
>>You are absolutely the man. Setting "control
>>slaves" to false fixed all of my problems.
>No, it is not fixing anything!
>Setting "control slaves" to false means non-tight integration, so you
>won't get process control/accounting of the slaves MPI
>In SGE 6 update 4, the slow start problem was fixed.
>But the original problem was that starting a 400-node
>parallel job with tight integration takes several tens of
>seconds or something. But for your case it takes 10
>minutes! So there is still something going on with
>Did you get the precompiled binaries or compile from
>source? Also, are you using the default settings or
>you have already played around with the settings a
>Also, log on to the nodes and see what processes are
>running when a parallel job starts.
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net