[GE users] Startup times and other issues with 6.0u3

Brian R Smith brian at cypher.acomp.usf.edu
Sat Mar 19 15:09:30 GMT 2005



Reuti:

shell                 /bin/csh
shell_start_mode      posix_compliant

That's how we run it on our other machines, and it seems to work just fine.  I figured I didn't have to change anything.
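For what it's worth, checking or changing those settings is straightforward with qconf if we ever do want to try unix_behavior (the queue name here is ours, all.q):

# Show the current shell settings for the queue
qconf -sq all.q | grep shell
# Edit the queue configuration (e.g. shell_start_mode) in $EDITOR
qconf -mq all.q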

The PE mpich-vasp has all traces of tight integration removed; I was using it for testing VASP.  However, I got VASP to work with tight integration, so that queue is deprecated.

As for MM5, it is compiled with the PGI compilers.  That said, it runs beautifully outside of SGE on the same system (launched directly with mpirun).  It is only under SGE that the processes slow to a crawl.  If you want more info on it, let me know; I just don't see how that would help.

My mpirun command for that is simply:

$MPIR_PATH/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mm5.mpp
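For reference, the whole submit script is essentially just that line plus a few SGE directives; roughly something like the following (the job name and slot count here are placeholders, not our exact submission):

#!/bin/csh
#$ -N mm5
#$ -cwd
#$ -pe mpich 16
# MPIR_PATH is set in our environment; $NSLOTS and $TMPDIR/machines
# are provided by SGE and the mpich PE's startmpi.sh
$MPIR_PATH/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mm5.mpp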

mm5.mpp does some small writes to the NFS mount it runs from.  Over a span of about 3 minutes, it dumps approximately 10 MB of data into its current working directory.  I have load-tested this outside of SGE and found that the NFS writes are not the cause of any slowdown.
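The load test was nothing fancy; roughly along these lines (the path and file sizes are illustrative, not the exact commands I ran):

# Time writing ~10 MB of small files into the NFS-mounted working directory
cd /nfs/mm5/workdir        # illustrative path
time sh -c 'for i in `seq 1 100`; do dd if=/dev/zero of=out.$i bs=100k count=1 2>/dev/null; done'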

MPICH runs over a dedicated GigE network that carries 1) SGE, 2) message passing, and 3) PVFS2.  The other network (100 Mbit) handles NFS, NIS, etc.  The GigE network uses a Cisco Catalyst-series switch with Intel GigE controllers on the nodes.  We are not using jumbo frames on that network either, as we have yet to test the benefits of doing so.
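If we do get around to testing jumbo frames, it should just be a matter of raising the MTU on the nodes' GigE interfaces and enabling jumbo frames on the Catalyst switch; a rough sketch for the nodes (the interface name is an assumption):

# Assuming the GigE NIC is eth1 on our nodes (may differ)
ifconfig eth1 mtu 9000
# To make it persistent on Red Hat-style systems, add MTU=9000 to
# /etc/sysconfig/network-scripts/ifcfg-eth1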

Hope this helps.


Brian



Reuti wrote:

>Hi folks,
>
>just got up, and there is always a little extra delay in getting and reading 
>all the posts from last night. Okay, I've caught up.
>
>One thing I see:
>
>  
>
>>shell                 /bin/csh
>>shell_start_mode      posix_compliant
>>    
>>
>
>this is okay for you and the scripts you use (most of the time unix_behavior is 
>preferred)? What is different in your PE for mpich-vasp?
>
>Is MM5 this: http://www.mmm.ucar.edu/mm5/? Which version of MPICH are you 
>using, and how did you compile it (./configure ...???....)? What is your 
>mpirun command/script for submitting the job?
>
>In the archive of their website I found this:
>
>http://mailman.ucar.edu/pipermail/mm5-users/2004/000477.html
>
>It seems that MM5-MPP is generating a lot of network traffic. What type of 
>network/switch do you have? With 6.0u2 there was no slowdown of your jobs?
>
>Cheers - Reuti
>
>
>Quoting Brian R Smith <brian at cypher.acomp.usf.edu>:
>
>  
>
>>Ron,
>>
>>First, sorry for getting so excited... this problem has been bugging me 
>>all day.  I am having some problems with MM5 with regard to deleting 
>>the processes since shutting off "control slaves".  Most of my other MPI 
>>jobs are running much better.  So, shutting off control_slaves disables 
>>tight integration?
>>
>>To answer your questions:
>>
>>1) I have the precompiled binaries.  That is what I have always used on 
>>all of our other clusters.
>>2) Here are my settings:
>>
>>Scheduler:
>>
>>algorithm                         default
>>schedule_interval                 0:0:10
>>maxujobs                          0
>>queue_sort_method                 seqno
>>job_load_adjustments              np_load_avg=0.50
>>load_adjustment_decay_time        0:7:30
>>load_formula                      np_load_avg
>>schedd_job_info                   true
>>flush_submit_sec                  0
>>flush_finish_sec                  0
>>params                            none
>>reprioritize_interval             0:0:0
>>halftime                          168
>>usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>compensation_factor               5.000000
>>weight_user                       0.250000
>>weight_project                    0.250000
>>weight_department                 0.250000
>>weight_job                        0.250000
>>weight_tickets_functional         0
>>weight_tickets_share              0
>>share_override_tickets            TRUE
>>share_functional_shares           TRUE
>>max_functional_jobs_to_schedule   200
>>report_pjob_tickets               TRUE
>>max_pending_tasks_per_job         50
>>halflife_decay_list               none
>>policy_hierarchy                  OFS
>>weight_ticket                     0.010000
>>weight_waiting_time               0.000000
>>weight_deadline                   3600000.000000
>>weight_urgency                    0.100000
>>weight_priority                   1.000000
>>max_reservation                   0
>>default_duration                  0:10:0
>>
>>Main queue:
>>
>>qname                 all.q
>>hostlist              gbn001 gbn002 gbn003 gbn004 gbn005 gbn006 gbn007 gbn008 \
>>                      gbn009 gbn010 gbn011 gbn012 gbn013 gbn014 gbn015 gbn016 \
>>                      gbn017 gbn018 gbn019 gbn020 gbn021 gbn022 gbn023 gbn024 \
>>                      gbn025 gbn026 gbn027 gbn028 gbn029 gbn030 gbn031 gbn032 \
>>                      gbn033 gbn034 gbn035 gbn036 gbn037 gbn038 gbn039 gbn040 \
>>                      gbn041 gbn042
>>seq_no                0
>>load_thresholds       NONE
>>suspend_thresholds    NONE
>>nsuspend              1
>>suspend_interval      00:05:00
>>priority              0
>>min_cpu_interval      00:05:00
>>processors            1
>>qtype                 BATCH INTERACTIVE
>>ckpt_list             NONE
>>pe_list               make mpich mpich-vasp
>>rerun                 FALSE
>>slots                 1
>>tmpdir                /tmp
>>shell                 /bin/csh
>>prolog                NONE
>>epilog                NONE
>>shell_start_mode      posix_compliant
>>starter_method        NONE
>>suspend_method        NONE
>>resume_method         NONE
>>terminate_method      NONE
>>notify                00:00:60
>>owner_list            NONE
>>user_lists            NONE
>>xuser_lists           NONE
>>subordinate_list      NONE
>>complex_values        NONE
>>projects              NONE
>>xprojects             NONE
>>calendar              NONE
>>initial_state         enabled
>>s_rt                  INFINITY
>>h_rt                  INFINITY
>>s_cpu                 INFINITY
>>h_cpu                 INFINITY
>>s_fsize               INFINITY
>>h_fsize               INFINITY
>>s_data                INFINITY
>>h_data                INFINITY
>>s_stack               INFINITY
>>h_stack               10240K
>>s_core                INFINITY
>>h_core                INFINITY
>>s_rss                 INFINITY
>>h_rss                 INFINITY
>>s_vmem                INFINITY
>>h_vmem                INFINITY
>>
>>Parallel Environment
>>
>>pe_name           mpich
>>slots             44
>>user_lists        NONE
>>xuser_lists       NONE
>>start_proc_args   /usr/local/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>stop_proc_args    /usr/local/sge/mpi/stopmpi.sh
>>allocation_rule   $round_robin
>>control_slaves    FALSE
>>job_is_first_task FALSE
>>urgency_slots     min
>>
>>3) As for the processes, on the primary execution node, I see 
>>sge_shepherd-96 -bg, my job script, the mpirun command and the slew of 
>>rsh calls that go with it.  On all other slave nodes, I see only in.rshd 
>>and two copies of the mpi binary that I originally started with mpirun.
>>
>>Hope this helps.
>>
>>
>>Brian
>>
>>Ron Chen wrote:
>>
>>    
>>
>>>--- Brian R Smith <brian at cypher.acomp.usf.edu> wrote: 
>>> 
>>>
>>>      
>>>
>>>>You are absolutely the man.  Setting "control
>>>>slaves" to false fixed all of my problems.
>>>>   
>>>>
>>>>        
>>>>
>>>No, it is not fixing anything!
>>>
>>>Setting "control_slaves" to FALSE means non-tight
>>>integration, so you won't get process control or
>>>accounting of the slave MPI tasks.
>>>
>>>In SGE 6 update 4, the slow start problem was fixed.
>>>But the original problem was that starting a 400-node
>>>parallel job with tight integration took several tens
>>>of seconds. In your case it takes 10 minutes! So there
>>>is still something going on with your configuration.
>>>
>>>Did you get the precompiled binaries or compile from
>>>source? Also, are you using the default settings, or
>>>have you already played around with the settings a
>>>bit?
>>>
>>>Also, log on to the nodes and see what processes are
>>>running when a parallel job starts.
>>>
>>>-Ron
>>>
>>>
>>>
>>>
>>> 
>>>
>>>      
>>>
>>
>>    
>>
>
>
>
>  
>





