[GE users] mpich <-> sge --> controlling hosts machinefile

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Thu Jul 5 11:37:08 BST 2007



Hi Gerolf,

On Thu, 5 Jul 2007, Gerolf Ziegenhain wrote:

> Is n_slots=2 the same as slots=2?

Not quite, but almost. Slots are implicitly requested by all jobs that
go through SGE, whereas n_slots requires some effort to ensure jobs
comply with that convention. Thus slots should always be preferred over
n_slots for the sake of keeping the SGE configuration manageable.

Requesting a shadow slot resource with only a subset of the entire workload
can be an approach for cases where differently privileged job classes must
somehow compete for nodes. As long as one doesn't care about such cases,
the burden of shadow slots cannot be recommended, since it really is just
one approach.
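For reference, the slots-based setup discussed in this thread can be sketched with qconf. This is a minimal sketch only; the host name lc10 and the @mpich_hosts hostgroup are placeholders taken from the configurations quoted below.

```shell
# Cap each execution host at 2 slots via a host-level slots consumable
# (placeholder host name lc10; repeat for every node in the queue):
qconf -mattr exechost complex_values slots=2 lc10

# Or, on SGE 6.1 or later, enforce the same cap with a resource quota
# set. This opens an editor for a rule set along the lines of:
#   { name  mpich_slot_limit
#     limit hosts {@mpich_hosts} to slots=2 }
qconf -arqs mpich_slot_limit
```

Either variant makes scheduling deterministic, unlike a load threshold, which only reacts after load has already been reported.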

Regards,
Andreas

> /BR:
>  Gerolf
>
> 2007/7/5, Andreas.Haas at sun.com <Andreas.Haas at sun.com>:
>> 
>> Hi Gerolf,
>> 
>> you should really use 2 as the allocation_rule, since you always want
>> to have 2 tasks on a node.
>> 
>> Load thresholds can also be used to prevent overloaded nodes from being
>> selected, but this requires you to fiddle around with time-dependent
>> settings such as load correction. If that load of 1 comes from other
>> batch load that is known to Grid Engine, you should consider use of
>>
>>     complex_values slots=2
>> 
>> in each execution host configuration or, if you run 6.1, you could also
>> set up
>> a resource quota such as
>>
>>     limit hosts {@mpich_hosts} to slots=2
>> 
>> the benefit of deploying the slots consumable instead of a load threshold
>> is that your configuration becomes more deterministic.
>> 
>> Regards,
>> Andreas
>> 
>> 
>> On Wed, 4 Jul 2007, Gerolf Ziegenhain wrote:
>> 
>> > Hi Chris,
>> >
>> > The result of this experiment is predictable: if I remove the n_slots
>> > threshold, SGE will also start single slots on nodes which already have
>> > a load of 1. This means it can only start 1 process; however, if the
>> > allocation of slots is fixed at "2", nothing will happen: the same
>> > behaviour as described before.
>> >
>> > /BR: Gerolf
>> >
>> > 2007/7/4, Chris Dagdigian <dag at sonsorol.org>:
>> >>
>> >>
>> >> Hi Gerolf,
>> >>
>> >> I run MPI apps on 2-way compute nodes all the time without problems
>> >> and I've always just let "slots 2" stay in the queue configuration.
>> >> The SGE scheduler always did the right thing. Can you remove your
>> >> n_slots load thresholds and queue consumables and see what happens?
>> >> I'm still curious as to why you had three tasks land on one node.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Jul 4, 2007, at 3:35 PM, Gerolf Ziegenhain wrote:
>> >>
>> >> > Thanks for the very quick reply ;)
>> >> >
>> >> > allocation_rule = $round_robin results in 1 job/node. This increases
>> >> > the communication effort. So maybe allocation_rule=2 would be the
>> >> > best choice in my case?
>> >> >
>> >> > This is the configuration of the queue:
>> >> > qconf -sq q_mpich
>> >> > qname                 q_mpich
>> >> > hostlist              lc10 lc11 lc12 lc13 lc14 lc15 lc18 lc19
>> >> > seq_no                21,[@b_hosts=22],[@x_hosts=23]
>> >> > load_thresholds       np_load_avg=1,np_load_short=1,n_slots=2, \
>> >> >                       [@b_hosts=np_load_avg=1,np_load_short=1,n_slots=2], \
>> >> >                       [@x_hosts=np_load_avg=1,np_load_short=1,n_slots=2]
>> >> > suspend_thresholds    NONE
>> >> > nsuspend              1
>> >> > suspend_interval      00:05:00
>> >> > priority              0
>> >> > min_cpu_interval      00:05:00
>> >> > processors            UNDEFINED
>> >> > qtype                 BATCH
>> >> > ckpt_list             NONE
>> >> > pe_list               mpich
>> >> > rerun                 TRUE
>> >> > slots                 2
>> >> > tmpdir                /tmp
>> >> > shell                 /bin/bash
>> >> > prolog                NONE
>> >> > epilog                NONE
>> >> > shell_start_mode      unix_behavior
>> >> > starter_method        NONE
>> >> > suspend_method        NONE
>> >> > resume_method         NONE
>> >> > terminate_method      NONE
>> >> > notify                00:00:60
>> >> > owner_list            NONE
>> >> > user_lists            ziegen,[@x_hosts=big]
>> >> > xuser_lists           matlab matlab1 thor
>> >> > subordinate_list      NONE
>> >> > complex_values        synchron=0,virtual_free=3G,n_slots=2, \
>> >> >                       [@b_hosts=synchron=0,virtual_free=5G,n_slots=2], \
>> >> >                       [@x_hosts=synchron=0,virtual_free=17G,n_slots=2]
>> >> > projects              NONE
>> >> > xprojects             NONE
>> >> > calendar              NONE
>> >> > initial_state         default
>> >> > s_rt                  INFINITY
>> >> > h_rt                  INFINITY
>> >> > s_cpu                 INFINITY
>> >> > h_cpu                 100:00:00
>> >> > s_fsize               INFINITY
>> >> > h_fsize               INFINITY
>> >> > s_data                INFINITY
>> >> > h_data                2G,[@b_hosts=4G],[@x_hosts=16G]
>> >> > s_stack               INFINITY
>> >> > h_stack               INFINITY
>> >> > s_core                INFINITY
>> >> > h_core                INFINITY
>> >> > s_rss                 INFINITY
>> >> > h_rss                 INFINITY
>> >> > s_vmem                INFINITY
>> >> > h_vmem                3G,[@b_hosts=5G],[@x_hosts=17G]
>> >> >
>> >> >
>> >> > /BR:
>> >> >    Gerolf
>> >> >
>> >> >
>> >> > 2007/7/4, Chris Dagdigian <dag at sonsorol.org>:
>> >> > Not sure if this totally answers your question but you can play with
>> >> > the host selection process by adjusting your $allocation_rule in your
>> >> > parallel environment configuration.
>> >> >
>> >> > For instance, you have $fill_up configured which is why your parallel
>> >> > slots are being packed on as few nodes as possible. Changing to
>> >> > $round_robin will spread it out among as many machines as possible.
>> >> >
>> >> > For your main symptom:
>> >> >
>> >> > If your parallel jobs are running more than 2 tasks per node then
>> >> > something may be off with your slot count - perhaps SGE is detecting
>> >> > multi-core CPUs on your 2-way boxes and setting slots=4 on each node.
>> >> > Posting the config of the queue "mpich-queue" may help get to the
>> >> > bottom of this as I'm not sure about the n_slots "limit" you are
>> >> > referring to.
>> >> >
>> >> >
>> >> > Regards,
>> >> > Chris
>> >> >
>> >> >
>> >> >
>> >> > On Jul 4, 2007, at 3:14 PM, Gerolf Ziegenhain wrote:
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > > Maybe it is a very stupid question, but: How do I control the
>> >> > > number of jobs per node? Consider the following hardware: 38 nodes
>> >> > > with two processors on each. When I start a job with -pe mpich 8
>> >> > > there should be 4 nodes used with 2 jobs on each. What do I have to
>> >> > > do in order to achieve this?
>> >> > >
>> >> > > My parallel environment is configured like this:
>> >> > > qconf -sp mpich
>> >> > > pe_name           mpich
>> >> > > slots             60
>> >> > > user_lists        NONE
>> >> > > xuser_lists       NONE
>> >> > > start_proc_args   /opt/N1GE/mpi/startmpi.sh -catch_rsh $pe_hostfile
>> >> > > stop_proc_args    /opt/N1GE/mpi/stopmpi.sh
>> >> > > allocation_rule   $fill_up
>> >> > > control_slaves    TRUE
>> >> > > job_is_first_task FALSE
>> >> > > urgency_slots     min
>> >> > >
>> >> > > My mpich-queue has limits:
>> >> > > np_load_avg=1
>> >> > > np_load_short=1
>> >> > > n_slots=2
>> >> > >
>> >> > > However if I start a job, something like this will happen in the
>> >> > > PI1234-file:
>> >> > > lc12.rhrk.uni-kl.de 0 prog
>> >> > > lc19 1 prog
>> >> > > lc19 1 prog
>> >> > > lc19 1 prog
>> >> > > lc14 1 prog
>> >> > > lc14 1 prog
>> >> > > lc13 1 prog
>> >> > > lc13 1 prog
>> >> > >
>> >> > > So there are three jobs on lc19, which has only two CPUs. One
>> >> > > of these three jobs would better be running on lc12. How can I
>> >> > > fix this?
>> >> > >
>> >> > >
>> >> > > Thanks in advance:
>> >> > >    Gerolf
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Dipl. Phys. Gerolf Ziegenhain
>> >> > > Office: Room 46-332 - Erwin-Schrödinger-Str.46 - TU Kaiserslautern
>> >> > > - Germany
>> >> > > Web: gerolf.ziegenhain.com
>> >> > >
>> >> > >
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> >> > For additional commands, e-mail: users-help at gridengine.sunsource.net
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> 
>> http://gridengine.info/
>> 
>> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551
>> Kirchheim-Heimstetten
>> Amtsgericht Muenchen: HRB 161028
>> Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
>> Vorsitzender des Aufsichtsrates: Martin Haering
>> 
>> 
>> 
>
>
>
>

http://gridengine.info/







