[GE users] Wildcards in PE still broken in 6.0u3

Tim Mueller tim_mueller at hotmail.com
Fri Apr 29 17:11:27 BST 2005



Sorry, the last line in the output of qstat -r should be:

     61 0.55500 Job7    user        qw    04/29/2005 11:19:54                                    8
       Full jobname:     Job7
       Requested PE:     *.mpi 8
       Hard Resources:
       Soft Resources:

(the job is in state "qw", not "r")

I was erasing user names and got a little carried away.
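
For what it's worth, the listing above is plain "qstat -r" output.  To pull
out just the job name and PE lines, something like this (untested sketch)
works:

    qstat -r | egrep 'Full jobname|Requested PE|Granted PE'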

Tim

----- Original Message ----- 
From: "Tim Mueller" <tim_mueller at hotmail.com>
To: <users at gridengine.sunsource.net>
Sent: Friday, April 29, 2005 12:03 PM
Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3


> Hi,
>
> I get:
>
>     59 0.55500 Job1    user        r     04/29/2005 10:47:08 mymachine.q.0@local0                     8
>       Full jobname:     Job1
>       Master queue:     mymachine.q.0@local0
>       Requested PE:     *.mpi 8
>       Granted PE:       mymachine.3.mpi 8
>       Hard Resources:
>       Soft Resources:
>     47 0.55500 Job2    user        r     04/27/2005 14:45:04 mymachine.q.0@local6                     8
>       Full jobname:     Job2
>       Master queue:     mymachine.q.0@local6
>       Requested PE:     *.mpi 8
>       Granted PE:       mymachine.3.mpi 8
>       Hard Resources:
>       Soft Resources:
>     44 0.55500 Job3    user        r     04/27/2005 11:55:49 mymachine.q.1@local12                    8
>       Full jobname:     Job3
>       Master queue:     mymachine.q.1@local12
>       Requested PE:     *.mpi 8
>       Granted PE:       mymachine.3.mpi 8
>       Hard Resources:
>       Soft Resources:
>     60 0.55500 Job4    user        r     04/29/2005 10:55:53 mymachine.q.1@local9                     8
>       Full jobname:     Job4
>       Master queue:     mymachine.q.1@local9
>       Requested PE:     *.mpi 8
>       Granted PE:       mymachine.3.mpi 8
>       Hard Resources:
>       Soft Resources:
>     49 0.55500 Job5    user        r     04/27/2005 15:01:53 mymachine.q.2@local16                    8
>       Full jobname:     Job5
>       Master queue:     mymachine.q.2@local16
>       Requested PE:     *.mpi 8
>       Granted PE:       mymachine.3.mpi 8
>       Hard Resources:
>       Soft Resources:
>     48 0.55500 Job6    user        r     04/27/2005 14:57:53 mymachine.q.2@local20                    8
>       Full jobname:     Job6
>       Master queue:     mymachine.q.2@local20
>       Requested PE:     *.mpi 8
>       Granted PE:       mymachine.3.mpi 8
>       Hard Resources:
>       Soft Resources:
>     61 0.55500 Job7    user        r    04/29/2005 11:19:54 8
>       Full jobname:     Job7
>       Requested PE:     *.mpi 8
>       Hard Resources:
>       Soft Resources:
>
> When I do qconf -sp mymachine.3.mpi, I get:
>
> pe_name           mymachine.3.mpi
> slots             16
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /bin/true
> stop_proc_args    /opt/lam/intel/bin/sge-lamhalt
> allocation_rule   $round_robin
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     avg
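>
> (As an aside: with allocation_rule $round_robin, an 8-slot job in this
> PE should be spread one slot per host across the queue's hosts, e.g.
> hypothetically local0=1, local1=1, ..., local7=1, rather than filling
> one host first.)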
>
> When I do qconf -sq mymachine.q.0, I get:
>
> qname                 mymachine.q.0
> hostlist              @mymachine-0
> seq_no                0
> load_thresholds       NONE
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               mymachine.0.mpi
> rerun                 FALSE
> slots                 2
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            sgeadmin
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  84:00:00
> h_rt                  84:15:00
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 1G
> h_rss                 1G
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> And so on, up to mymachine.q.3.
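>
> To double-check all four pairings at once, a quick loop along these
> lines (sketch) shows each queue's pe_list:
>
>     for i in 0 1 2 3; do
>         echo -n "mymachine.q.$i -> "
>         qconf -sq mymachine.q.$i | grep pe_list
>     done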
>
> Tim
>
> ----- Original Message ----- 
> From: "Reuti" <reuti at staff.uni-marburg.de>
> To: <users at gridengine.sunsource.net>
> Sent: Friday, April 29, 2005 11:14 AM
> Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3
>
>
>> Hi Tim,
>>
>> what is:
>>
>> qstat -r
>>
>> showing as granted PEs? - Reuti
>>
>>
>> Quoting Tim Mueller <tim_mueller at hotmail.com>:
>>
>>> Hi,
>>>
>>> That's the problem.  The setup is actually
>>>
>>> mymachine.q.0 references mymachine.0.mpi
>>> mymachine.q.1 references mymachine.1.mpi
>>> mymachine.q.2 references mymachine.2.mpi
>>> mymachine.q.3 references mymachine.3.mpi
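>>>
>>> (For reference, a pairing like that can be expressed with something
>>> along the lines of
>>>
>>>     qconf -mattr queue pe_list mymachine.0.mpi mymachine.q.0
>>>
>>> and likewise for 1-3; the pe_list entry in each queue config is what
>>> matters.)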
>>>
>>> There is no reason, as far as I can tell, that a job could ever be in
>>> both mymachine.3.mpi and mymachine.q.1.  And oddly enough, when I use
>>> wildcards, the scheduler won't put a job assigned to mymachine.3.mpi
>>> into mymachine.q.3 until all of the other queues are full.  At that
>>> point it's too late, because mymachine.3.mpi is using 48 slots when it
>>> is only allowed to use up to 16.
>>>
>>> When I don't use wildcards, I get the behavior I expect:  a job
>>> submitted to mymachine.3.mpi gets put in mymachine.q.3, etc.
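>>>
>>> To make the contrast concrete (just a sketch of the two cases):
>>>
>>>     #$ -pe mymachine.3.mpi 8    -> runs in mymachine.q.3, as expected
>>>     #$ -pe *.mpi 8              -> granted mymachine.3.mpi, but placed
>>>                                    in a queue that does not match it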
>>>
>>> Tim
>>>
>>> ----- Original Message ----- 
>>> From: "Stephan Grell - Sun Germany - SSG - Software Engineer"
>>> <stephan.grell at sun.com>
>>> To: <users at gridengine.sunsource.net>
>>> Sent: Friday, April 29, 2005 2:34 AM
>>> Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3
>>>
>>>
>>> > Hi Tim,
>>> >
>>> > I am not quite sure I understand your setup. Could you please attach
>>> > your cqueue configuration?  From the results you posted, it reads as
>>> > if:
>>> > queue
>>> > mymachine.q.0  references mymachine.3.mpi
>>> > mymachine.q.1  references mymachine.3.mpi
>>> >
>>> > and so on.
>>> >
>>> > Cheers,
>>> > Stephan
>>> >
>>> > Tim Mueller wrote:
>>> >
>>> >> Hi,
>>> >> It appears that wildcards in the Parallel Environment name still
>>> >> have problems in 6.0u3.  I have set up a Linux cluster of 32
>>> >> dual-processor Nocona nodes.  There are 4 queues of 16 processors
>>> >> each, and a corresponding PE for each queue.  The queues are named
>>> >> as follows:
>>> >>
>>> >>  mymachine.q.0
>>> >> mymachine.q.1
>>> >> mymachine.q.2
>>> >> mymachine.q.3
>>> >> And the PEs are:
>>> >>
>>> >>  mymachine.0.mpi
>>> >> mymachine.1.mpi
>>> >> mymachine.2.mpi
>>> >> mymachine.3.mpi
>>> >> All of the PEs have 16 slots.  When I submit a job with the
>>> >> following line:
>>> >>
>>> >>   #$ -pe *.mpi 8
>>> >>
>>> >> the job will be assigned to a seemingly random PE, but then placed
>>> >> in a queue that does not correspond to that PE.  I can submit up to
>>> >> 6 jobs this way, each of which will get assigned to the same PE and
>>> >> placed in any queue that does not correspond to the PE.  This causes
>>> >> 48 processors to be used for a PE with only 16 slots.  E.g., I might
>>> >> get:
>>> >>
>>> >>   Job 1        mymachine.3.mpi        mymachine.q.0        8 processors
>>> >>   Job 2        mymachine.3.mpi        mymachine.q.0        8 processors
>>> >>   Job 3        mymachine.3.mpi        mymachine.q.1        8 processors
>>> >>   Job 4        mymachine.3.mpi        mymachine.q.1        8 processors
>>> >>   Job 5        mymachine.3.mpi        mymachine.q.2        8 processors
>>> >>   Job 6        mymachine.3.mpi        mymachine.q.2        8 processors
>>> >>   Job 7        qw
>>> >>   Job 8        qw
>>> >>
>>> >> When I should get:
>>> >>
>>> >>   Job 1        mymachine.0.mpi        mymachine.q.0        8 processors
>>> >>   Job 2        mymachine.0.mpi        mymachine.q.0        8 processors
>>> >>   Job 3        mymachine.1.mpi        mymachine.q.1        8 processors
>>> >>   Job 4        mymachine.1.mpi        mymachine.q.1        8 processors
>>> >>   Job 5        mymachine.2.mpi        mymachine.q.2        8 processors
>>> >>   Job 6        mymachine.2.mpi        mymachine.q.2        8 processors
>>> >>   Job 7        mymachine.3.mpi        mymachine.q.3        8 processors
>>> >>   Job 8        mymachine.3.mpi        mymachine.q.3        8 processors
>>> >>
>>> >> If I then try to submit a job directly (with no wildcard) to the PE
>>> >> that all of the jobs were assigned to, it will not run, because I
>>> >> have already far exceeded the slots limit for this PE.
>>> >>
>>> >> I should note that when I do not use wildcards, everything behaves
>>> >> as it should.  E.g., a job submitted to mymachine.2.mpi will be
>>> >> assigned to mymachine.2.mpi and mymachine.q.2, and I cannot use more
>>> >> than 16 slots in mymachine.2.mpi at once.
>>> >>
>>> >> I searched the list, and although there seem to have been other
>>> >> problems with wildcards in the past, I have seen nothing that
>>> >> references this behavior.  Does anyone have an explanation /
>>> >> workaround?
>>> >>
>>> >> Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list