[GE users] Wildcards in PE still broken in 6.0u3

Tim Mueller tim_mueller at hotmail.com
Fri Apr 29 17:03:57 BST 2005



Hi,

Here is what qstat -r shows:

     59 0.55500 Job1    user        r     04/29/2005 10:47:08 mymachine.q.0@local0                 8
       Full jobname:     Job1
       Master queue:     mymachine.q.0@local0
       Requested PE:     *.mpi 8
       Granted PE:       mymachine.3.mpi 8
       Hard Resources:
       Soft Resources:
     47 0.55500 Job2    user        r     04/27/2005 14:45:04 mymachine.q.0@local6                 8
       Full jobname:     Job2
       Master queue:     mymachine.q.0@local6
       Requested PE:     *.mpi 8
       Granted PE:       mymachine.3.mpi 8
       Hard Resources:
       Soft Resources:
     44 0.55500 Job3    user        r     04/27/2005 11:55:49 mymachine.q.1@local12                8
       Full jobname:     Job3
       Master queue:     mymachine.q.1@local12
       Requested PE:     *.mpi 8
       Granted PE:       mymachine.3.mpi 8
       Hard Resources:
       Soft Resources:
     60 0.55500 Job4    user        r     04/29/2005 10:55:53 mymachine.q.1@local9                 8
       Full jobname:     Job4
       Master queue:     mymachine.q.1@local9
       Requested PE:     *.mpi 8
       Granted PE:       mymachine.3.mpi 8
       Hard Resources:
       Soft Resources:
     49 0.55500 Job5    user        r     04/27/2005 15:01:53 mymachine.q.2@local16                8
       Full jobname:     Job5
       Master queue:     mymachine.q.2@local16
       Requested PE:     *.mpi 8
       Granted PE:       mymachine.3.mpi 8
       Hard Resources:
       Soft Resources:
     48 0.55500 Job6    user        r     04/27/2005 14:57:53 mymachine.q.2@local20                8
       Full jobname:     Job6
       Master queue:     mymachine.q.2@local20
       Requested PE:     *.mpi 8
       Granted PE:       mymachine.3.mpi 8
       Hard Resources:
       Soft Resources:
     61 0.55500 Job7    user        r     04/29/2005 11:19:54                                      8
       Full jobname:     Job7
       Requested PE:     *.mpi 8
       Hard Resources:
       Soft Resources:

When I do qconf -sp mymachine.3.mpi, I get:

pe_name           mymachine.3.mpi
slots             16
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /opt/lam/intel/bin/sge-lamhalt
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     avg
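
The other three PEs differ only in pe_name (each has 16 slots, as noted
earlier).  If it helps, something like this untested sketch (the /tmp
scratch path is arbitrary) would recreate them from the one above:

    for i in 0 1 2; do
        # Dump the existing PE config, swap the name, and add it back
        # under the new name via a scratch file.
        qconf -sp mymachine.3.mpi \
            | sed "s/^pe_name.*/pe_name           mymachine.$i.mpi/" \
            > /tmp/pe.$i
        qconf -Ap /tmp/pe.$i
    done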

When I do qconf -sq mymachine.q.0, I get:

qname                 mymachine.q.0
hostlist              @mymachine-0
seq_no                0
load_thresholds       NONE
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               mymachine.0.mpi
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            sgeadmin
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  84:00:00
h_rt                  84:15:00
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 1G
h_rss                 1G
s_vmem                INFINITY
h_vmem                INFINITY

And so on, up to mymachine.q.3.
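
Until this is sorted out, the workaround on my end is what I described
below: drop the wildcard and name the PE explicitly at submit time, e.g.

    #$ -pe mymachine.2.mpi 8

in the job script, or equivalently on the command line (the script name
here is just a placeholder):

    qsub -pe mymachine.2.mpi 8 myjob.sh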

Tim

----- Original Message ----- 
From: "Reuti" <reuti at staff.uni-marburg.de>
To: <users at gridengine.sunsource.net>
Sent: Friday, April 29, 2005 11:14 AM
Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3


> Hi Tim,
>
> what is:
>
> qstat -r
>
> showing as granted PEs? - Reuti
>
>
> Quoting Tim Mueller <tim_mueller at hotmail.com>:
>
>> Hi,
>>
>> That's the problem.  The setup is actually
>>
>> mymachine.q.0 references mymachine.0.mpi
>> mymachine.q.1 references mymachine.1.mpi
>> mymachine.q.2 references mymachine.2.mpi
>> mymachine.q.3 references mymachine.3.mpi
>>
>> There is no reason, as far as I can tell, that a job could ever be in
>> both mymachine.3.mpi and mymachine.q.1.  And oddly enough, when I use
>> wildcards, the scheduler won't put a job assigned to mymachine.3.mpi
>> into mymachine.q.3 until all of the other queues are full.  At that
>> point it's too late, because mymachine.3.mpi is using 48 slots when
>> it's only allowed to use up to 16.
>>
>> When I don't use wildcards, I get the behavior I expect: a job
>> submitted to mymachine.3.mpi gets put in mymachine.q.3, etc.
>>
>> Tim
>>
>> ----- Original Message ----- 
>> From: "Stephan Grell - Sun Germany - SSG - Software Engineer"
>> <stephan.grell at sun.com>
>> To: <users at gridengine.sunsource.net>
>> Sent: Friday, April 29, 2005 2:34 AM
>> Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3
>>
>>
>> > Hi Tim,
>> >
>> > I am not quite sure I understand your setup. Could you please attach
>> > your cqueue configuration? From the results you posted, it reads as
>> > if:
>> >
>> > queue
>> > mymachine.q.0  references mymachine.3.mpi
>> > mymachine.q.1  references mymachine.3.mpi
>> >
>> > and so on.
>> >
>> > Cheers,
>> > Stephan
>> >
>> > Tim Mueller wrote:
>> >
>> >> Hi,
>> >> It appears that wildcards in the Parallel Environment name still
>> >> have problems in 6.0u3.  I have set up a cluster of 32 dual-processor
>> >> Nocona machines running Linux.  There are 4 queues of 16 processors
>> >> each, and a corresponding PE for each queue.  The queues are named
>> >> as follows:
>> >>
>> >> mymachine.q.0
>> >> mymachine.q.1
>> >> mymachine.q.2
>> >> mymachine.q.3
>> >>
>> >> And the PEs are
>> >>
>> >> mymachine.0.mpi
>> >> mymachine.1.mpi
>> >> mymachine.2.mpi
>> >> mymachine.3.mpi
>> >>
>> >> All of the PEs have 16 slots.  When I submit a job with the
>> >> following line:
>> >>
>> >> #$ -pe *.mpi 8
>> >>
>> >> the job will be assigned to a seemingly random PE, but then placed
>> >> in a queue that does not correspond to that PE.  I can submit up to
>> >> 6 jobs this way, each of which will get assigned to the same PE and
>> >> placed in any queue that does not correspond to the PE.  This causes
>> >> 48 processors to be used for a PE with only 16 slots.  E.g., I might
>> >> get:
>> >>
>> >> Job 1        mymachine.3.mpi        mymachine.q.0        8 processors
>> >> Job 2        mymachine.3.mpi        mymachine.q.0        8 processors
>> >> Job 3        mymachine.3.mpi        mymachine.q.1        8 processors
>> >> Job 4        mymachine.3.mpi        mymachine.q.1        8 processors
>> >> Job 5        mymachine.3.mpi        mymachine.q.2        8 processors
>> >> Job 6        mymachine.3.mpi        mymachine.q.2        8 processors
>> >> Job 7        qw
>> >> Job 8        qw
>> >>
>> >> When I should get:
>> >>
>> >> Job 1        mymachine.0.mpi        mymachine.q.0        8 processors
>> >> Job 2        mymachine.0.mpi        mymachine.q.0        8 processors
>> >> Job 3        mymachine.1.mpi        mymachine.q.1        8 processors
>> >> Job 4        mymachine.1.mpi        mymachine.q.1        8 processors
>> >> Job 5        mymachine.2.mpi        mymachine.q.2        8 processors
>> >> Job 6        mymachine.2.mpi        mymachine.q.2        8 processors
>> >> Job 7        mymachine.3.mpi        mymachine.q.3        8 processors
>> >> Job 8        mymachine.3.mpi        mymachine.q.3        8 processors
>> >> If I then try to submit a job directly (with no wildcard) to the PE
>> >> that all of the jobs were assigned to, it will not run, because I
>> >> have already far exceeded the slots limit for this PE.
>> >>
>> >> I should note that when I do not use wildcards, everything behaves
>> >> as it should.  E.g., a job submitted to mymachine.2.mpi will be
>> >> assigned to mymachine.2.mpi and mymachine.q.2, and I cannot use more
>> >> than 16 slots in mymachine.2.mpi at once.
>> >>
>> >> I searched the list, and although there seem to have been other
>> >> problems with wildcards in the past, I have seen nothing that
>> >> references this behavior.  Does anyone have an explanation /
>> >> workaround?
>> >>
>> >> Tim
