[GE users] slots and more - what are they?

Reuti reuti at staff.uni-marburg.de
Sun Aug 5 14:29:33 BST 2007



Hi,

On 01.08.2007 at 19:30, Alexandre Racine wrote:

> It will start, but even if the queue has 1000 jobs, the scheduler
> won't put more than 7 jobs on the server torque2 and no more than 4
> jobs on the server torque3. But I have set 10 slots for each. The
> torque2 server has 2 processors, and torque3 has 1 processor;
> maybe this comes into play for the scheduler?

This would indeed make a difference if there were some load on the
machines, since load_thresholds uses the load average normalized to
one CPU (np_load_avg).

Can you please try to set in the queue definition:

load_thresholds       NONE

and in the scheduler setup:

job_load_adjustments              NONE
load_adjustment_decay_time        0:0:0
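For the record, here is a minimal sketch of the arithmetic that would explain those numbers, assuming the stock scheduler defaults (job_load_adjustments of np_load_avg=0.50, decayed over load_adjustment_decay_time): every job just dispatched to a host temporarily adds 0.50/num_CPUs to its normalized load, and the scheduler stops dispatching once the queue's np_load_avg=1.75 threshold is reached. Under that assumption you get exactly 7 jobs on a 2-CPU host and 4 on a 1-CPU host:

```python
# Sketch of SGE's artificial load inflation (job_load_adjustments):
# each job just dispatched to a host adds ADJUSTMENT / num_cpus to the
# host's CPU-normalized load until the adjustment decays. Values below
# are the stock default (np_load_avg=0.50) and the queue's threshold
# (np_load_avg=1.75) from the configuration shown in this thread.
THRESHOLD = 1.75    # load_thresholds  np_load_avg
ADJUSTMENT = 0.50   # job_load_adjustments  np_load_avg (default)

def max_dispatchable(num_cpus, idle_load=0.0):
    """Count how many jobs the scheduler dispatches before the
    adjusted, normalized load reaches the threshold."""
    jobs = 0
    while idle_load + jobs * ADJUSTMENT / num_cpus < THRESHOLD:
        jobs += 1
    return jobs

print(max_dispatchable(2))  # 2-CPU host like TORQUE2 -> 7
print(max_dispatchable(1))  # 1-CPU host like torque3 -> 4
```

If that matches what you see, disabling the thresholds and the adjustments as above should let all 10 slots on each host fill up.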

-- Reuti

>
>
>
> Alexandre Racine
> Projets spéciaux
>
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Wed 2007-08-01 07:38
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] slots and more - what are they?
>
>
> On 31.07.2007 at 16:37, Alexandre Racine wrote:
>
>> Here is the results of those commands.
>>
>> $ /usr/local/sge/sge-root/bin/lx24-x86/qstat -f
>> queuename                      qtype used/tot. load_avg arch          states
>> ----------------------------------------------------------------------------
>> all.q at TORQUE1.statgen.local    BIP   0/1       0.00     lx24-x86
>> ----------------------------------------------------------------------------
>> all.q at TORQUE2.statgen.local    BIP   0/10      0.00     lx24-x86
>> ----------------------------------------------------------------------------
>> all.q at torque3.statgen.local    BIP   0/10      0.00     lx24-x86
>>
>> ############################################################################
>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>> ############################################################################
>>  109828 0.00000 LanceRecur sgeadmin     qw    07/31/2007 10:31:21     1
>>  109829 0.00000 LanceRecur sgeadmin     qw    07/31/2007 10:31:21     1
>>  109830 0.00000 LanceRecur sgeadmin     qw    07/31/2007 10:31:21     1
>>
>> $ /usr/local/sge/sge-root/bin/lx24-x86/qstat -j 109828
>> ==============================================================
>> job_number:                 109828
>> exec_file:                  job_scripts/109828
>> submission_time:            Tue Jul 31 10:31:21 2007
>> owner:                      sgeadmin
>> uid:                        20100
>> group:                      sgeadmin
>> gid:                        20100
>> sge_o_home:                 /home/sgeadmin
>> sge_o_log_name:             sgeadmin
>> sge_o_path:                 /tmp/109827.1.all.q:/usr/local/bin:/bin:/usr/bin
>> sge_o_shell:                /bin/bash
>> sge_o_workdir:              /home/sgeadmin/alextest/tmp
>> sge_o_host:                 TORQUE1
>> account:                    sge
>> cwd:                        /home/sgeadmin/alextest/tmp
>> path_aliases:               /tmp_mnt/ * * /
>> stderr_path_list:           /home/sgeadmin/alextest/tmp
>> mail_options:               abe
>> mail_list:                  alexandre.racine at mhicc.org
>> notify:                     FALSE
>> job_name:                   LanceRecursif.sh
>> stdout_path_list:           /home/sgeadmin/alextest/tmp
>> jobshare:                   0
>> shell_list:                 /bin/bash
>> env_list:
>> job_args:                   2,SGE
>> script_file:                /home/sgeadmin/alextest/LanceRecursif.sh
>> scheduling info:            queue instance "all.q at TORQUE1.statgen.local" dropped because it is full
>>
>>
>> The weird part is that the queue is empty...
>
> You mean, the job will not start in any of the queue instances at all?
>
> -- Reuti
>
>
>>
>>
>>
>>
>> Alexandre Racine
>> Projets spéciaux
>>
>>
>>
>> -----Original Message-----
>> From: Daniel Templeton [mailto:Dan.Templeton at Sun.COM]
>> Sent: Mon 2007-07-30 15:56
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] slots and more - what are they?
>>
>> Alexandre,
>>
>> What does qstat -j tell you for the jobs which are in the qw state?
>> Odds are there aren't enough resources to schedule the remaining  
>> jobs,
>> possibly because of the job_load_adjustment settings.
>>
>> Daniel
>>
>> Alexandre Racine wrote:
>>>
>>> I have created this nice recursive program to test the scheduling
>>> of SGE, and I can't really grasp the concept of slots, in the
>>> sense that I edit the default queue to add more processors but I
>>> always have a maximum of 7 out of 10 used processors. Why?
>>>
>>> So I edit the queue...
>>>  #/usr/local/sge/sge-root/bin/lx24-x86/qconf -mq all.q
>>>
>>> I set 3 slots (though with one the result is the same) and 10
>>> slots for my servers TORQUE2 and torque3 (never mind the names here):
>>> qname                 all.q
>>> hostlist              @allhosts
>>> seq_no                0
>>> load_thresholds       np_load_avg=1.75
>>> [...]
>>> slots                 3,[TORQUE1.statgen.local=1],[TORQUE2.statgen.local=10], \
>>>                       [torque3.statgen.local=10]
>>>
>>>
>>> Then I launch my recursive program to test the queue, and it never
>>> goes over 7 tasks on TORQUE2. Does the number of slots affect
>>> anything at all? The results are the same even if there is only
>>> 1 slot...
>>>
>>>
>>> queuename                      qtype used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------
>>> all.q at TORQUE1.statgen.local    BIP   1/1       0.00     lx24-x86
>>>  109777 0.55500 LanceRecur sgeadmin     r     07/30/2007 15:38:58     1
>>> ----------------------------------------------------------------------------
>>> all.q at TORQUE2.statgen.local    BIP   7/10      0.00     lx24-x86
>>>  109778 0.55500 LanceRecur sgeadmin     r     07/30/2007 15:38:58     1
>>>  109780 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109781 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109783 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109784 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109786 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109787 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>> ----------------------------------------------------------------------------
>>> all.q at torque3.statgen.local    BIP   4/10      0.14     lx24-x86
>>>  109779 0.55500 LanceRecur sgeadmin     r     07/30/2007 15:38:58     1
>>>  109782 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109785 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>  109788 0.55500 LanceRecur sgeadmin     t     07/30/2007 15:38:58     1
>>>
>>> ############################################################################
>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>> ############################################################################
>>>  109789 0.55500 LanceRecur sgeadmin     qw    07/30/2007 15:38:45     1
>>>  109790 0.55500 LanceRecur sgeadmin     qw    07/30/2007 15:38:45     1
>>>
>>>
>>>
>>> Thanks.
>>>
>>> Alexandre Racine
>>> Projets spéciaux
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
>>
>
>
>
>




