[GE users] I need help with GE configuration

Reuti reuti at staff.uni-marburg.de
Tue Sep 2 14:56:52 BST 2008


Hi,


Am 02.09.2008 um 13:43 schrieb Esteban Freire:

> Yes, basically I have defined num_proc for convenient output, and so
> that each job uses the num_proc it asked for. So, if I haven't
> misunderstood, I cannot configure slots + num_proc on the nodes/queues
> at the same time if I'm using RQS. Have I understood correctly?
>
> OK, very soon I'll configure slots instead of num_proc on the queues
> and nodes, and I won't require num_proc to be requested on the qsub
> command line.
>
>>> On the other hand, I have tested the additional RQS rules you
>>> suggested. The problem is that the check is sequential: it starts
>>> looking at which nodes have num_proc free, but if the first node in
>>> the list is busy it doesn't keep looking. Therefore this is not
>>> useful for me, because in the end I have free CPUs that I cannot
>>> use, since those nodes are never checked.
>>
>> No, this shouldn't happen. Are you using "queue_sort_method seqno"
>> while observing this - can you set it to "load" (i.e. sorting by
>> np_load_avg)?
>>
> OK, I changed *queue_sort_method* to load, but I keep getting the
> same problem: I have slots free, but it seems that it checks the
> first node, sees it is full, and doesn't keep checking further nodes.
> I'm attaching to this e-mail the output of the qhost, qquota,
> qstat -j <job> and qconf -sconf commands. Surely I'm overlooking
> something in my configuration, but I would appreciate it if you could
> help me.
>
> Thanks a lot,
> Esteban
>> --- Reuti
>>
>
> [root@ce2 ~]# qhost
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> wn001                   lx26-x86        0  3.06    3.7G  636.3M  512.0M    4.5M
> wn002                   lx26-x86        1  3.11    3.7G    2.2G  512.0M  112.0K
> wn003                   lx26-x86        0  3.11    3.6G    2.1G 1024.0M  112.0K
> wn004                   lx26-x86        2  4.14    3.6G    1.0G  512.0M    4.4M
> wn005                   lx26-x86        1  3.00    3.6G    2.0G 1024.0M  112.0K
> wn006                   lx26-x86        1  3.07    3.6G  938.3M  512.0M    4.1M
> wn007                   lx26-x86        2  4.00    3.6G  645.2M  512.0M    4.0M
> wn008                   lx26-x86        0  3.06    3.6G  821.0M  512.0M    4.0M
> wn009                   lx26-x86        2  3.08    3.7G  869.7M  512.0M    8.0K
> wn010                   lx26-x86        2  4.01    3.7G  683.1M  512.0M     0.0
> wn011                   lx26-x86        2  3.06    3.7G    1.2G  512.0M    4.5M
> wn012                   lx26-x86        2  4.03    3.7G    2.5G  512.0M    4.2M
> wn013                   lx26-x86        1  3.08    3.7G    1.3G  512.0M    4.6M
> wn014                   -               0     -       -       -       -       -
>
> [root@ce2 ~]# qquota -u '*'
> resource quota rule limit                filter
> --------------------------------------------------------------------------------
> maxujobs/8         num_proc=37/100      users @cesga
> maxujobs/13        num_proc=30/30       users @compchem
> maxujobs/19        num_proc=21/30       users @biomed
> maxprocs/normal    num_proc=7/8         queues alice,atlas,biomed,cesga,cms,compchem,diligent,dteam,fusion,imath,lhcb,swetest hosts wn013.egee.cesga.es

There is also the ! operator, to leave just one queue out. Maybe that
notation would be easier with so many queues:

http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/ResourceQuotaSpecification.html
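
Just as an illustration (the rule name and limit value are only copied
from your qquota output, and whether you keep limiting num_proc or
switch to slots is up to you), a per-host rule covering every queue
except ops could look like this:

{
   name         maxprocs
   description  per host limit for all queues except ops
   enabled      TRUE
   limit        queues !ops hosts {*} to num_proc=8
}

The {*} in the hosts scope makes the limit apply per execution host
instead of as one global sum.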

Did you define two RQS or put all rules in one RQS?
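
(In case it makes a difference for your setup: if I remember the RQS
semantics correctly, within one rule set only the first rule that
matches a request is applied and debited, while separate rule sets are
all evaluated independently and the strictest limit wins. E.g. with

   limit        users @biomed hosts {*} to num_proc=4
   limit        users {*} hosts {*} to num_proc=8

in a single set, a @biomed job is counted only against the first rule;
split into two sets, it is counted against both.)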

> maxprocs/normal    num_proc=8/8         queues alice,atlas,biomed,cesga,cms,compchem,diligent,dteam,fusion,imath,lhcb,swetest hosts wn001.egee.cesga.es
> <snip>
> maxprocs_special/all num_proc=7/9         queues alice,atlas,biomed,cesga,cms,compchem,diligent,dteam,fusion,imath,lhcb,ops,swetest hosts wn002.egee.cesga.es
> [root@ce2 ~]# qstat -j 25304
> ==============================================================
> job_number:                 25304
> exec_file:                  job_scripts/25304
> submission_time:            Tue Sep  2 12:27:23 2008
> owner:                      cesga004
> uid:                        30124
> group:                      cesga
> gid:                        30004
> sge_o_home:                 /home/glite/cesga004
> sge_o_log_name:             cesga004
> sge_o_path:                 /usr/local/sge/pro/bin/lx26-x86:/usr/kerberos/bin:/opt/edg/bin:/opt/glite/bin:/opt/lcg/bin:/usr/java/jdk1.5.0_14/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/globus/bin:/home/glite/cesga004/bin
> sge_o_shell:                /bin/bash
> sge_o_workdir:              /home/glite/cesga004
> sge_o_host:                 ce2
> account:                    sge
> hard resource_list:         num_proc=1,s_vmem=1G

You made s_vmem consumable?
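
If so, just for reference: a consumable s_vmem would show up in
"qconf -sc" with the consumable column set to YES, roughly like this
(the exact columns depend on the SGE version; only the YES in the
consumable field matters here):

#name    shortcut  type    relop  requestable  consumable  default  urgency
s_vmem   s_vmem    MEMORY  <=     YES          YES         0        0

and each execution host would then need a matching complex_values
entry (e.g. s_vmem=3.5G) for the bookkeeping to work.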

The odd thing is that I found something similar only for the case
where queue_sort_method is set to seqno:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2538
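
To rule that out, it might be worth double-checking what the scheduler
is actually running with, for example:

qconf -ssconf | grep queue_sort_method

(the grep is only for brevity). It should now report "load"; the only
valid values are "load" and "seqno".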

-- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



