[GE users] Re: [was: GE issues] mem_free/ vf attributes not being honoured by the queue

reuti reuti at staff.uni-marburg.de
Thu Dec 10 14:29:05 GMT 2009


[Moved to the users list as it's more appropriate.]


Hi,

On 09.12.2009 at 20:42, gascan211 wrote:

> We recently started using SGE on one of our clusters. It's been
> working well for us, but we ran into this issue a couple of days ago.
> We tried googling for an answer without any success.
>
> Our cluster has nodes with 8 cores / 24 GB RAM each, and we have two
> queues, all.q and primary.
>
> When we submit the following job to all.q, we see only one job per  
> node which is the expected behavior.

That's because all.q has only one slot per node.


> Whereas when we submit this job to the primary queue, it allows 8
> jobs per node.
>
> [root@cluster-11 ~]# cat test.sh
> #!/bin/bash
> #$ -q all.q          -----> replaced with primary for testing with the other queue
> #$ -l mem_free=23G

Unless mem_free is defined as a consumable, the request can be
fulfilled for many jobs at once, as long as the memory isn't actually
used by the other running jobs. mem_free is just a load value that
reports the memory currently free on the host.
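
Whether a complex is consumable shows up in the sixth column
("consumable") of the `qconf -sc` listing. A minimal sketch for reading
that flag follows; the helper name consumable_flag is made up for
illustration and is not part of SGE.

```shell
#!/bin/sh
# Print the "consumable" column (6th field) for one complex.
# Reads a `qconf -sc` style listing on stdin; $1 is the complex name.
# consumable_flag is a hypothetical helper, not an SGE command.
consumable_flag() {
    awk -v name="$1" '$1 == name { print $6 }'
}

# Intended use on the cluster (not run here):
#   qconf -sc | consumable_flag mem_free
#   qconf -sc | consumable_flag virtual_free
```

If this prints NO for the resource you request, the scheduler only
compares the request against the reported load value and does not book
the memory per job.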


> #$ -pe smp 1
> sleep 100
>
> We came across this page and tried adding the virtual_free complex
> value, but it didn't help either.
> http://wikis.sun.com/display/gridengine62u2/Example%202%20-%20Space%20Sharing%20for%20Virtual%20Memory

Fine, this is the way to do it.

Which steps did you perform in detail? Maybe you missed one of the
steps in this setup.
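
For reference, a rough sketch of the usual steps, assuming virtual_free
is the resource to be booked: mark it consumable in the complex list,
give each execution host a capacity, and request it at submission. The
helper make_consumable and the file /tmp/complexes.txt are made up for
illustration; host n0 and the 23G value are taken from the post above.
Check the qconf commands against your version's man page before use.

```shell
#!/bin/sh
# Set the "consumable" column (6th field) to YES for one complex in a
# `qconf -sc` style listing read from stdin; other lines pass through.
# make_consumable is a hypothetical helper, not an SGE command.
make_consumable() {
    awk -v name="$1" '$1 == name { $6 = "YES" } { print }'
}

# Intended use on a live cluster (not run here):
#   qconf -sc | make_consumable virtual_free > /tmp/complexes.txt
#   qconf -Mc /tmp/complexes.txt       # load the edited complex list
#   qconf -mattr exechost complex_values virtual_free=23G n0
#                                      # repeat for every host
#   qsub -l virtual_free=23G test.sh   # request the consumable
```

With the capacity attached to the host and the complex consumable, each
running job's request is subtracted from the host's 23G, so a second
23G job should no longer fit on the same node.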

-- Reuti


>
> Any help would be great.
>
> Thanks,
> Kumar
>
> [root@cluster-11 ~]# qconf -sq all.q
> qname                 all.q
> hostlist              n0 n1 n2 n3 n4 n5 n6 n7 n8 n9
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make smp
> rerun                 FALSE
> slots                 1
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
>
> [root@cluster-11 ~]# qconf -sq primary
> qname                 bioscope
> hostlist              n0 n1 n2 n3 n4 n5 n6 n7 n8 n9
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make smp
> rerun                 FALSE
> slots                 8
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            bioscope
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        virtual_free=23G
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232628
