[GE issues] [Issue 2901] New - Negative values for consumables when using $fill_up

jlopez jlopez at cesga.es
Tue Feb 3 16:25:55 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=2901
                 Issue #|2901
                 Summary|Negative values for consumables when using $fill_up
               Component|gridengine
                 Version|6.2u1
                Platform|All
                     URL|http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=100070
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|scheduling
             Assigned to|andreas
             Reported by|jlopez






------- Additional comments from jlopez at sunsource.net Tue Feb  3 08:25:51 -0800 2009 -------
When using $fill_up as the allocation rule in an MPI PE, the scheduler does not
calculate consumable resources correctly, and a job can consume more than the
available value of the resource. Afterwards the consumable is shown with a
negative value.

For example:
$ qhost -F num_proc2
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
pc15370                 lx24-x86        1  0.03  979.9M   42.2M  517.7M     0.0
    hc:num_proc2=-4.000000

This behavior does not happen when using the $round_robin allocation policy.
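
To check a whole cluster for this symptom, the qhost output can be filtered for
negative values. The following is only a sketch: the function name
negative_consumables is made up for illustration, and it assumes the unwrapped
layout that qhost -F prints on a terminal, with attribute lines starting with
"hc:" as in the output reported here.

```shell
# Hypothetical helper (not part of Grid Engine): list host consumables
# that have gone negative. Reads `qhost -F <resource>` output on stdin.
negative_consumables() {
    awk '
        # Attribute lines look like "    hc:num_proc2=-4.000000".
        $1 ~ /^hc:/ {
            split($1, kv, "=")        # kv[1]="hc:num_proc2", kv[2]="-4.000000"
            name = substr(kv[1], 4)   # drop the "hc:" prefix
            if (kv[2] + 0 < 0) print host, name, kv[2]
            next
        }
        # Any other line that is not the header or the dashed rule
        # is treated as a host line; remember its first column.
        NF > 0 && $1 != "HOSTNAME" && $1 !~ /^-+$/ { host = $1 }
    '
}
```

It could be used as: qhost -F num_proc2 | negative_consumables
For the output in this report it would print the host name, the resource name
and the negative value on one line.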

The problem has been discussed here and Reuti has suggested opening an issue:
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=100070


Here is an example of how a different allocation rule in the PE gives
different results (mpi_fu uses $fill_up and mpi_rr uses $round_robin):
jlopez@cn005:~> qsub.orig -w v -l num_proc2=16,s_rt=00:10:00,s_vmem=1G,h_fsize=20G -pe mpi_fu 16 -q sistemas@cn110.null env.sh
verification: found suitable queue(s)

jlopez@cn005:~> qsub.orig -w v -l num_proc2=16,s_rt=00:10:00,s_vmem=1G,h_fsize=20G -pe mpi_rr 16 -q sistemas@cn110.null env.sh
...
verification: no suitable queues.
Exiting.

num_proc2 is a consumable set to the total number of processors of the node; in
this case it is fixed to 16 for the node cn110. As you can see, in the first
case the job will run, and we have verified that it leaves the node with
num_proc2=-240.
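
One reading of the -240 figure, offered as a sketch rather than a confirmed
diagnosis: it assumes that the -l num_proc2=16 request is charged once per slot
and that $fill_up places all 16 slots on cn110. Neither assumption is stated in
this report, but the arithmetic matches the observed value. The function name
remaining_consumable is made up for illustration.

```shell
# remaining_consumable CAPACITY PER_SLOT_REQUEST SLOTS
# Assumption: the consumable request is multiplied by the number of
# slots the job gets on the host, then subtracted from the capacity.
remaining_consumable() {
    capacity=$1
    per_slot=$2
    slots=$3
    echo $(( capacity - per_slot * slots ))
}

# cn110: capacity 16, request of 16 per slot, 16 slots on the host
remaining_consumable 16 16 16   # prints -240
```

Under the same assumptions, the "no suitable queues" answer obtained with
mpi_rr is the expected one, since 16 slots at 16 each cannot fit into a
capacity of 16.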

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=101714
