[GE users] GE6.2 num_proc consumable strange behavior for mpi $fillup jobs

reuti reuti at staff.uni-marburg.de
Mon Feb 2 18:21:35 GMT 2009


On 02.02.2009 at 18:22, jlopez wrote:

>>> In the example above I have a node with 4 processors, so a job
>>> requesting "num_proc=4" and "-pe mpi 4" will consume 4x4=16
>>> processors.
>>> This would be an example of a hybrid MPI+OpenMP job.
>>>
>>
>> Aha, so num_proc was multiplied by the requested slot count, as
>> is usually done for consumable resource requests. Why not request:
>>
>> -pe mpi 16
>>
>> and use a fixed "allocation_rule 4"? In your job script you will get
>> the number of slots ($NSLOTS) and hosts ($NHOSTS). Then you can set:
>>
>> export omp_num_proc=$(($NSLOTS/$NHOSTS))

As I just noticed, you don't need the $ inside the arithmetic expansion:

export omp_num_proc=$((NSLOTS/NHOSTS))

which shortens things a bit.
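
To make the suggestion concrete, here is a minimal sketch of a job script fragment. It assumes the job was submitted with "-pe mpi 16" against a PE using "allocation_rule 4", and it exports OMP_NUM_THREADS (the variable OpenMP runtimes read) rather than omp_num_proc. The fallback values for NSLOTS and NHOSTS are an assumption, there only so the fragment can be tried outside Grid Engine.

```shell
#!/bin/sh
# Hypothetical job script fragment. Grid Engine sets NSLOTS and NHOSTS for
# parallel jobs; the defaults below are assumptions so the fragment can be
# run outside the scheduler (they mimic "-pe mpi 16" with allocation_rule 4).
NSLOTS=${NSLOTS:-16}
NHOSTS=${NHOSTS:-4}

# Threads per host = granted slots divided by granted hosts.
# With a fixed allocation_rule every host gets the same number of slots,
# so the integer division is exact.
OMP_NUM_THREADS=$((NSLOTS / NHOSTS))
export OMP_NUM_THREADS

echo "slots=$NSLOTS hosts=$NHOSTS threads_per_host=$OMP_NUM_THREADS"
```

With a $fill_up or $round_robin rule the slots per host can differ, so this division is only meaningful with a fixed allocation rule.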


>>
>> How did you set the omp_num_proc before?
>>
> I set OMP_NUM_THREADS using the num_proc value requested but of course
> it would be possible to use the option you suggest.
>
> I like the alternative implementation you suggest and I will see if we
> can implement it.
>
> In any case I still don't understand why GE 6.2 is not correctly doing
> the multiplication of the consumable. I even tried with another
> consumable num_proc2 and I get the same odd behavior as with num_proc
> (in this case there is no overlap with an existing load_value). In
> this case I would expect that the accounting of the used consumable
> would be slots*num_proc2.

Yes, if you define a completely new complex it should work; it then
behaves much like a license. I never dared to touch num_proc, which is
why I suggested staying with slots, where it will work. But I also see
strange behavior here: the consumable goes negative.

$ qhost -F num_proc2
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
pc15370                 lx24-x86        1  0.03  979.9M   42.2M  517.7M     0.0
    hc:num_proc2=-4.000000

Okay, it's a bug. Can you please file an issue (note: it is still
present in 6.2u1 on my cluster)? Thanks for getting to the root of it.

This might also explain the negative slot counts some people observed;
it seems to be only by coincidence that it never happened on my cluster,
as I have no $fill_up. Maybe resources are subtracted after the slot
allocation is done, without checking their availability.
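
To illustrate that guess with plain arithmetic (all numbers below are assumptions chosen for illustration, not values read from any cluster configuration): if a host offers 4 units of num_proc2, the job requests 4 per slot, and $fill_up puts 2 slots on that host, then a booking of slots*num_proc2 without an availability check leaves a negative remainder.

```shell
#!/bin/sh
# Illustrative arithmetic only; every number here is an assumption.
capacity=4        # assumed per-host capacity of num_proc2
request=4         # assumed per-slot request, as in "-l num_proc2=4"
slots_on_host=2   # assumed number of slots placed on this host

# Expected booking for a per-slot consumable: slots * request.
booked=$((slots_on_host * request))
remaining=$((capacity - booked))

echo "booked=$booked remaining=$remaining"

# A scheduler that checked availability first would refuse this placement,
# because the booking exceeds the capacity.
if [ "$booked" -gt "$capacity" ]; then
    echo "placement should have been rejected"
fi
```

Under these assumed numbers the remainder is -4, the same kind of negative value that qhost reports above.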

-- Reuti


> After all these tests to me it looks like a bug when using $fill_up.
>
> Cheers,
> Javier
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=101411
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=101428
