[GE users] Missing slots on one node

j_polasek j-polasek at tamu.edu
Tue Apr 13 00:47:02 BST 2010


Hi Reuti,

Thanks a lot.  That is exactly where the issue was!
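
For anyone else who hits the same symptom, the check is against the
execution host definition:

    qconf -se node200

A consumable set at the host level quietly caps the host regardless of
the queue configuration.  As an illustration only (the value below is
hypothetical, not my actual entry), something like

    complex_values        slots=2

limits node200 to 2 slots no matter what the queue's slots line says,
which matches the 2+6 split described below.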

Jeff
On Apr 12, 2010, at 5:35 PM, reuti wrote:

> Hi,
>
> On 13.04.2010, at 00:32, j_polasek wrote:
>
>> Howdy all,
>>
>> I am sure I am missing something simple, so if someone can point me
>> in the right direction, I would be very thankful.
>>
>> The issue I am having is that node200 will only allocate two of its
>> eight slots.  For example, when an 8 slot job is submitted to the
>> scheduler and starts on node200, it runs 2 processes on node200 and
>> the remaining 6 on another node in the queue.  If the exact same job
>> starts on any other node, all 8 processes land on a single node.
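>>
>> For concreteness, a submission that shows this looks something like
>> the following (run_fluent.sh is a stand-in name, not the actual
>> script):
>>
>>     qsub -pe fluent_pe 8 run_fluent.sh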
>>
>> I am running a heterogeneous cluster with SGE 6.1u4, and one of my
>> queues has 16 nodes with 8 cores each (128 slots).
>
> any load_adjustments in the exechost definition (qconf -se node200)?
>
> -- Reuti
>
>
>> The parallel environments (fluent_pe, ib-openmpi, ib-mvapich) are set
>> to 128 slots and use the $fill_up allocation rule.
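>>
>> For reference, the abridged output of qconf -sp fluent_pe, showing
>> only the fields that matter here (the ib PEs are configured the same
>> way):
>>
>>     pe_name            fluent_pe
>>     slots              128
>>     allocation_rule    $fill_up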
>>
>> The qconf -sq I1.q shows
>>
>> qname                 I1.q
>> hostlist              node200.cluster.private node201.cluster.private \
>>                       node202.cluster.private node203.cluster.private \
>>                       node204.cluster.private node205.cluster.private \
>>                       node206.cluster.private node207.cluster.private \
>>                       node208.cluster.private node209.cluster.private \
>>                       node210.cluster.private node211.cluster.private \
>>                       node212.cluster.private node213.cluster.private \
>>                       node214.cluster.private node215.cluster.private
>> seq_no                0
>> load_thresholds       np_load_avg=1.25
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make fluent_pe ib-openmpi ib-mvapich
>> rerun                 FALSE
>> slots                 1,[node200.cluster.private=8], \
>>                       [node201.cluster.private=8],[node202.cluster.private=8], \
>>                       [node203.cluster.private=8],[node204.cluster.private=8], \
>>                       [node205.cluster.private=8],[node206.cluster.private=8], \
>>                       [node209.cluster.private=8],[node210.cluster.private=8], \
>>                       [node211.cluster.private=8],[node212.cluster.private=8], \
>>                       [node213.cluster.private=8],[node214.cluster.private=8], \
>>                       [node215.cluster.private=8],[node207.cluster.private=8], \
>>                       [node208.cluster.private=8]
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            a0d-me
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>>
>>
>> qstat -g c shows
>>
>> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
>> I1.q                              0.60     72     56    128      0      0
>>
>>
>> Node200 is the only node acting this way.  I have not been able to
>> find any setting that would cause this. Any ideas?
>>
>> Thanks
>>
>> Jeff
>>
>>
>>
>> Jeff Polasek
>> Computer Systems Manager
>> Artie McFerrin Chemical Engineering Department
>> Texas A&M University
>> 979-845-3398
>> j-polasek at tamu.edu
>>
>
