[GE users] interesting bug

adary adary at marvell.com
Thu Feb 12 15:59:26 GMT 2009



Not only are those slots real, I had jobs running in them before I disabled the queue instance.
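Something like the following (lnx200 being one of the affected @layout_hosts members) is how to list the jobs running in such an instance:

[154]~> qstat -s r -q ns_bulk@lnx200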

Must be a bug.

Fortunately, I have already started the project to upgrade both clusters to 6.2u1.
________________________________

Yuval Adar, Marvell Israel - Senior UNIX System Administrator
Park Azorim, Kyriat Arie
Petah Tikva, 49527, Israel
Email: adary at marvell.com
Office: +972.3.9703958 - OnNet: 705.3958
Fax: +972.3.9704999
Mobile: +972.54.2493958
Web site: http://www.marvell.com

________________________________



-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Thursday, February 12, 2009 5:27 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] interesting bug

On 12.02.2009, at 14:44, adary wrote:

> This is what I get from qhost -h lnx200 -q:
>
> [151]~> qhost -h lnx200 -q
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> lnx200                  lx24-amd64     16  3.96  125.8G   53.8G  192.0G     0.0
>    admin                BIP   0/1
>    direct               BIP   0/10
>    all.q                BP    0/16
>    bulk                 BP    0/12
>    heavy                BIP   8/16
>    ns_bulk              BIP   0/4      d

I see, but is there really a queue instance, or is it only an
output error? What is the output of:

$ qstat -g c

or

$ qstat -f

Does the total number of slots include the superfluous queue instance?
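
For example, a hypothetical output (numbers made up, and the column
layout varies a little between versions); a disabled 4-slot phantom
instance would count under cdsuE:

$ qstat -g c
CLUSTER QUEUE      CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
ns_bulk              0.25      0     96    100      0      4

If TOTAL is 4 higher than the slots you actually configured for
ns_bulk, the instance is counted for real and it is not only a
display problem.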

Anyway, I vaguely remember reading about this but can't find the
reference; it seems to be fixed in 6.2u1.
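
If the instance turns out to be real, one thing worth trying is to
purge any host-specific attribute overrides for it (a sketch; see the
-purge switch in the qconf man page, with lnx200 standing for one of
the affected hosts):

$ qconf -purge queue projects ns_bulk@lnx200

Note that -purge only deletes host- or hostgroup-specific attribute
settings, so if the instance itself is created by the bug, this alone
probably won't make it disappear.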

-- Reuti

>
> This host, lnx200, belongs to a hostgroup called @RHEL4-128-16-4 and
> to @layout_hosts.
>
> For the queues:
>
> heavy :
>
> [152]~> qconf -sq heavy
> qname                 heavy
> hostlist              @RHEL4-128-16-4 @RHEL4-128-8-3 @RHEL4-128-8-4 \
>                       @RHEL4-16-4-3 @RHEL4-24-4-3 @RHEL4-32-4-3 @RHEL4-32-4-4 \
>                       @RHEL4-4-4-3 @RHEL4-4-4-4 @RHEL4-64-8-3 @RHEL4-8-4-3
>
>
> ns_bulk :
>
> [153]~> qconf -sq ns_bulk
> qname                 ns_bulk
> hostlist              @RHEL4-16-2-2 @RHEL4-16-4-2 @RHEL4-32-4-2 @RHEL4-4-1-2 \
>                       @RHEL4-4-2-2 @RHEL4-4-4-2 @RHEL4-6-2-2 @RHEL4-64-4-2 \
>                       @RHEL4-64-8-2 @RHEL4-8-2-1 @RHEL4-8-2-2 @RHEL4-8-4-2
>
> As you can see, the hostgroup @RHEL4-128-16-4 is only in the
> hostlist of the heavy queue. Both heavy and ns_bulk are high-priority
> queues, and we don't have a single host in the cluster that belongs
> to both of them.
>
> After I added the project restriction to the heavy queue:
>
> projects              NONE,[@layout_hosts=layout]
>
> I did the same for ns_bulk to prepare for the future, since I might
> eventually have a host in the ns_bulk queue that needs the same type
> of restriction.
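>
> For reference, a sketch of how that change can be applied
> non-interactively (assuming qconf's -mattr switch accepts the
> bracketed hostgroup form here, as it does for other list attributes):
>
> [154]~> qconf -mattr queue projects "NONE,[@layout_hosts=layout]" ns_bulk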
>
> After I did this, an ns_bulk queue instance appeared on the hosts
> that were supposed to be only in the heavy queue, not in the ns_bulk
> queue (that's why I disabled that queue instance in the example
> above).
>
> Even after I removed the project restriction from the ns_bulk queue,
> the queue instances remained on the two hosts in the @layout_hosts
> group, and I can't remove them.
>
>
>
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thursday, February 12, 2009 3:22 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] interesting bug
>
> Hi,
>
> On 12.02.2009, at 13:40, adary wrote:
>
>> I just found something in my configuration that I can only explain
>> as a bug in 6.1u3:
>>
>> I used the projects property in a queue to limit certain queue
>> instances to certain projects.
>>
>> In a queue that doesn't have instances on certain hosts, I added
>> the same rule:
>>
>> projects    NONE,[@layout_hosts = layout ]
>>
>> In this case it created the queue instance for that queue on the
>> two hosts that are in @layout_hosts, and now I can't find a way to
>> remove that queue instance. It gave it the default number of slots
>> defined for that queue: 4.
>
> You mean you get an additional line for this queue instance in
> `qstat -f`?
>
> -- Reuti
>
>
>>
>> Has anyone else experienced this?
>>
>>
>>
>
>
