[GE users] Default queue state for rebooted machines?

Robert Griffiths Robert.Griffiths at mitsubishi-sec-intl.com
Wed May 4 10:43:28 BST 2005

Hi Fred,

Thanks for those suggestions. I rather like the 'qmod -d' solution in the
startup - that means I can farm out the job to someone else ;-)

The idea about waiting for a flag file could be a winner as well, except
that the shared memory data is usually populated by a human-operated shell
script, and we'd rather be short of a machine than risk potentially
corrupt (old) data, so I don't think that's a "go-er". Shame.

As for loadsensors, erm... That's not something I'm familiar with at all,
so, unless it's ludicrously easy, I'm opting for leaving that one alone.

Thanks for a quick reply!


PS Apologies for the length of the company-added .sig - nothing I can do...

-----Original Message-----
From: Fred L Youhanaie [mailto:fly at anydata.co.uk] 
Sent: 04 May 2005 10:36
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Default queue state for rebooted machines?

Hi Rob,

The enabled/disabled state of a queue/host is stored within the qmaster 
rather than the execd on the node.

One simple solution I can think of is to stick 'qmod -d ...' inside the 
sge startup script on each node, let the execd start up, and then 
manually enable the queue afterwards.
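
As a rough illustration of that first suggestion, the boot-time step might be sketched as below. Python is used only for the sketch; the queue name `node01.q` and the helper names are made up, and the only Grid Engine commands assumed are `qmod -d` (disable) and `qmod -e` (enable). It presumes `qmod` is on the PATH and that whoever runs the script has permission to change the queue state.

```python
import subprocess


def qmod_command(action, queue):
    """Build the qmod command line: '-d' disables a queue, '-e' enables it."""
    flags = {"disable": "-d", "enable": "-e"}
    if action not in flags:
        raise ValueError("action must be 'disable' or 'enable'")
    return ["qmod", flags[action], queue]


def set_queue_state(action, queue, runner=subprocess.run):
    """Run qmod and return True if it exited with status 0."""
    result = runner(qmod_command(action, queue))
    return result.returncode == 0
```

The startup script would call something like `set_queue_state("disable", "node01.q")` on every boot, and an operator would run `qmod -e node01.q` by hand once the shared memory has been loaded. The `runner` parameter exists only so the command can be checked without a live cluster.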

Alternatively, you can always make the startup script loop/wait for the 
presence of a flag file which is created after the shared data is 
loaded. Or how about a loadsensor that indicates the presence of the 
shared data on each node? ...
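
Both of these alternatives hinge on the same check for a flag file, so they can be sketched together. This is only an outline under stated assumptions: the flag path, the host name and the attribute name `shm_ready` are hypothetical, and the load sensor part follows the usual Grid Engine convention as I understand it (execd writes a line to the sensor's stdin to request a report, sends `quit` to stop it, and expects `begin`, one `host:name:value` line per attribute, then `end`). The attribute would still have to be defined as a complex and requested by jobs or queues before it had any effect.

```python
import os
import sys
import time


def wait_for_flag(path, timeout=None, poll=5.0):
    """Block until `path` exists. Return False if `timeout` seconds pass first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def load_report(host, flag_path, name="shm_ready"):
    """Format one load sensor report: value 1 if the flag file exists, else 0."""
    value = 1 if os.path.exists(flag_path) else 0
    return "begin\n{}:{}:{}\nend\n".format(host, name, value)


def serve(host, flag_path, inp=sys.stdin, out=sys.stdout):
    """Answer each input line with a report; stop on 'quit' or end of input."""
    for line in inp:
        if line.strip() == "quit":
            break
        out.write(load_report(host, flag_path))
        out.flush()
```

A startup script would call `wait_for_flag(...)` before launching execd, whereas a load sensor script would end with a call to `serve(...)`. In either case the flag file would need to be removed on every boot, otherwise a stale flag could outlive the data it vouches for.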



Robert Griffiths wrote:
> Morning all,
> I was just wondering if there is a way to configure SGE (version 5.3p6) in
> such a way that when a known machine/execution host reboots itself (for
> whatever reason) and becomes operational again, it should not
> be placed in the set of active execution hosts? Something as simple as
> starting all queues in disabled mode would be ideal.
> Our jobs rely on data being uploaded into shared memory before they can
> run and, because the machine rebooted and was devoid of our data in its shared
> memory, SGE sent huge numbers of jobs to that machine because it was
> processing them so quickly. We would have been better off if the machine
> had remained dead!
> Any ideas if this is possible? If it is possible but only in N1Grid, then
> please let me know as we will be migrating to 6.0.4 (or whatever is then the
> current version) later this year.
> Cheers,
> Rob
