[GE users] problem with migrating to shadow master

reuti reuti at staff.uni-marburg.de
Thu Nov 25 19:14:47 GMT 2010


On 24.11.2010 at 11:12, rumpelkeks wrote:

>> <snip>
>> It looks like the lock file is written to confirm a successful shutdown (the opposite of what I was used to). It then prevents a shadowd from acting on an unchanged heartbeat file, since -migrate first shuts down the current master and then starts its own.
>> 
>> Do you want two qmasters, each able to start up when the other is missing, so that you have a shadowd running on both of them?
> 
> That seems to be the theory, only in my case it doesn't seem to work 
> very well (and I'm trying to find out why - it might be a timing issue, 
> my $SGE_ROOT/$SGE_CELL/spool etc is on an NFS share).

Do you use classic spooling? Is the common directory also on the share?

Can both machines also write to these shares?
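
A quick way to check the last point is to run something like the following on both machines. It is only a minimal sketch: the default paths assume the usual $SGE_ROOT/$SGE_CELL layout with `spool` and `common` below it, and the function name `can_write` is made up for this example. Pass other directories as arguments if your setup differs, and run it as the user the daemons run as.

```python
import os
import sys
import tempfile


def can_write(directory):
    """Return True if a file can be created, synced and removed in `directory`."""
    try:
        # NamedTemporaryFile creates the file and deletes it again on close.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="sge_write_test_") as fh:
            fh.write(b"test")
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the write out to the server
        return True
    except OSError:
        # Covers missing directories, permission problems and read-only mounts.
        return False


if __name__ == "__main__":
    # Assumed default layout; override by passing directories on the command line.
    cell_dir = os.path.join(os.environ.get("SGE_ROOT", ""),
                            os.environ.get("SGE_CELL", "default"))
    dirs = sys.argv[1:] or [os.path.join(cell_dir, "spool"),
                            os.path.join(cell_dir, "common")]
    for d in dirs:
        print(d, "writable" if can_write(d) else "NOT writable")
```

If either machine reports "NOT writable" for one of the directories, that would be worth sorting out before looking at timing.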


> Yes I do want (and have) two servers running a shadowd (and one running 
> a qmaster) so they can take over if one fails. Which from my tests is 
> still working (I'll do some more testing).
> 
> However, not actually being able to cleanly migrate (well, not unless I 
> manually fake a lock file to appear) is annoying. It's a useful feature, 
> I think; I stumbled upon this problem when I wanted to migrate off the 
> current master to be able to take it down for maintenance. I am sure it 
> used to work, I tested it a lot when I set up the shadow master.
> 
> This was before I upgraded to 6.2u4 from 6.2u2 though - did the 
> mechanism on how a migration is handled change between u2 and u4 to 
> anyone's knowledge? I'm trying to find out if this is a problem within 
> SGE (odd timing or something), or a problem with my setup (which I don't 
> think changed since this was working). I can't find a lot of information 
> about the actual mechanism (i.e. who is supposed to write the lock 
> file, and when; stuff like that), which limits my debugging capabilities 

If it's a timing issue, you should at least see the lock file on the machine where it was created, as that machine should already have it in its cache. Only the NFS share might receive the final write later.
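
To compare when the file becomes visible on each machine, a small poller along these lines could be started on both hosts just before the migration. This is a sketch under assumptions: `wait_for_file` is a name invented for this example, and the location of the lock file is not spelled out in this thread, so pass whatever path your installation uses. The hosts' clocks also need to be in sync for the timestamps to be comparable.

```python
import os
import time


def wait_for_file(path, timeout=60.0, interval=0.5):
    """Poll until `path` exists.

    Returns the wall-clock time (seconds since the epoch) at which the file
    was first seen, or None if it did not appear within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return time.time()
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Calling `wait_for_file(path_to_lock_file)` on the old master and on the shadow host and comparing the two results should show whether the file appears locally first and on the other side only later, or never appears at all. Note that the NFS client's attribute caching can itself delay when the remote side sees the file, which is the kind of delay this is meant to expose.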

Maybe running SGE in debug mode will show more: if I read the source correctly, the creation of the lock file should show up there when it happens.

-- Reuti


> a bit :)
> 
> Tina
> 
>> -- Reuti
>> 
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298089
>> 
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>> 
> 
> 
> -- 
> Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
> Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
> 
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298278

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298813
