[GE users] execd behaviour in case of qmaster crash

ah_sunsource ahaupt at ifh.de
Thu Jun 11 07:59:39 BST 2009


Hi Rayson,

thanks for your reply.

On Thu, 2009-06-11 at 01:33 -0500, rayson wrote:
> You can try to manually migrate the master to another host, but the
> shadow master should automatically handle everything for you.

Well, I think the shadow master did everything correctly. It noticed the
failure of the real qmaster, started the qmaster process and updated
$SGE_ROOT/$SGE_CELL/common/act_qmaster.

> http://gridengine.sunsource.net/howto/sge_migrate.html
> 
> Is your $SGE_ROOT shared??

Yes. All the execds now see the shadow master host name in
$SGE_ROOT/$SGE_CELL/common/act_qmaster. Unfortunately, they simply
ignore it...
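To check by hand which master host a node sees, you can read the file
directly. Below is a minimal Python sketch. It assumes the usual layout
where act_qmaster holds a single line with the active qmaster's host name,
and that SGE_CELL defaults to "default" when unset. The function name
read_act_qmaster is hypothetical, not part of Grid Engine:

```python
import os
from pathlib import Path


def read_act_qmaster(sge_root=None, sge_cell=None):
    """Return the host name stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster.

    Hypothetical helper: falls back to the SGE_ROOT / SGE_CELL environment
    variables, assuming "default" as the cell name if SGE_CELL is unset.
    """
    root = sge_root or os.environ["SGE_ROOT"]
    cell = sge_cell or os.environ.get("SGE_CELL", "default")
    path = Path(root) / cell / "common" / "act_qmaster"
    # Assumed file format: one line containing the active master host name.
    return path.read_text().strip()
```

Running it on every exec host (e.g. `print(read_act_qmaster())` with
SGE_ROOT set) should report the shadow host if the shared file was updated.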

I tested manual migration some time ago and it worked perfectly. It looks
like the execd processes don't react correctly to the partly crashed
qmaster. But I'm not sure about this theory...
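One way to picture this theory: suppose an execd only re-reads
act_qmaster after it fails to reach the master it already knows. If the
half-dead qmaster still accepts connections, that check never fails and
the new entry is never consulted. The Python sketch below is a
hypothetical model of that logic only, not the actual sge_execd code. The
names pick_master, connect and read_act_qmaster are illustrative:

```python
from typing import Callable, Optional


def pick_master(current: str,
                connect: Callable[[str], bool],
                read_act_qmaster: Callable[[], str]) -> Optional[str]:
    """Hypothetical model of an execd choosing which qmaster to talk to."""
    # Keep the known master as long as a connection attempt succeeds,
    # even if that master is only "partly" alive.
    if connect(current):
        return current
    # Only after a failed connection is act_qmaster consulted again.
    candidate = read_act_qmaster()
    if candidate != current and connect(candidate):
        return candidate
    # Neither the old nor the newly recorded master is reachable.
    return None
```

In this model, a crashed master that still answers connect() keeps
pick_master returning the old host, which would match the behaviour
described above. It illustrates the hypothesis and does not confirm it.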

Thanks
Andreas

-- 
| Andreas Haupt             | E-Mail: andreas.haupt at desy.de
|  DESY Zeuthen             | WWW:    http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6          | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen          | Fax:    +49/33762/7-7216

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201510
