[GE users] name server, and restarting sge master without losing jobs?

reuti reuti at staff.uni-marburg.de
Wed Aug 11 18:43:22 BST 2010


Hi,

On 11.08.2010 at 19:22, gutnik wrote:

> Our network admin is changing the name server, but every time he
> brings down the old name server, sge hangs.
> The machine on which sge is running has the correct resolv.conf, and
> can use the new name server with no problems.
> 
> So,
> 
> 1) Does SGE cache network information (including name server)? Is
> there a way to flush that?
> 

IIRC there is an internal buffer that caches the hostnames for 10 minutes. But did I understand you correctly that only the machine which runs the name server changes, and not the name of any machine? To avoid such side effects, I usually put all machines of the cluster in /etc/hosts. Then, even when the name server is gone, the cluster operates as usual on the internal side.
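
A minimal sketch of how one could check that every cluster machine really is listed in /etc/hosts. The host names in the usage note are hypothetical, and the helper names are made up for illustration; the parsing only assumes the usual hosts-file layout (IP address, then one or more names, with `#` comments).

```python
def hosts_in_file(hosts_text):
    """Return all names (canonical names and aliases) found in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address, everything after it is a name or alias
        names.update(name.lower() for name in fields[1:])
    return names


def missing_from_hosts(cluster_hosts, hosts_text):
    """Return the cluster hosts that do not appear in the hosts-file text."""
    known = hosts_in_file(hosts_text)
    return sorted(h for h in cluster_hosts if h.lower() not in known)


def check_hosts_file(cluster_hosts, path="/etc/hosts"):
    """Read the hosts file at `path` and report which cluster hosts are missing."""
    with open(path) as fh:
        return missing_from_hosts(cluster_hosts, fh.read())
```

For example, `check_hosts_file(["master", "node01", "node02"])` would return the names that still depend on the name server. Entries in /etc/hosts only help if the `hosts` line in nsswitch.conf consults `files` before `dns`.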

I don't know whether this is also the case for resolv.conf, but nsswitch.conf, for example, is only read once by each process which uses it, so a long-running daemon would not notice a change there until it is restarted.


> 2) Last time I restarted the sge master server, I believe all queued
> jobs were killed. Is there some way

When you just shut down the qmaster and start it again, nothing should happen to any job, neither to the running ones nor to the waiting ones. Running jobs will just continue, and waiting ones will be scheduled once the qmaster is up again.
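
To verify this on your own cluster, one option is to record the job list before the restart and compare it afterwards. The following is only a sketch: it assumes `qstat` is on the PATH and that each job line of `qstat -u "*"` starts with the numeric job ID; the function names are made up for illustration.

```python
import subprocess


def parse_job_ids(qstat_text):
    """Collect job IDs: the first field of every line that is purely numeric."""
    ids = set()
    for line in qstat_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():  # skips the header and separator lines
            ids.add(int(fields[0]))
    return ids


def snapshot_job_ids():
    """Run qstat for all users and return the set of job IDs it lists."""
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    return parse_job_ids(result.stdout)


def vanished_jobs(before, after):
    """Return the job IDs present before the restart but not afterwards."""
    return sorted(before - after)
```

Call `snapshot_job_ids()` once before stopping the qmaster and once after it is up again, then pass both sets to `vanished_jobs()`. A job that shows up there is not necessarily lost; it may simply have finished in the meantime.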

If it does happen that you are missing some jobs, the next step is to check the messages file of the qmaster. Maybe some jobs simply ended while the qmaster was offline.
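
A small sketch for searching the messages file for a particular job. It assumes the qmaster spool directory is in its usual place, $SGE_ROOT/$SGE_CELL/spool/qmaster, and that relevant entries contain the word "job" followed by the job ID; both are assumptions to adjust for your installation, and the `/opt/sge` fallback is purely hypothetical.

```python
import os
import re


def default_messages_path(env=os.environ):
    """Build the assumed default path of the qmaster messages file."""
    root = env.get("SGE_ROOT", "/opt/sge")  # hypothetical fallback, adjust as needed
    cell = env.get("SGE_CELL", "default")
    return os.path.join(root, cell, "spool", "qmaster", "messages")


def lines_mentioning_job(messages_text, job_id):
    """Return the lines that contain 'job <job_id>' (also matches 'job <id>.<task>')."""
    pattern = re.compile(r"\bjob\s+%d\b" % job_id, re.IGNORECASE)
    return [line for line in messages_text.splitlines() if pattern.search(line)]


def search_messages(job_id, path=None):
    """Read the messages file and return the lines mentioning the given job."""
    with open(path or default_messages_path()) as fh:
        return lines_mentioning_job(fh.read(), job_id)
```

For a job ID from your own cluster, `search_messages(job_id)` returns the matching lines, whose timestamps can then be compared with the time the qmaster was down.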

-- Reuti


>  to restart the daemon without losing all queued jobs? If I reboot
> the daemon server, will the list of queued
> jobs survive?
> 
> Thank you.
> 
>  Vadim
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=273772

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


