[GE users] handling multiple sites

Leanne Shih LShih at hifn.com
Thu Jun 17 17:19:36 BST 2004





Hi,

I'm also trying to set up 2 separate sites that share
the same master host and, of course, the same spool directory.

When I tried to do this, the remote site had a problem
mounting the shared spool directory from so far away.

Does anyone have any good suggestions on how to
get past this?  I had to unlink the spool directory
because the file server was not happy.

Also, I assume there is no way to safeguard the remote
site when the link goes down even with a shadow master.

Thanks,
Leanne

-----Original Message-----
From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
Sent: Thursday, June 17, 2004 6:21 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Best practices for redundant farm?


If you have one big SGE qmaster that manages 2
different sites, and if the link between the 2 is
down, the shadow master won't work, since the job
spool files are not accessible from the other side.

The other problems are not difficult to solve. There are
settings such as "reschedule_unknown", "rerun", and
others that let one site handle the situation when it
can't see the other: either rerun the jobs after X
minutes or ignore the problem.
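
As a rough sketch of the settings named above (the time value
and the job script name are only illustrative assumptions, not
recommendations for any particular site), the relevant entries
might look like this:

```
# Cluster configuration, edited with "qconf -mconf" (see sge_conf(5)).
# Wait this long (hh:mm:ss) after a host becomes unknown before
# rescheduling its jobs; 00:00:00 means jobs are never rescheduled.
reschedule_unknown   00:10:00

# Queue configuration, edited with "qconf -mq <queue_name>"
# (see queue_conf(5)). Default rerun behaviour for jobs in this queue.
rerun                TRUE

# Per-job alternative at submission time (job.sh is a hypothetical script):
#   qsub -r y job.sh
```

Only jobs flagged as rerunnable are rescheduled, so leaving
"reschedule_unknown" at 00:00:00 corresponds to the "ignore the
problem" option.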

 -Ron

--- John Ross <jhr at fenks.org> wrote:
> Hello.
> 
> I will soon be setting up a processing farm using
> Grid Engine 6
> 
> One of the things I'm trying to figure out is how to
> set up the primary and
> backup sites.
> 
> The problem is a bit more than simply handling when
> the master goes down -
> we need to handle a situation when the master and a
> good portion of the
> farm disappears.
> 
> We'll have enough CPU at each site to finish the job
> should the other site
> go down, but we would still like to use all the
> resources whenever
> possible.
> 
> If I use a shadow master at the backup site, what
> would it do to any jobs
> that were running on the machines that it lost
> visibility to?
> 
> Would it be a better idea to build 2 plexes, with a
> global master (and
> shadow master)?
> Again, how does the global master deal with any jobs
> that were running on
> the plex that just disappeared?
> 
> Any other ideas or thoughts?
> 
> -- 
> John Ross
> jhr at fenks.org
> 
> There's plenty of room for all God's creatures.
> Right next to the mashed potatoes.
> 	- Billboard ad for Saskatoon Restaurant
> 		Greenville, SC
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail:
> users-help at gridengine.sunsource.net
> 
> 



		

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




