[GE users] checking mount points or any other user defined attributes
bharanitn at yahoo.com
Wed Nov 24 05:31:57 GMT 2010
--- On Tue, 23/11/10, craffi <dag at sonsorol.org> wrote:
From: craffi <dag at sonsorol.org>
Subject: Re: [GE users] checking mount points or any other user defined attributes
To: users at gridengine.sunsource.net
Date: Tuesday, 23 November, 2010, 5:30 PM
Missing mount points representing OS and cluster problems are usually
checked by non-SGE cluster tools although you could presumably write a
JSV or Prolog script that could check for these things.
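For the prolog route, the check itself can be tiny. Below is a minimal sketch in Python, the language the DRMAA side already uses. It assumes a hypothetical REQUIRED_MOUNTS list, reports any path that is not currently a mount point, and exits non-zero so that SGE sees the failure. Different versions handle specific prolog exit codes differently (job rescheduled, or the job or queue put into an error state), so check queue_conf(5) before relying on any particular code.

```python
import os
import sys

# Hypothetical list: replace with the mounts your jobs actually depend on.
REQUIRED_MOUNTS = ["/home", "/scratch"]


def missing_mounts(paths):
    """Return the entries of `paths` that are not currently mount points."""
    return [p for p in paths if not os.path.ismount(p)]


def main():
    missing = missing_mounts(REQUIRED_MOUNTS)
    if missing:
        # stderr ends up in the prolog's error output for the admins to see.
        print("missing mount points: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```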
The best implementation I saw was at a site where the admins had a script
that probed for every OS issue they had ever encountered in the past.
The script ran at node boot time and periodically afterwards. As soon as
any problem was detected, the node was put into the disabled state 'd' and
the admins were notified. The same script also put the node into the 'd'
state for the first 5 minutes after boot, to make sure there was time
for problems to show up and be detected before jobs started landing on it.
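The boot-time and periodic script described above might look roughly like the following sketch. The queue name all.q, the probe list, and running it from cron are all assumptions. The only SGE calls are qmod -d and qmod -e on the local queue instance. The script keeps the node in state 'd' for the first five minutes after boot and also whenever a probe fails. It reads the uptime from /proc/uptime, so it only works on Linux.

```python
import os
import socket
import subprocess
import sys

GRACE_SECONDS = 5 * 60                    # hold the node for 5 minutes after boot
QUEUE = "all.q"                           # hypothetical queue name
REQUIRED_MOUNTS = ["/home", "/scratch"]   # hypothetical probes


def read_uptime(path="/proc/uptime"):
    """Seconds since boot: the first field of /proc/uptime (Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])


def find_problems():
    """Run every probe; extend this with each OS issue seen in the past."""
    return [f"{m} is not mounted" for m in REQUIRED_MOUNTS if not os.path.ismount(m)]


def decide(uptime, problems, grace=GRACE_SECONDS):
    """Disable during the post-boot grace period or whenever a probe fails."""
    if uptime < grace or problems:
        return "disable"
    return "enable"


def qmod_command(action, queue, host):
    """qmod -d puts the queue instance into state 'd'; qmod -e enables it again."""
    flag = "-d" if action == "disable" else "-e"
    return ["qmod", flag, f"{queue}@{host}"]


def main():
    problems = find_problems()
    action = decide(read_uptime(), problems)
    # The host part must match the name SGE uses for this node (short vs FQDN).
    subprocess.run(qmod_command(action, QUEUE, socket.gethostname()), check=True)
    if problems:
        # Placeholder notification: swap in mail, a pager, etc.
        print("node disabled: " + "; ".join(problems), file=sys.stderr)


if __name__ == "__main__":
    main()
```

As written, the script re-enables the node once the probes pass, which would also undo a manual qmod -d issued by an admin. A real site script would need to record who disabled the node before enabling it again.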
If the mounts are supposed to be missing (perhaps because different
servers have different mounts configured by design), then you can attach
a Boolean true/false attribute to the exec hosts, and users can submit
jobs like: "qsub -hard -l fastScratch=true ./myJob.sh" or whatever.
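Attaching the attribute is a one-off admin step for each host. The sketch below assumes that a BOOL complex named fastScratch has already been added with qconf -mc. It also assumes that /fast/scratch (hypothetical) is the mount that decides the attribute's value. The sketch builds a qconf -aattr call against this host's exechost entry. qconf has to run on an admin host, and -mattr is the variant for changing a value the host already carries.

```python
import os
import socket
import subprocess

ATTRIBUTE = "fastScratch"        # assumed BOOL complex, already defined via qconf -mc
FAST_SCRATCH = "/fast/scratch"   # hypothetical mount that backs the attribute


def attribute_command(attribute, present, host):
    """Build the qconf call that attaches attribute=TRUE/FALSE to one exec host."""
    value = "TRUE" if present else "FALSE"
    return ["qconf", "-aattr", "exechost", "complex_values",
            f"{attribute}={value}", host]


def main():
    present = os.path.ismount(FAST_SCRATCH)
    # Use -mattr instead of -aattr if the host already has a fastScratch value.
    subprocess.run(attribute_command(ATTRIBUTE, present, socket.gethostname()),
                   check=True)


if __name__ == "__main__":
    main()
```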
For serious and transparent use, a JSV might work. The JSV can examine
the user's job submission and make changes on the fly, such as redirecting
it to a different queue or queue instance.
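A JSV talks to the master over stdin/stdout (see jsv(1)). The interesting part is the decision it makes, so that is all this sketch covers. The helper assumes the job's hard resource requests arrive as the comma-separated name=value string of the l_hard parameter. It returns either ACCEPT (job untouched) or CORRECT (job modified) as the verdict to report back. The protocol loop itself is left out. Check both the parameter name and the verdict strings against the jsv(1) page for your version.

```python
def ensure_request(l_hard, name, value):
    """Return (new_l_hard, verdict) after making sure `name` is requested.

    l_hard is assumed to be the job's hard requests as 'a=1,b=2'. Only the
    full resource name is recognised here, not its shortcut.
    """
    items = [item.strip() for item in l_hard.split(",") if item.strip()]
    requested = {item.partition("=")[0] for item in items}
    if name in requested:
        # The user already asked for it (true or false): leave the job alone.
        return l_hard, "ACCEPT"
    # Silently add the request so the job only lands on suitable hosts.
    items.append(f"{name}={value}")
    return ",".join(items), "CORRECT"
```

For example, a job submitted with only -l h_vmem=2G would come back as "h_vmem=2G,fastScratch=true" with verdict CORRECT.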
License-aware scheduling is another matter. Google "Olesen FlexLM" to
see how it's done with SGE. Basically the modern method involves
declaring requestable/consumable resources for each license entitlement
and making it dynamic via a script that polls the license server and
constantly adjusts the value of the resource. This method has superseded
the load-sensor method.
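The polling half of the license approach could look something like the following sketch. It assumes FlexLM's lmutil lmstat -a output format and a hypothetical server 27000@licserver. It also assumes that each feature already exists as a consumable complex whose capacity is set on the global exec host. The subtle part is that SGE already subtracts the licenses held by its own running jobs. The capacity to publish is therefore the number issued minus the licenses used outside SGE, not simply the free count.

```python
import re
import subprocess
import time

# Matches lmstat -a lines such as:
#   Users of matlab:  (Total of 10 licenses issued;  Total of 3 licenses in use)
USERS_OF = re.compile(
    r"Users of (?P<feature>[^:\s]+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<used>\d+) licenses? in use\)"
)


def sge_capacity(lmstat_output, held_by_sge=None):
    """Per-feature capacity for SGE: issued minus licenses used outside SGE."""
    held_by_sge = held_by_sge or {}
    capacity = {}
    for m in USERS_OF.finditer(lmstat_output):
        feature = m.group("feature")
        issued, used = int(m.group("issued")), int(m.group("used"))
        outside = max(used - held_by_sge.get(feature, 0), 0)
        capacity[feature] = max(issued - outside, 0)
    return capacity


def qconf_command(feature, count):
    """Set the consumable's capacity on the global exec host."""
    return ["qconf", "-mattr", "exechost", "complex_values",
            f"{feature}={count}", "global"]


def poll_forever(server="27000@licserver", interval=60):
    while True:
        out = subprocess.run(["lmutil", "lmstat", "-a", "-c", server],
                             capture_output=True, text=True, check=True).stdout
        # held_by_sge is left empty here, which undercounts while SGE jobs hold
        # licenses; a real poller would derive it from qstat.
        for feature, count in sge_capacity(out).items():
            subprocess.run(qconf_command(feature, count), check=True)
        time.sleep(interval)
```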
That's a lot of information, but I'm really not sure I'll be able to set it up like this, because we currently submit our array jobs through DRMAA. Our DRMAA code is written in Python, and at the moment it does not pass any -l flag.
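DRMAA does not need a -l flag of its own, because SGE's DRMAA library passes the job template's nativeSpecification string through as qsub-style options. The sketch below uses the drmaa-python bindings (drmaa.Session, createJobTemplate, runBulkJobs). The script name and the fastScratch attribute are the hypothetical ones from the qsub example above.

```python
def native_specification(resources):
    """Build a qsub-style '-hard -l name=value,...' string for DRMAA."""
    if not resources:
        return ""
    parts = []
    for name, value in resources.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{name}={value}")
    return "-hard -l " + ",".join(parts)


def submit_array(command, args, first, last, resources):
    """Submit an array job (tasks first..last) with hard resource requests."""
    import drmaa  # drmaa-python; imported here so the builder works without it

    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        jt.remoteCommand = command
        jt.args = list(args)
        jt.nativeSpecification = native_specification(resources)
        try:
            return session.runBulkJobs(jt, first, last, 1)
        finally:
            session.deleteJobTemplate(jt)
```

For example, submit_array("./myJob.sh", [], 1, 100, {"fastScratch": True}) should behave like qsub -hard -l fastScratch=true -t 1-100 ./myJob.sh.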
> Is there an option by which SGE can check for the mount points, licenses
> etc before starting a job on a node?
> By doing this I want to prevent SGE from sending jobs to nodes
> that do not satisfy these conditions.