[GE users] checking mount points or any other user defined attributes
dag at sonsorol.org
Tue Nov 23 12:00:09 GMT 2010
Missing mount points, which usually indicate OS or cluster problems, are
typically checked by non-SGE cluster tools, although you could presumably
write a JSV (Job Submission Verifier) or prolog script to check for these things.
The best implementation I have seen was at a site where the admins had a
script that probed for every OS issue they had ever encountered. The
script ran at node boot time and periodically afterwards. As soon as any
problem was detected, the node was put into the disabled state 'd' and
the admins were notified. The same script also kept the node in 'd'
state for the first 5 minutes after boot, so that problems had time to
show up and be detected before jobs started landing on it.
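A minimal sketch of such a boot-time/periodic check, assuming a Linux node (/proc/mounts, /proc/uptime) and that the SGE host name matches `hostname`. The mount list, grace period and mail address are hypothetical, and a real site script would probe far more than mounts.

```shell
#!/bin/sh
# Node health-check sketch: disables all SGE queue instances on this host
# while a check fails or during the first BOOT_GRACE seconds after boot.
# REQUIRED_MOUNTS and ADMIN_EMAIL are hypothetical site settings.

REQUIRED_MOUNTS="/home /scratch"
BOOT_GRACE=300          # 5 minutes after boot
ADMIN_EMAIL="root"

# Print each required mount point missing from a mount table
# (default /proc/mounts, where field 2 is the mount point).
missing_mounts() {
    table=${1:-/proc/mounts}
    for m in $REQUIRED_MOUNTS; do
        awk -v m="$m" '$2 == m { found = 1 } END { exit !found }' "$table" \
            || echo "$m"
    done
}

# Succeed if an uptime in seconds (e.g. "123.45") is inside the grace period.
in_boot_grace() {
    up=${1%%.*}
    [ "$up" -lt "$BOOT_GRACE" ]
}

run_checks() {
    host=$(hostname)
    problems=$(missing_mounts)
    if [ -n "$problems" ]; then
        # Disabled queue instances keep running jobs but get no new ones.
        qmod -d "*@$host"
        echo "Missing mounts on $host: $problems" \
            | mail -s "SGE: $host disabled" "$ADMIN_EMAIL"
    elif in_boot_grace "$(cut -d' ' -f1 /proc/uptime)"; then
        qmod -d "*@$host"
    else
        qmod -e "*@$host"
    fi
}

# Only touch SGE when invoked explicitly, e.g. from an init script or cron.
if [ "${1:-}" = "--run" ]; then
    run_checks
fi
```

Run it with `--run` at boot and from cron afterwards. This version re-enables the node once every check passes, which would also undo a manual `qmod -d` by an admin; a stricter variant would only re-enable nodes it disabled itself.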
If the mounts are supposed to be missing (perhaps because different
servers have different mounts by design), you can attach a Boolean
true/false attribute to the exec hosts, and users can submit jobs like
"qsub -hard -l fastScratch=true ./myJob.sh" or similar.
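A sketch of that Boolean-attribute setup, assuming a complex named fastScratch (shortcut fs) and example hosts node01/node02. `qconf -sc` and `qconf -Mc` dump and reload the complex list without an editor (`qconf -mc` opens one instead). Untested here; treat it as a starting point.

```shell
# 1. Add a BOOL complex to the complex list.
#    Columns: name shortcut type relop requestable consumable default urgency
qconf -sc > /tmp/complexes.txt
echo "fastScratch  fs  BOOL  ==  YES  NO  FALSE  0" >> /tmp/complexes.txt
qconf -Mc /tmp/complexes.txt

# 2. Set it on the hosts that actually have the mount (example host names).
qconf -mattr exechost complex_values fastScratch=TRUE node01
qconf -mattr exechost complex_values fastScratch=TRUE node02

# 3. Request it as a hard resource; -hard must come before -l.
qsub -hard -l fastScratch=true ./myJob.sh
```

Hosts without the attribute fall back to the default FALSE, so such jobs are never dispatched there.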
For serious use that is transparent to users, a JSV might work. The JSV
can examine the user's job and change it on the fly, for example by
redirecting it to a different queue or queue instance.
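A rough JSV sketch in the style of the example jsv.sh shipped with Grid Engine, assuming the helper functions from $SGE_ROOT/util/resources/jsv/jsv_include.sh and a hypothetical queue scratch.q. It forces a hard queue request onto any job that hard-requests fastScratch=true; the decision is kept in a plain function so it can be tried without SGE.

```shell
#!/bin/sh
# JSV sketch: redirect jobs that hard-request fastScratch=true into the
# hypothetical queue "scratch.q", using helpers from jsv_include.sh.

# Plain decision logic: print the queue to force for a given
# fastScratch value, or nothing.
queue_for_request() {
    case "$1" in
        [Tt][Rr][Uu][Ee]|1) echo "scratch.q" ;;
    esac
}

jsv_on_start() {
    return
}

jsv_on_verify() {
    q=""
    if [ "$(jsv_sub_is_param l_hard fastScratch)" = "true" ]; then
        q=$(queue_for_request "$(jsv_sub_get_param l_hard fastScratch)")
    fi
    if [ -n "$q" ]; then
        jsv_set_param q_hard "$q"    # overrides any -q the user gave
        jsv_correct "Job redirected to $q"
    else
        jsv_accept "Job is accepted"
    fi
}

# Enter the JSV protocol loop only when SGE's helper script is present.
JSV_INCLUDE="${SGE_ROOT:-}/util/resources/jsv/jsv_include.sh"
if [ -n "${SGE_ROOT:-}" ] && [ -r "$JSV_INCLUDE" ]; then
    . "$JSV_INCLUDE"
    jsv_main
fi
```

Enable it per job with `qsub -jsv /path/to/script` or cluster-wide through the `jsv_url` parameter of the cluster configuration (`qconf -mconf`). Overwriting q_hard discards whatever queue the user asked for, which may or may not be what you want.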
License-aware scheduling is another matter. Google "Olesen FlexLM" to
see how it's done with SGE. Basically, the modern method is to declare
a requestable consumable resource for each license entitlement and make
it dynamic via a script that polls the license server and constantly
adjusts the value of the resource. This method has superseded the
load-sensor method.
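A minimal sketch of that polling approach (not Olesen's actual code), assuming FlexLM's `lmutil lmstat` output format, a hypothetical server 27000@licserver, and one consumable complex per feature with the same name, e.g. `matlab matlab INT <= YES YES 0 0`, whose capacity is set on the global host. One subtlety: SGE already subtracts what its own running jobs consume from that capacity, so the capacity should be the issued licenses minus those used outside SGE. The `sge_held` helper is a hypothetical placeholder returning 0; left as-is, the capacity equals the free count, which is conservative (it under-schedules while SGE jobs hold licenses).

```shell
#!/bin/sh
# License poller sketch: sets the capacity of one consumable complex per
# FlexLM feature on the global host. LICENSE_SERVER, FEATURES and
# sge_held are hypothetical; complexes are assumed to be named after features.

LICENSE_SERVER="27000@licserver"
FEATURES="matlab abaqus"
INTERVAL=60   # seconds between polls

# From lmstat output on stdin, print "<issued> <in_use>" for feature $1,
# or "0 0" if it is not listed. Expected line format:
# Users of matlab:  (Total of 10 licenses issued;  Total of 3 licenses in use)
license_counts() {
    awk -v f="$1" '
        $1 == "Users" && $2 == "of" && $3 == (f ":") && $4 == "(Total" {
            print $6, $11; found = 1; exit
        }
        END { if (!found) print 0, 0 }'
}

# HYPOTHETICAL placeholder: licenses of feature $1 held by running SGE jobs.
sge_held() {
    echo 0
}

# Capacity = issued - (in use outside SGE); args: issued in_use held_by_sge.
capacity_for() {
    echo $(( $1 - ($2 - $3) ))
}

update_once() {
    status=$(lmutil lmstat -a -c "$LICENSE_SERVER")
    for f in $FEATURES; do
        counts=$(printf '%s\n' "$status" | license_counts "$f")
        cap=$(capacity_for "${counts% *}" "${counts#* }" "$(sge_held "$f")")
        qconf -mattr exechost complex_values "$f=$cap" global
    done
}

# Poll forever only when invoked explicitly.
if [ "${1:-}" = "--run" ]; then
    while :; do
        update_once
        sleep "$INTERVAL"
    done
fi
```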
> Is there an option by which SGE can check for mount points, licenses,
> etc. before starting a job on a node?
> By doing this I want to prevent SGE from scheduling jobs on nodes
> that do not satisfy these requirements.