[GE users] manage NFS resources

reuti reuti at staff.uni-marburg.de
Wed Sep 9 11:54:22 BST 2009


Hi,

On 09.09.2009 at 10:17, murple wrote:

> I'm not sure if it's possible, but I'm looking for a good solution to the
> following problem. One type of task running in our cluster starts a lot
> of identical jobs. All of them need to read some big input files from
> the same NFS fileserver to start.
> So in the beginning they all wait for I/O. It would be better to delay
> the start of additional jobs until bandwidth is available again. One
> possibility is to script the start times of the jobs accordingly, but
> for this one needs to guess the time a single job needs to read the
> input.
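
A minimal sketch of the scripted-start-time approach, assuming each job is submitted with qsub's `-a` option (eligible start time in `[[CC]YY]MMDDhhmm[.SS]` format) and that one guessed per-job read time is good enough. The script name `job.sh` and the function name are hypothetical; the function only builds command lines and submits nothing.

```python
from datetime import datetime, timedelta


def staggered_qsub_commands(script, n_jobs, read_time_s, first_start):
    """Build one qsub command per job, each eligible read_time_s after the previous one.

    read_time_s is a *guess* of how long one job needs to read its input.
    The -a value uses the [[CC]YY]MMDDhhmm[.SS] format, here CCYYMMDDhhmm.SS.
    """
    commands = []
    for i in range(n_jobs):
        start = first_start + timedelta(seconds=i * read_time_s)
        stamp = start.strftime("%Y%m%d%H%M.%S")
        commands.append(["qsub", "-a", stamp, script])
    return commands


if __name__ == "__main__":
    # Hypothetical example: 3 jobs, 90 s guessed read time, first start at 12:00.
    for cmd in staggered_qsub_commands("job.sh", 3, 90, datetime(2009, 9, 9, 12, 0)):
        print(" ".join(cmd))
```

This makes the weakness of the approach visible: the whole schedule hangs on `read_time_s`. Guess too short and the jobs overlap on NFS again; guess too long and nodes sit idle.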

How do you submit the jobs: many qsubs or one array job?

One option might be to submit the jobs with a hold, and have the
first job release the hold on another job once it has read the input
file.
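
The hold-and-release idea can be sketched as a plan over job IDs, assuming all jobs were submitted with `qsub -h` (user hold) and that a job can release its successor with `qrls <job_id>`. The function name, the `concurrent` parameter, and the way a job learns its successor's ID (for instance from a file written after submission) are hypothetical; the code only computes who releases whom.

```python
def hold_chain_plan(job_ids, concurrent=1):
    """Plan a release chain for jobs that were all submitted on hold.

    Returns (release_now, release_after_read):
      release_now        -- IDs to release by hand right away (e.g. `qrls <id>`)
      release_after_read -- mapping job_id -> successor_id; each job releases
                            its successor once it has finished reading its input.
    With concurrent=k, at most k jobs should be reading at the same time.
    """
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    release_now = list(job_ids[:concurrent])
    release_after_read = {
        job_ids[i]: job_ids[i + concurrent]
        for i in range(len(job_ids) - concurrent)
    }
    return release_now, release_after_read


if __name__ == "__main__":
    now, after = hold_chain_plan(["101", "102", "103", "104"], concurrent=2)
    print("release now:", now)
    for job, successor in after.items():
        print(f"job {job} runs 'qrls {successor}' after reading its input")
```

Unlike fixed start times, this needs no guess of the read time: each successor starts exactly when a predecessor's read is done.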

Although I don't want to push you away from SGE: there is a simple
job scheduler called GNUbatch. It only has a plain FIFO queue of jobs,
but it has one nice and, AFAIK, unique feature: static variables. These
can be set to any value at any time, and the startup of jobs can
depend on conditions on these variables:

http://gridengine.sunsource.net/ds/viewMessage.do?dsMessageId=115491&dsForumId=39

There are even commands to change the value of the variables from the
command line at any time. So it's a completely different concept from
the resources in SGE or Torque, which are allocated when the job
starts and given back only at the end of the job. Using such a
variable, the start of a job could depend on a variable like
"NFS_ACCESS" being no higher than 2, and during the job (i.e. once the
input has been read) you could decrement the variable again.
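
To see why giving the slot back right after the read matters, here is a small simulation, assuming unlimited free execution slots, FIFO start order, and a fixed read time per job (the limit is meant to keep NFS unsaturated). `limit` stands for the bound on a variable like "NFS_ACCESS"; `release_after_read=False` models an SGE consumable that is returned only when the job ends. The function and its parameters are illustrative, not GNUbatch's API.

```python
import heapq


def start_times(n_jobs, limit, read_time, compute_time, release_after_read):
    """Simulate FIFO start times when at most `limit` jobs may hold the NFS slot.

    release_after_read=True:  the job gives its slot back after reading its input
                              (GNUbatch-style variable decremented during the job).
    release_after_read=False: the slot is held until the job ends
                              (SGE consumable semantics).
    """
    free_at = [0] * limit  # min-heap: time at which each slot becomes free
    starts = []
    for _ in range(n_jobs):
        t = heapq.heappop(free_at)  # earliest time any slot is free
        starts.append(t)
        held_for = read_time if release_after_read else read_time + compute_time
        heapq.heappush(free_at, t + held_for)
    return starts


if __name__ == "__main__":
    print("variable:  ", start_times(4, 2, 10, 100, True))
    print("consumable:", start_times(4, 2, 10, 100, False))
```

For 4 jobs, a limit of 2, a read time of 10 and a compute time of 100, the variable variant starts the jobs at `[0, 0, 10, 10]`, while the consumable variant delays the second pair to `[0, 0, 110, 110]`, long after the bandwidth was free again.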

-- Reuti


> I think a consumable is not a solution, since it would only be freed
> after the job ends (long after the network bandwidth is freed).
>
> This week we had one occasion where almost 100% of CPU was in IOWAIT
> (according to Ganglia) for about 2.5 hours.
>
> regards, Andreas
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216526
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216544



