[GE users] manage NFS resources
andreas.kuntzagk at mdc-berlin.de
Wed Sep 9 12:09:56 BST 2009
>> I'm not sure if it's possible but I'm looking for a good solution for
>> the following problem. One type of task running in our cluster starts a lot
>> of identical jobs. All of them need to read some big input files from
>> the same NFS fileserver to start.
>> So in the beginning they all wait for I/O. Better would be to delay the
>> start of additional jobs until bandwidth is available again. One
>> possibility is to script the starttime of the jobs accordingly. But for
>> this one needs to guess the time needed for a single job to read the input.
>> I think a consumable is not a solution since this would only be freed
>> after the job ends (long after the network bandwidth is freed).
>> This week we had one occasion where almost 100% of CPU was in IOWAIT
>> (according to Ganglia) for about 2.5 hours.
>> regards, Andreas
> Don't know the exact syntax but how about an array job where the
> parameter it takes is used as a multiplier for 'sleep'. All jobs would
> enter SGE at the same time but most would not do anything for a period
> of time before waking up.
You mean $TASK_ID?
That would work if you know the time the input loading is finished. That
time can be calculated if you assume that no other job is using the
fileserver at the same time and if you know what files will be loaded.
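
A rough sketch of that calculation, under exactly those assumptions (the fileserver is otherwise idle and the input size is known). Each task waits long enough for all earlier tasks to finish reading. The function names and the figures in the usage note are made up for illustration; the only SGE-specific part is that array tasks see their index in the SGE_TASK_ID environment variable, which is set to "undefined" for non-array jobs.

```python
import os


def read_time_seconds(input_bytes, bandwidth_bytes_per_s):
    """Estimated time for one job to read its input with the server to itself."""
    return input_bytes / bandwidth_bytes_per_s


def start_delay_seconds(task_id, input_bytes, bandwidth_bytes_per_s):
    """Delay for a 1-based array task so that its read starts after the
    reads of all lower-numbered tasks are expected to have finished."""
    return (task_id - 1) * read_time_seconds(input_bytes, bandwidth_bytes_per_s)


def current_task_id():
    """Task index from the environment; falls back to 1 outside an array job."""
    raw = os.environ.get("SGE_TASK_ID", "1")
    return int(raw) if raw.isdigit() else 1
```

With a hypothetical 2 GB input and 100 MB/s of usable bandwidth, task 4 would call time.sleep(start_delay_seconds(4, 2_000_000_000, 100_000_000)), i.e. sleep for 60 seconds before touching the fileserver. The estimate is only as good as the bandwidth guess.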
Instead of starting the subjobs and having them sleep I also could give
them a varying start time. So they could leave room for other jobs which
hopefully use other fileservers.
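
The varying start time could be sketched like this, assuming qsub's -a option with a date_time of the form [[CC]YY]MMDDhhmm[.SS] as described in the qsub man page. The helper names, the script name and the spacing are hypothetical; the code only builds the command lines and does not submit anything.

```python
from datetime import datetime, timedelta


def qsub_start_times(first_start, n_jobs, spacing_seconds):
    """One '-a' timestamp (CCYYMMDDhhmm.SS) per job, spacing_seconds apart."""
    return [
        (first_start + timedelta(seconds=i * spacing_seconds)).strftime("%Y%m%d%H%M.%S")
        for i in range(n_jobs)
    ]


def qsub_commands(script, first_start, n_jobs, spacing_seconds):
    """Argument lists suitable for subprocess.run(); nothing is executed here."""
    return [
        ["qsub", "-a", stamp, script]
        for stamp in qsub_start_times(first_start, n_jobs, spacing_seconds)
    ]
```

Unlike the sleeping array tasks, jobs submitted this way do not occupy a slot while they wait, so the scheduler is free to run other work in the gaps.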
But I still have no tool to make sure the bandwidth to the fileserver is
not exhausted. Could a load_sensor be of help? Would every job then
request the bandwidth it needs (and the time for which it is needed)? I
have a hard time imagining any solution to this.
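
For reference, a minimal sketch of what such a load sensor might look like, assuming the usual load sensor protocol: the execd writes a line to the sensor's stdin for each report it wants, the sensor answers with a begin / host:name:value / end block, and it exits on "quit". The complex name nfs_free_mbps and the measure callback are hypothetical; how to actually measure free fileserver bandwidth is the open question and is not solved here.

```python
import socket
import sys


def format_report(host, values):
    """Build one load report block: begin, one host:name:value line each, end."""
    lines = ["begin"]
    lines += [f"{host}:{name}:{value}" for name, value in values.items()]
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(measure, host=None, stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request line with a report until 'quit' is received.

    measure is a caller-supplied function returning a dict such as
    {"nfs_free_mbps": 40}; it stands in for the real measurement.
    """
    host = host or socket.gethostname()
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, measure()))
        stdout.flush()
```

Even with such a sensor, the value is only sampled at the load report interval, so it would at best throttle new job starts and would not reserve bandwidth for a job that has already been dispatched.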