[GE users] manage NFS resources
stadtherre at bit-sys.com
Wed Sep 9 15:24:18 BST 2009
Have you thought about throttling the NFS access at the file server?
On Linux, the RPCNFSDCOUNT variable in /etc/sysconfig/nfs sets the number of kernel threads that serve NFS requests. If this number is too high, the server will thrash in I/O wait (as you've seen); if it is too low, clients will wait unnecessarily long for NFS responses. A good starting point is twice the number of CPUs/cores on the server, which you can then fine-tune based on usage patterns.
On Solaris you can configure the number of nfsd processes to achieve the same result, but I don't know off the top of my head where you configure this count.
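As a sketch of the Linux rule of thumb, the snippet below computes the suggested starting value and formats it as it would appear in /etc/sysconfig/nfs. The function names are hypothetical, and the factor of two is only the starting point suggested above, not a tuned value.

```python
import os


def suggested_nfsd_count(cores=None, per_core=2):
    """Starting value for RPCNFSDCOUNT: `per_core` threads per CPU core."""
    if cores is None:
        # os.cpu_count() can return None if the count is unknown.
        cores = os.cpu_count() or 1
    return per_core * cores


def sysconfig_line(count):
    """Format the setting as a line for /etc/sysconfig/nfs."""
    return f"RPCNFSDCOUNT={count}"


if __name__ == "__main__":
    print(sysconfig_line(suggested_nfsd_count()))
```

On an 8-core server this yields `RPCNFSDCOUNT=16`; from there, adjust up or down while watching I/O wait and client response times.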
On Wed, 2009-09-09 at 15:12 +0100, markhewitt wrote:
> You mean $TASK_ID?
> That would work if you know when the input loading is finished. That
> time can be calculated if you assume that no other job is using the
> fileserver at the same time and if you know which files will be loaded.
> Instead of starting the subjobs and having them sleep, I could also give
> them staggered start times, so that they leave room for other jobs which
> hopefully use other fileservers.
> But I still have no tool to make sure the bandwidth to the fileserver is
> not exhausted. Could a load_sensor help? Would every job then have to
> request the bandwidth it needs (and the time for which it needs it)? I
> have a hard time imagining any solution to this.
Sure, it's a crude way of doing it, and the load on the file server will
likely spike, but c'est la vie.
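A minimal Python sketch of the sleep-based staggering, assuming array tasks are numbered 1, 2, 3, ... and that each task's input load takes a roughly known, fixed time (`load_seconds`, an estimate you would have to supply) with nothing else using the fileserver. Inside the job, Grid Engine exposes the array task number as the environment variable `SGE_TASK_ID`; the helper names are hypothetical.

```python
import os
import time


def start_delay(task_id, load_seconds):
    """Seconds array task `task_id` should wait before reading its input.

    Assumes tasks are numbered 1..N with step 1 and that each task needs
    about `load_seconds` of exclusive fileserver time.
    """
    return (task_id - 1) * load_seconds


def wait_for_turn(load_seconds):
    """Sleep until this task's slot comes up; return the delay used."""
    # Grid Engine sets SGE_TASK_ID for array tasks; for a non-array job
    # the value is not numeric, so fall back to task 1 (no delay).
    raw = os.environ.get("SGE_TASK_ID", "")
    task_id = int(raw) if raw.isdigit() else 1
    delay = start_delay(task_id, load_seconds)
    time.sleep(delay)
    return delay
```

With `load_seconds=300`, task 4 waits 900 seconds, so each task starts roughly when the previous one should have finished loading; if the estimate is too short, loads still overlap.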
Another technique you could use is a lock file. Quite simply, when the first
job starts reading in the data it creates a file (an empty file is
sufficient), and when it has finished reading the data it deletes the file.
All jobs can then have a while loop that says: while the lock file
exists, sleep for (say) 60 seconds, then try again.
That would at least ensure there is only one node accessing the file
server at any one time.
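The lock-file loop can be sketched in Python as below, with one change: checking for the file and then creating it are two separate steps, so two jobs can both see "no lock" and start reading at once. This sketch instead creates the file with `os.O_CREAT | os.O_EXCL`, which fails if the file already exists, and each job takes the lock itself before reading. Per the Linux open(2) man page, O_EXCL on NFS is only supported with NFSv3 or later on kernel 2.6 or later. The lock path and the `lock_file` helper are illustrative assumptions.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, poll_seconds=60.0):
    """Hold an exclusive lock file at `path` while the `with` body runs."""
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so the
            # "does it exist?" check and the creation are a single step.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(poll_seconds)  # another job holds the lock; retry
            continue
        os.close(fd)  # an empty file is sufficient
        break
    try:
        yield
    finally:
        os.remove(path)  # release the lock even if loading fails
```

In a job this would wrap only the input-loading step, e.g. `with lock_file("/shared/input.lock"): load_input()`, where the path (on a filesystem all nodes share) and `load_input` are placeholders. A job killed before the `finally` clause runs (for example by SIGKILL) leaves a stale lock that must be removed by hand.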
BIT Systems, Inc.