[GE users] filesystem question
andreas at kostyrka.org
Mon Feb 16 18:02:38 GMT 2009
On Mon, 16 Feb 2009 12:25:12 +0100,
reuti <reuti at staff.uni-marburg.de> wrote:
> On 16.02.2009 at 10:23, yacc143 wrote:
> > I wonder what people are using for a shared filesystem?
> > We've got a nice SAN (with >400MB/s IO rates) on the head node,
> > but the
> > compute nodes are connected via NFS over 1Gb/s network cards, and we
> > are getting around 70-100MB/s (at best) shared transfer rate for all
> > compute nodes. I think that is currently our biggest bottleneck, so
> > I wonder what other people are using for their network filesystem?
Ah, the headnode itself is linked to the switch via 2x 1Gb/s cards.
> this depends what you have to compute. We copy the files at the
> beginning of the job to a node, run most of the computations
> locally, and then copy the results back.
My problem is that, "usually", our jobs consist of:
1.) a Win32 box generates the problem. Either to a "local drive" or to
the Samba share on the headnode of the cluster.
2.) The input directories get copied (or hardlinked) to provide a flat
input directory for Matlab, GAMS or some locally developed calculations.
3.) The results are copied back (via Samba and/or hardlinks again).
(The hardlinking is needed because the input directory is generated in a
split-out layout.)
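Step 2 above can be sketched roughly as follows. This is a minimal
illustration, not our actual scripts; all paths are made up, and it
assumes the source tree and the flat staging directory live on the same
filesystem (a hardlink cannot cross filesystems):

```shell
#!/bin/sh
# Sketch: flatten a split-out input tree into one flat directory via
# hardlinks, so the solver sees a single input directory without
# copying any data. Paths are hypothetical; here we build a demo tree
# under mktemp so the script is self-contained.
SRC=$(mktemp -d)   # stands in for the Samba-exported input tree
STAGE=$(mktemp -d) # flat directory the solver will read

mkdir -p "$SRC/case1" "$SRC/case2"
echo data1 > "$SRC/case1/a.dat"
echo data2 > "$SRC/case2/b.dat"

# cp -l creates hardlinks instead of data copies -- near-free when
# source and stage are on the same filesystem
find "$SRC" -type f -exec cp -l {} "$STAGE"/ \;

ls "$STAGE"
```

After this, each file in `$STAGE` shares an inode with its original, so
no data moves at all; only directory entries are created.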
The problem is that, for most purposes:
-) the input/output can be huge, GBs worth of data.
-) the solver reads it mostly once into memory. Sometimes twice, but
for that use case the data should already be in the NFS server's page
cache.
So precopying the data to the compute node's local disk as such is
still a problem. (I wonder whether an httpd on the headnode might not
be faster than NFS for streaming the data to the compute nodes.)
> Others will need solutions like PVFS, Lustre, GPFS in soft- or
> hardware when all nodes (of a parallel job) must access the same
> files and cannot run just in one node.
> Your SAN is only attached to the headnode right now?
Yes, it's only attached to the headnode, with a local filesystem (ext3
at a guess, not being at the office :( ).