[GE users] manage NFS resources

reuti reuti at staff.uni-marburg.de
Wed Sep 9 13:08:13 BST 2009

On 09.09.2009 at 13:27, murple wrote:

>> how do you submit the jobs? Many qsubs or one array job?
>> One option might be, to submit the jobs with a hold, and then the
>> first job releases the hold of another job after the input file was
>> read.
> That looks like an interesting solution. One would need to do some
> scripting to communicate the job IDs. But still the problem of other
> jobs using up the bandwidth would persist.
>> [gnubatch]
> This looks interesting but is not feasible in our case. gnubatch is
> single host only and we have 100+ nodes of 2 types.
>> So it's a completely different concept,
>> than the resources in SGE or Torque, which will be allocated when the
>> job starts, and given back only at the end of the jobs.
> I was hoping that there is a solution to "give back" resources before
> the job ends. Maybe you could put that on a wishlist? Or would that
> cause other problems?

As it's also on my wishlist, I posted it to the [GE dev] mailing list.
Either of us could also enter an RFE for it.

> I'm planning to double the bandwidth of the fileserver using network
> bonding/trunking/aggregation/whatever. But then the filesystem speed
> will probably become a bottleneck.
> It would be nice if I could measure actually used bandwidth and could
> request a job only to be started if used_bandwidth < threshold. But I

Do you know inside the job when the reading of the file has finished?
SGE can attach meta-data to each job, called the job context. You
could submit the job with -ac STEP=READING and change it during
execution with -sc STEP=EXEC. A load sensor on the qmaster or
file-server machine could then a) build a list of running jobs with
'qstat -s r -u "*" -xml', and b) loop over these job numbers and count
the 'context:' entries in 'qstat -j <jobid>' that read STEP=READING.
The value reported by the load sensor could then trigger a load
threshold once one, two, three or however many jobs are already in
the reading phase.
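
A minimal sketch of such a load sensor in Python, under several
assumptions: the complex name reading_jobs is hypothetical and would
have to be defined first (qconf -mc), the job numbers are taken from
the JB_job_number elements of the XML output, and the 'context:' line
of 'qstat -j' is assumed to hold comma-separated NAME=VALUE pairs. The
function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch: report how many running jobs carry STEP=READING in their context."""
import subprocess
import sys
import xml.etree.ElementTree as ET

COMPLEX_NAME = "reading_jobs"  # hypothetical complex, must exist in 'qconf -sc'


def running_job_ids(qstat_xml):
    """Extract job numbers from the output of: qstat -s r -u "*" -xml"""
    try:
        root = ET.fromstring(qstat_xml)
    except ET.ParseError:
        return []  # qstat failed or printed nothing
    # Array jobs may be listed once per task, so de-duplicate.
    return sorted({el.text.strip() for el in root.iter("JB_job_number") if el.text})


def is_reading(qstat_j_text):
    """True if the 'context:' line of 'qstat -j <jobid>' has STEP=READING."""
    for line in qstat_j_text.splitlines():
        line = line.strip()
        if line.startswith("context:"):
            entries = [e.strip() for e in line.split(":", 1)[1].split(",")]
            return "STEP=READING" in entries
    return False


def count_reading(job_ids, job_details):
    """Count jobs in the reading phase; job_details maps a job id to text."""
    return sum(1 for job_id in job_ids if is_reading(job_details(job_id)))


def format_report(count):
    """One load report in the load sensor protocol (begin / value / end)."""
    return "begin\nglobal:%s:%d\nend\n" % (COMPLEX_NAME, count)


def run(cmd):
    """Run a command and return its stdout ('' if it produced none)."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def serve():
    """Answer each line sent by the execution daemon until it sends 'quit'."""
    for line in sys.stdin:
        if line.strip() == "quit":
            break
        ids = running_job_ids(run(["qstat", "-s", "r", "-u", "*", "-xml"]))
        count = count_reading(ids, lambda job_id: run(["qstat", "-j", job_id]))
        sys.stdout.write(format_report(count))
        sys.stdout.flush()
```

To try it, one would add a call to serve() at the end of the script and
register it as load_sensor in the host configuration (qconf -mconf).
Inside the job, the step could be switched after the read with
something like 'qalter -sc STEP=EXEC $JOB_ID', provided the execution
node is allowed to run qalter. The queue's load_thresholds could then
refer to reading_jobs.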

-- Reuti

> have no idea how to accomplish this.
> regards, Andreas
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216555
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

