[GE users] File copy from local /scratch on job termination

Olesen, Mark Mark.Olesen at emcontechnologies.com
Mon Dec 15 14:22:59 GMT 2008


Hi Reuti,

Do you know whether -notify now works with Open MPI? There used to be a
problem with the USR1/USR2 signals killing its daemons.

/mark

> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Monday, December 15, 2008 3:20 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] File copy from local /scratch on job termination
> 
> Hi,
> 
> Am 15.12.2008 um 15:09 schrieb Bart Willems:
> 
> > I recently urged our cluster users to use local scratch space on the
> > cluster nodes instead of the NFS mounted RAID during their
> > calculations.
> > In the example submission file below all required files for the job
> > are copied over to the node's local hard disk ($TMPDIR is /scratch)
> > and copied back when the job completes. However, the files only get
> > copied back when the job exits normally. If SGE terminates the job
> > because it exceeds the requested CPU time or if a user manually
> > terminates a job with qdel, the files are not copied back from the
> > node's local hard disk to the RAID. Is there any way around this?
> 
> yes.
> 
> > Thanks,
> > Bart
> >
> 
> You have to submit the job with -notify
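
For reference, -notify can be given on the qsub command line or embedded as a directive in the job script. With it set, SGE sends SIGUSR1 shortly before a SIGSTOP and SIGUSR2 shortly before a SIGKILL; the delay comes from the queue's notify setting. A minimal sketch of the directive form, assuming it is simply added to the header of the script quoted here:

```shell
#$ -S /bin/bash
#$ -notify
```

The command-line form would be qsub -notify followed by the script name.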
> 
> 
> > #!/bin/bash
> >
> > #$ -S /bin/bash
> > #$ -j y
> > #$ -N helloworld_test
> > #$ -l h_cpu=00:02:00
> > #$ -cwd
> 
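> # Ignore both notify signals in the job script itself
> # (the first argument is two single quotes, i.e. an empty string)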
> trap '' usr1 usr2
> 
> 
> > # Copy job files to local scratch space
> > JOBFILE=jobfiles.job-id-$JOB_ID.tgz
> > tar cfz $JOBFILE helloworld
> > cp $JOBFILE $TMPDIR
> > rm -rf $JOBFILE
> > cd $TMPDIR
> > tar xfz $JOBFILE
> > rm -rf $JOBFILE
> 
> Maybe you can avoid the local file:
> 
> tar cj helloworld | tar xj -C $TMPDIR
> cd $TMPDIR
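
The pipe idea above can be wrapped in a small function. This is only a sketch: copy_tree is a hypothetical helper name, and it leaves out compression on the assumption that both ends of the pipe run on the same host, which differs from the cj/xj variant quoted here.

```shell
#!/bin/sh
# copy_tree SRC DEST (hypothetical helper): stream the contents of SRC
# into DEST through a pipe, so no intermediate archive file is written.
copy_tree() {
    mkdir -p "$2" || return 1
    # Writer archives SRC to stdout; reader unpacks stdin inside DEST.
    ( cd "$1" && tar cf - . ) | ( cd "$2" && tar xf - )
}
```

A call such as copy_tree ./input "$TMPDIR" (with ./input as a placeholder directory) would stage the files in; swapping the arguments copies results back.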
> 
> > # Computational command to run
> > ./helloworld
> 
> replace with:
> 
> (trap - usr1 usr2; exec ./helloworld)
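
The effect of this trap arrangement can be tried outside of SGE. The sketch below is an illustration under assumptions, not part of the original job script: demo_notify is a hypothetical function, "sleep 10" stands in for ./helloworld, and a manual kill -USR2 plays the role of the warning signal SGE would send.

```shell
#!/bin/sh
# demo_notify (hypothetical): the script ignores USR1/USR2, the payload
# gets the default action back, so only the payload is terminated and
# the script goes on to its copy-back step.
demo_notify() {
    trap '' USR1 USR2                       # script itself ignores both signals
    ( trap - USR1 USR2; exec sleep 10 ) &   # payload: default action restored
    payload=$!
    sleep 1                                 # give the payload time to start
    kill -USR2 "$payload"                   # simulate the warning before SIGKILL
    status=0
    wait "$payload" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "payload was terminated by a signal"
    fi
    echo "copy-back step reached"
}
```

The subshell matters: an ignored signal stays ignored across exec, so without the trap - reset the payload would inherit the ignore and never see the warning.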
> 
> 
> > # Copy all files back.
> > OUTFILE=outfiles.job-id-$JOB_ID.tgz
> > tar cfz $OUTFILE *
> > cp $OUTFILE $SGE_CWD_PATH
> > cd $SGE_CWD_PATH
> > tar xfz $OUTFILE
> > rm -rf $OUTFILE
> 
> tar cj * | tar xj -C $SGE_CWD_PATH
> 
> 
> HTH - Reuti
> 
> 
> > ------------------------------------------------------
> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92671
> >
> > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> >
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92677

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


