[GE users] File copy from local /scratch on job termination

Olesen, Mark Mark.Olesen at emcontechnologies.com
Mon Dec 15 15:58:57 GMT 2008


> > Do you know if -notify now works with openmpi? There used to be a
> > problem of USR1/USR2 killing the daemons.
> 
> Dunno, but I can test it. - Reuti

Please do when you have a chance. I assume that you are still active on
the openmpi list and can also let them know of the status.

/mark

> >
> >> -----Original Message-----
> >> From: reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: Monday, December 15, 2008 3:20 PM
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] File copy from local /scratch on job termination
> >>
> >> Hi,
> >>
> >> Am 15.12.2008 um 15:09 schrieb Bart Willems:
> >>
> >>> I recently urged our cluster users to use local scratch space on the
> >>> cluster nodes instead of the NFS-mounted RAID during their
> >>> calculations. In the example submission file below all required files
> >>> for the job are copied over to the node's local hard disk ($TMPDIR is
> >>> /scratch) and copied back when the job completes. However, the files
> >>> only get copied back when the job exits normally. If SGE terminates
> >>> the job because it exceeds the requested CPU time or if a user
> >>> manually terminates a job with qdel, the files are not copied back
> >>> from the node's local hard disk to the RAID. Is there any way around
> >>> this?
> >>
> >> yes.
> >>
> >>> Thanks,
> >>> Bart
> >>>
> >>
> >> You have to submit the job with -notify
> >>
> >>
> >>> #!/bin/bash
> >>>
> >>> #$ -S /bin/bash
> >>> #$ -j y
> >>> #$ -N helloworld_test
> >>> #$ -l h_cpu=00:02:00
> >>> #$ -cwd
> >>
> >> # Two single quotes
> >> trap '' usr1 usr2
> >>
> >>
> >>> # Copy job files to local scratch space
> >>> JOBFILE=jobfiles.job-id-$JOB_ID.tgz
> >>> tar cfz $JOBFILE helloworld
> >>> cp $JOBFILE $TMPDIR
> >>> rm -rf $JOBFILE
> >>> cd $TMPDIR
> >>> tar xfz $JOBFILE
> >>> rm -rf $JOBFILE
> >>
> >> Maybe you can avoid the local file:
> >>
> >> tar cj helloworld | tar xj -C $TMPDIR
> >> cd $TMPDIR
> >>
> >>> # Computational command to run
> >>> ./helloworld
> >>
> >> replace with:
> >>
> >> (trap - usr1 usr2; exec ./helloworld)
> >>
> >>
> >>> # Copy all files back.
> >>> OUTFILE=outfiles.job-id-$JOB_ID.tgz
> >>> tar cfz $OUTFILE *
> >>> cp $OUTFILE $SGE_CWD_PATH
> >>> cd $SGE_CWD_PATH
> >>> tar xfz $OUTFILE
> >>> rm -rf $OUTFILE
> >>
> >> tar cj * | tar xj -C $SGE_CWD_PATH
> >>
> >>
> >> HTH - Reuti
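
For anyone reading this in the archive, the trap idiom quoted above can be tried without a cluster. With -notify, SGE sends SIGUSR1 ahead of SIGSTOP and SIGUSR2 ahead of SIGKILL; the job script ignores both so it survives the warning, while the payload gets the default handling back and is terminated. The following is only a rough sketch of that behaviour: simulate_notify_job is a made-up name, the kill call stands in for the scheduler, and the one-second sleep is an assumption that the payload has started by then.

```shell
#!/bin/bash
# Sketch only: simulates the -notify warning signal locally, no SGE needed.
# simulate_notify_job is a hypothetical name used for this illustration.

simulate_notify_job() {
    # Job script side: ignore the warning signals so the shell survives.
    trap '' USR1 USR2

    # Payload side: restore the default action in a subshell, then exec the
    # payload so it is the process that gets terminated by the signal.
    (trap - USR1 USR2; exec "$@") &
    payload_pid=$!

    # Stand-in for the scheduler: give the payload a moment to start, then
    # deliver USR2 to both the payload and this shell.
    sleep 1
    kill -USR2 "$payload_pid" "${BASHPID:-$$}"

    # The payload dies from the signal; the shell is still here to notice.
    wait "$payload_pid"
    echo "payload exit status: $?"

    # This is where the copy back to the submit directory would go.
    echo "copy-back step reached"

    # Reset to default handling so the demo does not leave the signals ignored.
    trap - USR1 USR2
}
```

Running `simulate_notify_job sleep 30` should report a non-zero payload status (128 plus the signal number) and then the copy-back line, which is the point of the idiom: the shell outlives the payload and still gets to copy the files. In a real job script the final `trap -` would be left out, so the copy-back step stays protected as well.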

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92695
