[GE users] Submit multiple jobs through SGE - UPDATE
reuti at staff.uni-marburg.de
Wed Feb 10 16:25:58 GMT 2010
Am 10.02.2010 um 04:58 schrieb santosh:
> Dear SGE Experts
> I noticed that batch jobs through SGE terminated prematurely.
how - what's the exact error message?
> However, the qsub jobs submitted manually with complete paths of
> all commands/files etc ran without any problems. Does it mean that
> the assigned values to different variables are not passed properly
> during a "qsub" job in spite of providing a "-v" or "-V" flag?
Maybe the paths to some things are different on the exechost. As we
don't know your script, we can't say.
> Would really appreciate your assistance.
> On Tue, Feb 9, 2010 at 4:44 PM, Santosh <santosh2005 at gmail.com> wrote:
> I am not sure if I sent this to the right group. Hence a re-post.
> I am a newbie to SGE and also to the discussion forum. If I have
> posted this query to a wrong discussion group, please direct me to
> the appropriate one. I apologize for any inconvenience caused.
> I want to run about 100 jobs, which involve running 100 program
> files with associated data in their respective folders, on a small
> Linux cluster, in Bourne shell. A flat ASCII (csv) file contains a
> list of the folders, program files and data files to be used.
> At present there are 3 scripts (scA, scB and scC) :
> ScA reads the csv file to obtain the names of program files and
> associated data files; and then calls scB
> ScB copies the requisite program and data files into a target
> folder and does additional environment setup before calling scC,
> which compiles and executes a Fortran program taking in the given
> program file and data file. It is a single-threaded process as of
> now, running on one core of a CPU.
> The above 2 scripts work okay at Linux command line.
> While submitting the job through SGE, scA calls scB and all seem
> okay till the files are copied to respective target folder (by
> scB). When called by
It would be better to put all your steps into one jobscript, as you
could then copy all your files into the job's $TMPDIR on the node,
which is guaranteed to be erased after the job.
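A minimal sketch of such a combined jobscript (all file names are made up, `tr` merely stands in for the compile-and-run step, and the `${VAR:-...}` fallbacks just let the sketch be tried outside SGE):

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd

# One jobscript doing all steps: copy the inputs into the node-local
# scratch directory SGE provides ($TMPDIR), work there, copy the
# results back. $TMPDIR is erased automatically when the job ends.
SCRATCH=${TMPDIR:-$(mktemp -d)}
SUBMITDIR=${SGE_O_WORKDIR:-$PWD}

# Stand-in input file (in the real job this comes from the csv list):
echo "input" > "$SUBMITDIR/data.dat"

cp "$SUBMITDIR/data.dat" "$SCRATCH"/
cd "$SCRATCH" || exit 1

# The compile-and-run step would go here, in the foreground;
# 'tr' is only a placeholder for it:
tr 'a-z' 'A-Z' < data.dat > result.out

# Copy results back before the script exits -- afterwards $TMPDIR is gone.
cp result.out "$SUBMITDIR"/
cd "$SUBMITDIR"
```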
> scB, scC tries to execute a nohup command to compile and execute
> the program, it says that job is successfully scheduled or
> submitted (as the case may be). However, I don't see corresponding
> intermediate files and in a redirected error file, I see that
> "./.ext" not found (the value of "$progfile" as seen below is
> missing)
Where is $progfile defined? It looks like $FProg is also not set,
and hence "./.ext" is treated as the application to be executed.
scC won't inherit anything from scA or scB.
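scC only sees what it is explicitly given; instead of relying on inheritance, scB can hand the values over as arguments. A sketch (the f77/case01 values are made up, and a function stands in for the separate scC file):

```shell
#!/bin/sh
# scC sees nothing from scA/scB unless it is exported or passed along.
# Stand-in for scC as a function; a real scC in its own file would
# read "$1" and "$2" the same way.
scC() {
    FProg=$1
    progfile=$2
    echo "would run: $FProg $progfile.ext"
}

# In scB: pass the values explicitly (values are examples only).
FProg=f77
progfile=case01
RESULT=$(scC "$FProg" "$progfile")
echo "$RESULT"
```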
> In scC, the command used is:
> nohup $FProg $progfile.ext $outfile.out>&"$progfile.txt"&
Inside a jobscript you should avoid putting anything into the
background with nohup and a trailing "&": the job is considered
finished as soon as the script itself exits.
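Under SGE the jobscript process itself is the job, so the program can simply run in the foreground with its output redirected. A sketch of that line (echo stands in for the real $FProg, and the file names are examples):

```shell
#!/bin/sh
# Foreground execution, no nohup, no trailing "&": if the process
# were backgrounded, the script would exit at once and SGE would
# consider the job finished. 'echo' stands in for the real $FProg.
FProg=echo
progfile=case01
outfile=case01

"$FProg" "$progfile.ext" "$outfile.out" > "$progfile.txt" 2>&1
```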
> The SGE command line used is as follows:
> qsub -v SGE_ROOT="/opt/ge62",PATH=$PATH:/home/$USER/utils:$SGE_ROOT/bin/$ARC/qsub,QALTER=$SGE_ROOT/bin/$ARC/qalter -cwd -b y -t 1-2 -now y -S /bin/sh /home/santosh/utils/scA
Most of the variables are not necessary. The only two exceptions I
see are $PATH and $LD_LIBRARY_PATH for your application. But instead
of putting these two in every qsub command, I would suggest setting
them in the jobscript itself. If you define everything in the
jobscript, you can be sure that any tampering with your local
environment won't affect the execution of your jobs. Otherwise it
might happen that you change something in your local shell and
suddenly the jobs no longer work, and you can't find the cause
easily.
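For example, the top of the jobscript could look like this, with the -t task number selecting the csv line (all paths and the stand-in csv content are examples, and the `${SGE_TASK_ID:-1}` fallback only lets the sketch run outside SGE):

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd

# Define the environment in the jobscript itself rather than on the
# qsub line; the paths here are examples only.
PATH=/home/$USER/utils:/usr/local/bin:/usr/bin:/bin
LD_LIBRARY_PATH=/usr/local/lib
export PATH LD_LIBRARY_PATH

# With "qsub -t 1-100", each array task gets its own $SGE_TASK_ID
# and can pick its own line from the csv list:
TASK=${SGE_TASK_ID:-1}
printf 'case01,prog01.f,data01.dat\n' > joblist.csv   # stand-in csv

LINE=$(sed -n "${TASK}p" joblist.csv)
folder=$(echo "$LINE" | cut -d, -f1)
progfile=$(echo "$LINE" | cut -d, -f2)
datafile=$(echo "$LINE" | cut -d, -f3)
```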
> A variation of the above like the one below does not work:
> qsub -V /home/santosh/utils/NPATH -cwd -b y -t 1-2 -now y -S /bin/sh /home/santosh/utils/scA
> All the called scripts begin with:
> #$ -S /bin/sh
> Could you please let me know what I am missing? Any assistance will
> be highly appreciated.