[GE users] Submit multiple jobs through SGE - UPDATE

santosh santosh2005 at gmail.com
Wed Feb 10 03:58:29 GMT 2010

Dear SGE Experts

I noticed that batch jobs submitted through SGE terminated prematurely. However, qsub jobs submitted manually with complete paths for all commands, files, etc. ran without any problems. Does this mean that the values assigned to the different variables are not passed properly to a "qsub" job, in spite of providing a "-v" or "-V" flag?

Would really appreciate your assistance.


On Tue, Feb 9, 2010 at 4:44 PM, Santosh <santosh2005 at gmail.com> wrote:

I am not sure if I sent this to the right group, hence the re-post.

I am a newbie to SGE and also to this discussion forum. If I have posted this query to the wrong discussion group, please direct me to the appropriate one. I apologize for any inconvenience caused.

I want to run about 100 jobs on a small Linux cluster, using the Bourne shell. Each job runs one of 100 program files with its associated data in its own folder. A flat ASCII (csv) file contains the list of folders, program files and data files to be used.

At present there are 3 scripts (scA, scB and scC):
scA reads the csv file to obtain the names of the program files and associated data files, and then calls scB.
scB copies the requisite program and data files into a target folder and does additional environment setup before calling scC, which compiles and executes a Fortran program using that set of program file and data file. It is a single-threaded process as of now, run on one core of a CPU.

The above 2 scripts work okay at the Linux command line.
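
To make the flow concrete, the scA step can be sketched roughly as below. This is only an illustration, not the actual script: the column order (folder, program file, data file), the function name process_list and the stand-in run_scB are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of scA: read "folder,progfile,datafile" rows
# from the csv list and hand each row to scB.
# The column order and both function names are assumptions.

run_scB() {
    # Stand-in for the real scB, which copies files and then calls scC.
    echo "scB: folder=$1 prog=$2 data=$3"
}

process_list() {
    # $1 = path to the csv list
    while IFS=, read -r folder progfile datafile; do
        # Skip blank lines
        [ -n "$folder" ] || continue
        run_scB "$folder" "$progfile" "$datafile"
    done < "$1"
}

# Small demonstration with a made-up two-line list
list=$(mktemp)
printf 'run01,prog01,data01\nrun02,prog02,data02\n' > "$list"
process_list "$list"
rm -f "$list"
```

In this sketch the names are passed to scB as positional arguments rather than through the environment, so they do not depend on what the batch system exports to the job.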

When the job is submitted through SGE, scA calls scB and all seems okay until the files have been copied to their respective target folders (by scB). When scC is called by scB, it tries to execute a nohup command to compile and execute the program, and SGE reports that the job was successfully scheduled or submitted (as the case may be). However, I don't see the corresponding intermediate files, and in a redirected error file I see
"./.ext" not found (the value of "$progfile", as seen below, is missing)

In scC, the command used is:
nohup $FProg $progfile.ext $outfile.out>&"$progfile.txt"&
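
A guard such as the one below, placed near the top of scC, might make an empty value easier to trace than the "./.ext" message. It is only a sketch: require_vars is a hypothetical helper, not something that exists in the current scripts.

```shell
#!/bin/sh
# Hypothetical guard for scC: report any required variable that is
# empty or unset before the nohup line is reached.

require_vars() {
    # Arguments are variable NAMES, e.g. require_vars progfile outfile
    for name in "$@"; do
        # Indirect lookup of the variable called $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "scC: required variable '$name' is empty or unset" >&2
            return 1
        fi
    done
    return 0
}
```

It could be called as `require_vars FProg progfile outfile || exit 1` just before the nohup command, so that the job's error file names the missing variable.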

The SGE command line used is as follows:
qsub -v SGE_ROOT="/opt/ge62",PATH=$PATH:/home/$USER/utils:$SGE_ROOT/bin/lx24-amd64,LD_LIBRARY_PATH="/opt/cluster/gcc/lib64:/opt/cluster/gcc/lib:$LD_LIBRARY_PATH",ARC=$SGE_ROOT/util/arch,QSUB=$SGE_ROOT/bin/$ARC/qsub,QALTER=$SGE_ROOT/bin/$ARC/qalter -cwd -b y -t 1-2 -now y -S /bin/sh /home/santosh/utils/scA
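
One point that may be relevant to the command above: references such as $SGE_ROOT and $ARC are expanded by the submitting shell before qsub ever sees them, so they take whatever values the login shell has at that moment, not the values assigned earlier in the same -v list. The sketch below shows this without involving qsub at all. show_args is a hypothetical helper that just prints the arguments it receives, and the empty SGE_ROOT and ARC simulate a login shell in which they are not set.

```shell
#!/bin/sh
# Sketch: print each argument exactly as the called command receives it.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

# Assumption for the demonstration: neither variable has a value
# in the submitting shell.
SGE_ROOT=""
ARC=""

show_args -v SGE_ROOT="/opt/ge62",QSUB=$SGE_ROOT/bin/$ARC/qsub
# prints:
# [-v]
# [SGE_ROOT=/opt/ge62,QSUB=/bin//qsub]
```

If this applies, setting the variables in the shell first (or writing the full paths literally) before running qsub would avoid the empty expansions.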

A variation of the above like the one below does not work:
qsub -V /home/santosh/utils/NPATH -cwd -b y -t 1-2 -now y -S /bin/sh /home/santosh/utils/scA

All the called scripts begin with:
#$ -S /bin/sh

Could you please let me know what I am missing? Any assistance will be highly appreciated.


