[GE users] Wrong job executing

Jeffrey Montesano jmontesano at aetheranetworks.com
Fri Mar 23 13:16:11 GMT 2007


Answer to 1: logfile means the output created by the application, not by
SGE.

Answer to 2: I'm not using a unique directory every time I submit a job;
I just want all of the jobs to run in the same directory as they are
launched in - so maybe the -cwd is not necessary?

I'm running SGE version 6.0u9 on Linux (RHEL 4.3).

After some debugging of my own, I have realized that my problem is
related to environment variables.  When two jobs are dispatched for
execution within a very short interval, one of the jobs can end up
using the environment variables from the other.  For example, suppose
job A defines the environment variable X=foo and job B defines the same
variable as X=bar.  When these two jobs are scheduled within a short
interval of one another, job B may run with X=foo instead of X=bar.

Has anyone seen anything like this before?  
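
One way to narrow this down, as a rough sketch rather than a confirmed
fix: have each job set X itself instead of relying on the environment
passed along by -V, and log the value it actually sees. The helper name
write_job_script, the job_A.sh/job_B.sh file names, and the use of
"true" as a stand-in for the real test command are all made up for
illustration. The sketch assumes the value contains no single quotes.

```shell
#!/bin/sh
# Sketch only: write_job_script and the file names below are hypothetical
# helpers for illustration, not part of SGE.

# Write a wrapper script that sets X itself and then runs the real command,
# so the job does not depend on the environment captured by "qsub -V".
write_job_script() {
    script=$1   # path of the wrapper script to create
    value=$2    # value of X for this job (assumed to contain no quotes)
    shift 2     # remaining arguments: the command the job should run
    {
        echo '#!/bin/sh'
        echo "X='$value'"
        echo 'export X'
        # Log the value the job actually sees, for later comparison.
        echo 'echo "job sees X=$X"'
        echo "exec $*"
    } > "$script"
    chmod +x "$script"
}

# Work in a scratch directory so the example leaves no files behind in cwd.
dir=$(mktemp -d)

# Two jobs that differ only in the value of X; "true" stands in for the
# real test command.
write_job_script "$dir/job_A.sh" foo true
write_job_script "$dir/job_B.sh" bar true

# Submission would then look roughly like this (not run here):
#   qsub -q $queue -cwd -o regression_output -e regression_output job_A.sh
```

If the logged value in a job's output file still differs from the one
written into its wrapper, the mix-up is happening somewhere other than
the submit-time environment. Passing the variable explicitly with
qsub's -v option (for example -v X=foo) is another thing to try.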

-----Original Message-----
From: Rayson Ho [mailto:rayrayson at gmail.com] 
Sent: Wednesday, March 21, 2007 12:30 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Wrong job executing

1) By "logfile", you mean the job output file created by SGE or the
application??

2) Since you use "-cwd", did you go to a unique directory every time
you submit a job??

BTW, what OS and SGE version are you running??

Rayson



On 3/21/07, Jeffrey Montesano <jmontesano at aetheranetworks.com> wrote:
> No, I didn't use the -b switch.  Here is the qsub command I used:
>
> qsub -p -500 -q $queue -r yes -o regression_output -e regression_output -t 1 -l qls=1 -V -cwd $testcase.$seed
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Wednesday, March 21, 2007 11:45 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Wrong job executing
>
> Hi,
>
> On 21.03.2007 at 15:32, Jeffrey Montesano wrote:
>
> > To launch a regression, we submit several jobs (more than 10)
> > during the day to a queue which is open from 8pm until 7am.  These
> > jobs remain in the "qw" state until 8pm, at which time they all
> > compete for the 4 available CPU slots.
> >
> >
> >
> > When the regression results are verified the next day we notice
> > that some jobs have executed twice, while others have not executed
> > at all.  For example, if the jobs launched were A, B, C, D, E,  we
> > notice that there are logfiles created for A, B, C, D, E, but the
> > contents of logfiles A and C both correspond to job A.  It's as if
> > job C was executed as job A.
> Did you submit the job with the option "-b y" by accident and edit
> the same script to submit it five times (hence only the last version
> of the script would be executed five times)? What were your exact
> qsub options and output redirections, e.g. via the -o/-e options?
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>




