[GE users] Wrong job executing

Andreas.Haas at Sun.COM
Tue Mar 27 14:49:16 BST 2007


On Tue, 27 Mar 2007, Jeffrey Montesano wrote:

> I'm launching the jobs from a Perl script as follows:
>
> while (<tclist>) {
>  system("qsub -p -500 -q $queue -r yes -o regression_output"
>       . " -e regression_output -t 1 -l qls=1 -cwd $testcase.$seed");
> } # while
>
> File "testcase.$seed" defines some environment variables. Perhaps the
> Perl "system" function is at fault?

I would rule this out. system() almost certainly fork()s, and fork()
duplicates the process memory, including the environment, so each
qsub invocation starts with its own copy.
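
As a small illustration of the point about fork(): a child process works
on its own copy of the environment, so a value set for one child is seen
neither by the parent nor by a sibling. The variable X and the values
foo/bar are borrowed from the example further down in this thread; the
function name demo_env_copy is made up for this sketch.

```shell
#!/bin/sh
# Sketch: each child process gets its own copy of the environment.
# X, foo and bar follow the example in this thread; demo_env_copy is a
# hypothetical name used only for this illustration.

demo_env_copy() {
    X=parent
    export X

    # Each child runs with its own value of X in its own environment.
    a=$(X=foo sh -c 'echo "$X"')
    b=$(X=bar sh -c 'echo "$X"')

    # Neither child affected the other, and the parent's value is unchanged.
    echo "child A: $a"
    echo "child B: $b"
    echo "parent:  $X"
}

demo_env_copy
```

If two children ever reported the same value here, the sharing would have
to come from something outside the process environment, e.g. a file that
both of them read.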

IMO speculating won't get you any further. You should consider what means
exist to trace how the environment is set up. E.g. you could try adding

    cat $SGE_JOB_SPOOL_DIR/environment

at the beginning of these job scripts. That way you can check which
environment settings Grid Engine made before the job was actually
launched.
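
A minimal sketch of such a check, assuming the job script is a Bourne
shell script. SGE_JOB_SPOOL_DIR is the variable mentioned above; the
helper name dump_sge_env and the "===" marker lines are made up for this
illustration. It prints the environment file prepared by Grid Engine (if
it is readable), followed by the environment the job actually sees, so
the two can be compared in the job's output file.

```shell
#!/bin/sh
# Hypothetical helper: show what Grid Engine wrote to the job's
# environment file, followed by the environment the job really has.
dump_sge_env() {
    env_file="${SGE_JOB_SPOOL_DIR:-}/environment"

    echo "=== environment file: $env_file ==="
    if [ -r "$env_file" ]; then
        sort "$env_file"
    else
        echo "(not readable)"
    fi

    echo "=== environment seen by the job ==="
    env | sort
}

# Call this first in the job script, before the real work starts.
dump_sge_env
```

Sorting both listings makes it easier to diff the output of two jobs that
are suspected of having swapped their settings.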

Andreas

> -----Original Message-----
> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM]
> Sent: Tuesday, March 27, 2007 9:27 AM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Wrong job executing
>
> As a matter of course each job gets its own environment.
> Could it be that the mechanism used by your jobs is causing
> such environment sharing?
>
> Andreas
>
>
> On Tue, 27 Mar 2007, Jeffrey Montesano wrote:
>
>> What is the expected behavior for jobs that define their own environment
>> variables in the absence of the -V switch?  Does each job get its own
>> shell in which its variables are shielded from other jobs?  Or do the
>> jobs share the same shell, in which case there is the potential for
>> one's environment variables to conflict with another's?
>>
>> -----Original Message-----
>> From: Jeffrey Montesano [mailto:jmontesano at aetheranetworks.com]
>> Sent: Friday, March 23, 2007 9:16 AM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] Wrong job executing
>>
>> Answer to 1: logfile means the output created by the application, not by
>> SGE.
>>
>> Answer to 2: I'm not using a unique directory every time I submit a job;
>> I just want all of the jobs to run in the same directory as they are
>> launched in - so maybe the -cwd is not necessary?
>>
>> I'm running on Linux RHEL4.3, SGE version 6.0u9.
>>
>> After doing some debugging of my own I have come to the realization that
>> my problem is related to environment variables.  What seems to be
>> happening is that when two jobs are dispatched to be executed within a
>> very short interval, one of the jobs ends up using the environment
>> variables from the other job.  For example, job A defines some
>> environment variable X=foo, and job B defines the same environment
>> variable X=bar, when these two jobs are scheduled within a short
>> interval of one another there is the possibility that job B will use
>> X=foo instead of X=bar.
>>
>> Has anyone seen anything like this before?
>>
>> -----Original Message-----
>> From: Rayson Ho [mailto:rayrayson at gmail.com]
>> Sent: Wednesday, March 21, 2007 12:30 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Wrong job executing
>>
>> 1) By "logfile", you mean the job output file created by SGE or the
>> application??
>>
>> 2) Since you use "-cwd", did you go to a unique directory every time
>> you submit a job??
>>
>> BTW, what OS and SGE version are you running??
>>
>> Rayson
>>
>>
>>
>> On 3/21/07, Jeffrey Montesano <jmontesano at aetheranetworks.com> wrote:
>>> No, I didn't use the -b switch.  Here is the qsub command I used:
>>>
>>> qsub -p -500 -q $queue -r yes -o regression_output -e regression_output
>>> -t 1 -l qls=1 -V -cwd  $testcase.$seed
>>>
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: Wednesday, March 21, 2007 11:45 AM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Wrong job executing
>>>
>>> Hi,
>>>
>>> On 21.03.2007 at 15:32, Jeffrey Montesano wrote:
>>>
>>>> To launch a regression, we submit several jobs (more than 10)
>>>> during the day to a queue which is open from 8pm until 7am.  These
>>>> jobs remain in the "qw" state until 8pm, at which time they all
>>>> compete for the 4 available CPU slots.
>>>>
>>>>
>>>>
>>>> When the regression results are verified the next day we notice
>>>> that some jobs have executed twice, while others have not executed
>>>> at all.  For example, if the jobs launched were A, B, C, D, E,  we
>>>> notice that there are logfiles created for A, B, C, D, E, but the
>>>> contents of logfiles A and C both correspond to job A.  It's as if
>>>> job C was executed as job A.
>>>
>>> Did you submit the job with the option "-b y" by accident and edit
>>> the same script to submit it five times (hence only the last version
>>> of the script would be executed five times)? What were your exact
>>> qsub options and output redirections, e.g. via the -o/-e options?
>>>
>>> -- Reuti
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> http://gridengine.info/
>
> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551
> Kirchheim-Heimstetten
> Munich District Court: HRB 161028
> Managing directors: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
> Chairman of the Supervisory Board: Martin Haering
