[GE users] Wrong job executing

Joe Landman landman at scalableinformatics.com
Tue Mar 27 16:06:40 BST 2007



ahhhh....

Jeffrey Montesano wrote:
> Thanks for your input Joe.  I failed to mention that my environment is a
> bit more complicated than what I had suggested though.  The file
> $testcase.$seed actually executes yet another script (e.g. run_sim
> test77), and it is the run_sim script that creates the environment
> variables based on the parameter test77.

One thing I do is include a --dryrun switch in all my jobs.  It turns 
on debugging, echoes the whole environment, and doesn't actually run 
the computation (it just creates all the files for me).  Makes 
debugging much easier.

I have to manually code the dryrun into my systems, but as indicated, 
this can be a good thing.  If you are using perl to drive the 
computation (e.g. the other scripts are perl as well), then this should 
be easy.  If not, then try to make a version of your script which has 
the computation commented out.
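
A minimal sketch of what such a switch could look like, assuming the job 
is a plain sh/bash script.  The names run_job and DRYRUN are made up for 
illustration and are not taken from anyone's real scripts.  In dry-run 
mode the wrapper only prints the command it was given plus the sorted 
environment, and runs nothing.

```shell
#!/bin/sh
# Hypothetical wrapper: run_job and DRYRUN are illustrative names only.
# With --dryrun as the first argument it reports what would happen;
# otherwise it executes the command passed to it.
run_job() {
    DRYRUN=0
    if [ "${1:-}" = "--dryrun" ]; then
        DRYRUN=1
        shift                      # drop the switch, keep the command
    fi

    if [ "$DRYRUN" -eq 1 ]; then
        echo "DRYRUN: would run: $*"
        echo "DRYRUN: environment follows"
        env | sort                 # dump what the command would inherit
        return 0
    fi

    "$@"                           # real run: execute the command as given
}
```

Something like "run_job --dryrun ./run_sim test77" would then show the 
environment that run_sim would start with, without starting it.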

BTW:  since you are using system, the qsub it launches inherits the 
environment of the Perl process, i.e. that of the user you are running 
as.  So if you are running this as a restricted user or in a restricted 
environment, it might not work as expected.  Try replacing the qsub ... 
with /usr/bin/env and see what you get.
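
To illustrate what the /usr/bin/env check tells you, here is a small 
shell sketch.  It assumes /usr/bin/env exists at that path, and 
child_sees and the DEMO_* variables are hypothetical names used only 
for this example.  The point is that a child process sees exported 
variables only, so anything missing from the env output is also missing 
for whatever you launch from the same place.

```shell
#!/bin/sh
# child_sees is a hypothetical helper: it reports whether a child process
# started from this shell would inherit the named variable.
child_sees() {
    if /usr/bin/env | grep -q "^$1="; then
        echo "$1 is visible to child processes"
    else
        echo "$1 is NOT visible to child processes"
    fi
}

DEMO_LOCAL=foo            # set in this shell, but not exported
DEMO_EXPORTED=bar
export DEMO_EXPORTED      # exported, so child processes inherit it

child_sees DEMO_LOCAL
child_sees DEMO_EXPORTED
```

The name match is a simple line-prefix test, so a variable whose value 
spans several lines could confuse it.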

Joe

> 
> -----Original Message-----
> From: Joe Landman [mailto:landman at scalableinformatics.com] 
> Sent: Tuesday, March 27, 2007 10:08 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Wrong job executing
> 
> Greetings Jeffrey
> 
> Jeffrey Montesano wrote:
>> I'm launching the jobs from a Perl script as follows:
>>
>> while (<tclist>) {
>>   system("qsub -p -500 -q $queue -r yes -o regression_output "
>>        . "-e regression_output -t 1 -l qls=1 -cwd $testcase.$seed");
>> } # while
>>
>> File "testcase.$seed" defines some environment variables. Perhaps the
>> Perl "system" function is at fault?
> 
> Worst case, you can change %ENV to reflect your needed environment.  If 
> $testcase.$seed sets up your environment, and is a bash script, you 
> could do something like this:
> 
> # in the beginning of the program
> my ($fh,$env_line);
> ...
> 
> # in the loop
> my $env_file = "$testcase.$seed";   # same file name that is passed to qsub
> open($fh, "<", $env_file)
>   or die "FATAL ERROR: unable to open $env_file: $!\n";
> while ($env_line = <$fh>)
>   {
>    # parse 'export VARIABLE=...' lines and stuff them into our
>    # environment; skip lines that do not match, so that stale $1/$2
>    # from an earlier line are never reused
>    next unless $env_line =~ /(\S+)\s*=\s*(\S+)/;
>    $ENV{$1} = $2;
>   }
> close($fh);
> 
> Note that this is generally a bad idea from a security perspective, but 
> we are using regexes, so it should be un-tainted.
> 
> Joe
>>
>> -----Original Message-----
>> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM] 
>> Sent: Tuesday, March 27, 2007 9:27 AM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] Wrong job executing
>>
>> As a matter of course each job gets its own environment.
>> Could it be that the mechanism used by your jobs is causing 
>> such environment sharing?
>>
>> Andreas
>>
>>
>> On Tue, 27 Mar 2007, Jeffrey Montesano wrote:
>>
>>> What is the expected behavior for jobs that define their own
>>> environment variables in the absence of the -V switch?  Does each job
>>> get its own shell in which its variables are shielded from other jobs?
>>> Or do the jobs share the same shell, in which case there is the
>>> potential for one's environment variables to conflict with another's?
>>>
>>> -----Original Message-----
>>> From: Jeffrey Montesano [mailto:jmontesano at aetheranetworks.com]
>>> Sent: Friday, March 23, 2007 9:16 AM
>>> To: users at gridengine.sunsource.net
>>> Subject: RE: [GE users] Wrong job executing
>>>
>>> Answer to 1: logfile means the output created by the application, not
>>> by SGE.
>>>
>>> Answer to 2: I'm not using a unique directory every time I submit a
>>> job; I just want all of the jobs to run in the same directory as they
>>> are launched in - so maybe the -cwd is not necessary?
>>>
>>> I'm running on Linux RHEL4.3, SGE version 6.0u9.
>>>
>>> After doing some debugging of my own I have come to the realization
>>> that my problem is related to environment variables.  What seems to be
>>> happening is that when two jobs are dispatched to be executed within a
>>> very short interval, one of the jobs ends up using the environment
>>> variables from the other job.  For example, job A defines some
>>> environment variable X=foo, and job B defines the same environment
>>> variable X=bar.  When these two jobs are scheduled within a short
>>> interval of one another, there is the possibility that job B will use
>>> X=foo instead of X=bar.
>>>
>>> Has anyone seen anything like this before?
>>>
>>> -----Original Message-----
>>> From: Rayson Ho [mailto:rayrayson at gmail.com]
>>> Sent: Wednesday, March 21, 2007 12:30 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Wrong job executing
>>>
>>> 1) By "logfile", you mean the job output file created by SGE or the
>>> application??
>>>
>>> 2) Since you use "-cwd", did you go to a unique directory every time
>>> you submit a job??
>>>
>>> BTW, what OS and SGE version are you running??
>>>
>>> Rayson
>>>
>>>
>>>
>>> On 3/21/07, Jeffrey Montesano <jmontesano at aetheranetworks.com> wrote:
>>>> No, I didn't use the -b switch.  Here is the qsub command I used:
>>>>
>>>> qsub -p -500 -q $queue -r yes -o regression_output -e regression_output
>>>> -t 1 -l qls=1 -V -cwd  $testcase.$seed
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Wednesday, March 21, 2007 11:45 AM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] Wrong job executing
>>>>
>>>> Hi,
>>>>
>>>> On 21.03.2007 at 15:32, Jeffrey Montesano wrote:
>>>>
>>>>> To launch a regression, we submit several jobs (more than 10)
>>>>> during the day to a queue which is open from 8pm until 7am.  These
>>>>> jobs remain in the "qw" state until 8pm, at which time they all
>>>>> compete for the 4 available CPU slots.
>>>>>
>>>>>
>>>>>
>>>>> When the regression results are verified the next day we notice
>>>>> that some jobs have executed twice, while others have not executed
>>>>> at all.  For example, if the jobs launched were A, B, C, D, E,  we
>>>>> notice that there are logfiles created for A, B, C, D, E, but the
>>>>> contents of logfiles A and C both correspond to job A.  It's as if
>>>>> job C was executed as job A.
>>>> Did you submit the job with the option "-b y" by accident and edit
>>>> the same script to submit it five times (hence only the last version
>>>> of the script would be executed five times)? What were your exact
>>>> qsub options and output redirections, e.g. by a -o/-e option?
>>>>
>>>> -- Reuti
>>>>
>>>>
> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>> http://gridengine.info/
>>
>> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551
>> Kirchheim-Heimstetten
>> District Court Munich: HRB 161028
>> Managing directors: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
>> Chairman of the Supervisory Board: Martin Haering
>>
> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
