[GE users] Wrong job executing

Jeffrey Montesano jmontesano at aetheranetworks.com
Fri Mar 30 15:46:28 BST 2007


The source of the problem was found, and though it has little to do with
gridengine, the solution could be of use to gridengine users who find
themselves doing something similar.  The problem was the following:

The file being executed by qsub ($testcase.$seed) first does a
compilation into library X, and then runs a simulation based on the
contents of library X.  When two jobs were scheduled to execute within a
short time of one another, job A would compile X, and then job B would
recompile X before job A's simulation started.  As a result, simulation A
and simulation B both ran against the same compiled version of X (job
B's), so in effect the same job ran twice.

The solution was to modify the PERL script $testcase.$seed to implement
a simple semaphore (a file called run_sim_lock), to prevent one job from
recompiling the library until the other job's simulation has started:

     # wait until no other job holds the lock
     while (-e "./run_sim_lock") {
       print "waiting for a lock\n";
       sleep 5;
     }

     print "lock is available\n";
     system("touch ./run_sim_lock");
     print "creating new lock\n";

     # give the previous invocation of run_sim time to
     # launch vsim and make use of the compiled library
     sleep 10;

     # compile the library
     @vlogcmds = ('vlog', '-f', 'test.f');

     # if there's a problem compiling the library then release the
     # semaphore, to allow other jobs to grab it
     if (system(@vlogcmds) != 0) {
       print "removing lock\n";
       system("rm -rf ./run_sim_lock");
       die "vlog command failed: $?";
     }

     # remove the semaphore
     print "removing lock\n";
     system("rm -rf ./run_sim_lock");

     # launch the simulation (@vsimcmds is not shown in this excerpt)
     system(@vsimcmds) == 0
       or die "vsim command failed: $?";
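
One caveat with the lock above: the existence test and the 'touch' are two
separate steps, so two jobs starting at the same moment could both see no
lock and both proceed. A minimal sketch of a variant that creates the lock
in a single step is given here. The helper names (try_lock, acquire_lock,
release_lock) are made up for illustration and are not part of the
original script, and O_EXCL may not be reliable on older NFS setups.

```perl
use strict;
use warnings;
use Fcntl qw( O_CREAT O_EXCL O_WRONLY );

# Try to create the lock file; returns 1 on success, 0 if it already
# exists. O_EXCL makes sysopen fail when the file is present, so the
# check and the creation happen as one operation.
sub try_lock {
    my ($path) = @_;
    sysopen( my $fh, $path, O_CREAT | O_EXCL | O_WRONLY ) or return 0;
    print {$fh} "$$\n";    # record the owner's pid for debugging
    close $fh;
    return 1;
}

# Poll every $interval seconds until the lock has been acquired.
sub acquire_lock {
    my ( $path, $interval ) = @_;
    until ( try_lock($path) ) {
        print "waiting for a lock\n";
        sleep $interval;
    }
    return 1;
}

# Remove the lock file so that another job can take it.
sub release_lock {
    my ($path) = @_;
    unlink $path or warn "cannot remove $path: $!";
}
```

In the script, acquire_lock("./run_sim_lock", 5) would stand in for the
while loop plus 'touch', and release_lock("./run_sim_lock") for the two
'rm -rf' calls.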


Thank you very much for your input on this matter, and though your
suggestions were not the cause, they definitely helped me to narrow my
focus and find the cause.

-Jeff


-----Original Message-----
From: Olesen, Mark [mailto:Mark.Olesen at arvinmeritor.com] 
Sent: Wednesday, March 28, 2007 5:32 AM
To: 'users at gridengine.sunsource.net'
Subject: RE: [GE users] Wrong job executing

Hi Jeffrey,

> > I'm launching the jobs from a PERL script as follows:
> >
> > while (<tclist>) {
> >   system("qsub -p -500 -q $queue -r yes -o regression_output -e
> > regression_output -t 1 -l qls=1 -cwd $testcase.$seed");
> > } # while
> >
> > File "testcase.$seed" defines some environment variables. Perhaps
> the
> > PERL "system" function is at fault?

I would look elsewhere for the problem, but for peace of mind you should
note the following from perlfunc(1):

    If there are no shell metacharacters in the argument, it is
    split into words and passed directly to "execvp", which is more
    efficient.


Thus, the system call inherits the parent environment from Perl and can
only be influenced by manipulating %ENV, which probably wouldn't matter
anyhow (see next comment).
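
If you want to convince yourself of the inheritance, something like the
following rough sketch should do. The helper child_env and the variable
DEMO_SEED are invented for this illustration. It uses the list form of a
pipe open so the child's answer can be captured; like the list form of
system, that runs the command directly without a shell.

```perl
use strict;
use warnings;

# Return the value a child process sees for the environment variable
# $name, or "<unset>" if the child does not have it at all.
sub child_env {
    my ($name) = @_;
    my $code = 'print defined $ENV{$ARGV[0]} ? $ENV{$ARGV[0]} : "<unset>"';

    # $^X is the path of the perl binary that is running this script
    open( my $child, "-|", $^X, "-e", $code, $name )
        or die "cannot start child perl: $!";
    my $seen = do { local $/; <$child> };
    close $child;
    return $seen;
}

# DEMO_SEED is a made-up name used only for this illustration
$ENV{DEMO_SEED} = "77";
print "child sees DEMO_SEED=", child_env("DEMO_SEED"), "\n";
```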

> Worst case, you can change %ENV to reflect your needed environment.
> If
> $testcase.$seed sets up your environment, and is a bash script, you
> could do something like this:

This probably won't give you what you want. The call to qsub itself
*will* have a different environment, but the script to be run,
"$testcase.$seed", doesn't care about this (unless you've specified the
"-V" option).

> I failed to mention that my environment is a
> bit more complicated than what I had suggested though.  The file
> $testcase.$seed actually executes yet another script (ex: run_sim
> test77), and it is run_sim script that creates the environment
> variables
> based on the parameter test77.

I would check if any of your scripts are inadvertently being overwritten
by other scripts somewhere. Perhaps use 'touch' first and check the date
stamps later.
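
The date stamp check could also be done from Perl. This is only a sketch;
snapshot_mtimes and changed_files are hypothetical helper names: take a
snapshot of the modification times before submitting, then ask which
files have changed or disappeared once the jobs have run.

```perl
use strict;
use warnings;

# Record the modification time of each file that exists.
# Returns a hash reference: file name => mtime (seconds since epoch).
sub snapshot_mtimes {
    my @files = @_;
    my %mtime;
    for my $file (@files) {
        my @st = stat($file);
        next unless @st;          # skip files that cannot be stat'ed
        $mtime{$file} = $st[9];   # element 9 of stat is the mtime
    }
    return \%mtime;
}

# Return the files from an earlier snapshot whose mtime differs now,
# or which no longer exist.
sub changed_files {
    my ($before) = @_;
    my $after = snapshot_mtimes( keys %$before );
    my @changed = grep {
        !exists $after->{$_} || $after->{$_} != $before->{$_}
    } sort keys %$before;
    return @changed;
}
```

Keep in mind that the time stamps have a resolution of one second, so a
rewrite within the same second as the snapshot can go unnoticed.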

Actually, to prevent this sort of thing, and since I really hate having
scripts hanging about, we usually generate the job scripts in-place.
With this method, you should also be able to merge both scripts
together.

Here is a brief outline of how to inline shell from within Perl.

At the top of the Perl script, define all of the configuration
information as a hash: uppercase keys for things to be used within the
script, minus-prefixed keys for internal usage or qsub options (e.g.
'-pe').

In this example, I also copy in things from the ENV and provide default
values if they are unset. These can be replaced directly within the
script to preclude any uncertainties about the runtime environment
settings.

my %config = (
    ## env variables to use in script
    USER    => scalar getpwuid $>,           # provide 'USER' env
    STARDIR => $ENV{STARDIR} || "unknown",
    STARINI => $ENV{STARINI} || "Default",

    # logging within script - '$case' is a shell variable
    LOGGING => '>> $case.log',

    -queue => {
        -default => "cfd",                   # default queue(s)
    },

    -pe => {
        lam      => "lam",
        mpich    => "mpich",
        openmpi  => "openmpi",
        -default => "mpich",    ## default
    },
);


# We need at least a few options for our script

use Getopt::Std qw( getopts );

my ( %opt, %qsub );    # standard & qsub options
getopts( "ho:q:v", \%opt ) or usage();
usage() if $opt{h};

# example 1: use optional queue or revert to default queue
$opt{q} ||= $config{-queue}{-default};

# example 2: alternative output
$opt{o} ||= "regression_output";

Later, after we've adjusted and pre-calculated everything needed, we can
get ready to use qsub. We'll define the qsub call as follows:

my @qsub = qw( qsub -p -500 -t 1 -l qls=1 );
push @qsub, ( -q => $opt{q}, -o => $opt{o} );

## finally add in command arguments
push @qsub, "--", @ARGV if @ARGV;


It's a matter of taste how many qsub options go on the command-line and
how many in the script.


Read in the shell script or shell script template, which is attached in
the __DATA__ section of the Perl script:

my $template = do { local $/; <DATA> };    # slurp in the shell script


For your case, we'll do this within the while loop, but I don't know
exactly what else you need to set or calculate:

{
  local *TCLIST;
  open TCLIST, ... or die " ... ";

while (<TCLIST>) {
   # define content for shell script - see __DATA__
   my %var = map { $_ => $config{$_} } grep { /^[_A-Z]+$/ } keys %config;

   # I'll use SEED for your case, but you might need more
   $var{SEED} = $seed;


   # build shell script -  simple substitutions of %{...} constructs

   ( my $shell = $template )
      =~ s/%\{\s*([_A-Z]+)\s*\}/ $var{$1} || '' /eg;

   # either '-v' verify, or submit the job to qsub
   # open a pipe to qsub and feed it the created shell script

   if ( $opt{v} ) {
     warn "#!/bin/sh\n$shell\n";
   }
   else {
     local *STDOUT;
     open STDOUT, "|-", @qsub or warn "cannot open command\n";
     $|++;                     # unbuffered
     print $shell;
   }
 }

}

Note that since we used 'local', the pipe will be closed automatically
when leaving the scope.
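
The substitution step from the loop can be tried out in isolation. A
small sketch follows; fill_template is a made-up name and the sample
values are invented. Unknown names become the empty string, and a
'defined' test is used so that a value of 0 is kept.

```perl
use strict;
use warnings;

# Replace every %{NAME} placeholder in $template with the matching
# value from the hash reference $vars. Names consist of uppercase
# letters and underscores; whitespace inside the braces is allowed.
sub fill_template {
    my ( $template, $vars ) = @_;
    ( my $out = $template )
        =~ s/%\{\s*([_A-Z]+)\s*\}/ defined $vars->{$1} ? $vars->{$1} : '' /eg;
    return $out;
}

# sample template line and values, invented for this illustration
my $line = 'echo "seed=%{SEED} user=%{ USER }"' . "\n";
print fill_template( $line, { SEED => 77, USER => "jeff" } );
# prints: echo "seed=77 user=jeff"
```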

The shell script (or template) follows. To show off a bit, we'll also
have the job script call itself and use embedded Perl at runtime.

# -------------------------------------------- end-of-perl-program
__DATA__
#
# <gridengine>
# =======================
# standard start options, standardized job-name/output etc
#$ -S /bin/sh -cwd -j y
# =======================
# </gridengine>

echo "this is my simple job script with seed=%{SEED}"
echo "this is our environment"
perl -wx "$0" "$@"
echo "this is now the end"

exit 0
#####################################################################
# end of shell script
# ---------------------
# start of perl script (found by perl -x)
####################################################################
#!/usr/bin/perl -w

use strict;
use Data::Dumper;

print "ENV for SEED=%{SEED}\n";

print Dumper(\%ENV), "\n";

print "Closing comments:\n";
while (<DATA>) {
   print;
}

__END__

Perhaps this approach might work for you ... without having to embed a
shell script within the embedded Perl script ;-)


/mark



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net
