AW: AW: [GE users] non-advancing jobs in gridengine

joelandman landman at scalableinformatics.com
Thu Aug 27 18:42:43 BST 2009



Problem solved

carsten wrote:
> Joe,
> 
> maybe you have "only" a problem with a single node you hit all the
> time you submit the job via SGE. Try to force your SGE job to use
> exactly the same nodes you use in your manual calls. Submit the job
> with a list of nodes and make this a hard option for the SGE. (or was
> this meant by "forced node by hand" ?)

I did this and it made no difference.

> 
> Have you checked your Infiniband switch and made a fabric cleaning?

Yes, forced a restart of the switch and the subnet manager.

> Is a submitted Pallas Benchmark running fine?
> 
> If the job runs fine with a host file, try to start it as if your
> OpenMPI has no SGE support implemented and start it via the PE start
> procedures you do with other MPI versions.

Tried other schedulers and had identical problems.

> 
> Should not be a problem, but have you checked that all MPI ranks run
> where they should? Years ago I had a problem with another MPI
> implementation, that it did not start the correct number of jobs on
> the nodes as given by the host file.

Yeah, checked that.

> 
> Don't know if it makes any difference, increase the value for
> "pending signals" and "max user process" to 268288  and add the
> "ulimit -a" to your ./run_script_SGE.bash script, to be sure. Qrsh
> might give different results (I think there was something mentioned
> in the link).

I had been thinking this through and tried an experiment.

1st: use the latest Open MPI 1.3.3 source
2nd: turn off *all* compilation optimization in Open MPI (using gcc as
the C compiler, and ifort from Intel 11.1 as the Fortran compiler)
3rd: turn on SGE and TM support

Built OpenMPI 1.3.3 and installed it.
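
For anyone who wants to repeat the three steps above, here is a rough
sketch of the configure arguments they imply.  The install prefix and the
wrapper function name are my own placeholders, not something taken from
the actual build, and --with-tm assumes the Torque/PBS TM library is
installed where configure can find it.

```shell
#!/bin/sh
# Sketch only: argument list for an unoptimized, debug-enabled Open MPI build.
# PREFIX is a hypothetical install location, not the one used in this thread.
PREFIX="${PREFIX:-$HOME/openmpi-1.3.3-debug}"

# Runs the given command with the build arguments appended, so the same
# list can be printed (dry run) or handed to ./configure.
with_debug_build_args() {
    "$@" --prefix="$PREFIX" CC=gcc F77=ifort FC=ifort \
         CFLAGS="-O0 -g" FFLAGS="-O0 -g" FCFLAGS="-O0 -g" \
         --enable-debug --with-sge --with-tm
}

# Dry run: print one argument per line.
with_debug_build_args printf '%s\n'

# Real build, from inside the unpacked source tree:
#   with_debug_build_args ./configure && make all install
```

The application then has to be rebuilt with the mpicc/mpif90 wrappers
from this install so it does not pick up the optimized library.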

Rebuilt application under OpenMPI 1.3.3.

Ran by hand from cli to make sure it worked.  It did.

Submitted to the queue to see if it got stuck.  It did not.
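
To compare the by-hand run and the queued run on equal terms, a job
script along these lines records the limits and the hosts SGE handed
out before starting the application.  This is a sketch, not my actual
run_script_SGE.bash; APP and the function name are placeholders.
NSLOTS and PE_HOSTFILE are set by SGE inside a parallel job.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
# Sketch of a diagnostic job script.  APP is a hypothetical binary name.
APP="${APP:-./my_mpi_app}"

report_job_env() {
    echo "== limits on $(uname -n) =="
    ulimit -a
    echo "== slots: ${NSLOTS:-unset} =="
    if [ -n "${PE_HOSTFILE:-}" ] && [ -r "${PE_HOSTFILE:-}" ]; then
        echo "== hosts granted by SGE =="
        cat "$PE_HOSTFILE"
    else
        echo "== no PE_HOSTFILE (run by hand?) =="
    fi
}

report_job_env

# With SGE support compiled into Open MPI, mpirun takes the host list
# from the parallel environment, so no machinefile is passed here.
if [ -x "$APP" ] && command -v mpirun >/dev/null 2>&1; then
    mpirun -np "${NSLOTS:-1}" "$APP"
fi
```

Submitted with the same qsub line as before, the top of the job output
then shows whether the limits or the host list differ from the manual
run.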

As near as I can tell, SGE does the rlimit setting right (I dug into the
source to look at it).  Oddly, so does OpenMPI for ORTE.  The issue (I
surmise) was the optimization (-O) flag used when compiling Open MPI.
Turning debugging on to the maximum and disabling optimization may cost
a little performance inside the MPI library, but this cost
shouldn't be too high.

So, I'd suggest for the wiki (and I am going to write something up for
Jeff at OpenMPI as well) that it might be worth a quick discussion of
this so others don't get caught by it.

Joe



> 
> Carsten
> 
> -----Original Message-----
> From: joelandman [mailto:landman at scalableinformatics.com]
> Sent: Monday, 24 August 2009 22:48
> To: users at gridengine.sunsource.net
> Subject: Re: AW: [GE users] non-advancing jobs in gridengine
> 
> joelandman wrote:
> 
>> It looks like
>> 
>> ulimit -s unlimited
>> 
>> in the very top of the SGE execd script helped here.
>> 
> 
> I spoke too soon.  Looks like it ran once, but not the way I wanted.
>  Restarted it correctly, and we get the same problem.  I can confirm
> 
> landman at scalable:~> qrsh ulimit -a
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 71680
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 71680
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 
> so we aren't running out of limits.
> 
> If I let SGE select the hosts, and don't use a machinefile, the job 
> fails to advance.  If I force those by hand, the job works.
> 
> job gets submitted with
> 
> qsub -pe openmpi 128 -cwd ./run_script_SGE.bash
> 
> and
> 
> landman at scalable:~> qconf -sp openmpi
> pe_name            openmpi
> slots              128
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
> 
> 
> 
> 
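
The "force those by hand" case quoted above can be pinned to exactly the
hosts SGE picked by converting a saved PE hostfile into an Open MPI
machinefile.  A minimal sketch, assuming the usual PE hostfile layout
(host, slots, queue, processor range per line); the function and file
names are made up for illustration.

```shell
#!/bin/sh
# Convert an SGE PE hostfile into Open MPI hostfile lines ("host slots=N").
# Only the first two columns (host name, slot count) are used.
pe_to_machinefile() {
    awk '{ print $1 " slots=" $2 }' "$1"
}

# Inside a job: save the allocation for a later manual run.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "${PE_HOSTFILE:-}" ]; then
    pe_to_machinefile "$PE_HOSTFILE" > machines.saved
fi

# Later, by hand (application name is a placeholder):
#   mpirun -np 128 --hostfile machines.saved ./my_mpi_app
```

If the manual run on the saved hosts works while the queued run on the
same hosts hangs, a bad node is ruled out.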


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214601
