Re: [GE users] non-advancing jobs in gridengine

joelandman landman at
Mon Aug 24 21:48:10 BST 2009

joelandman wrote:

> It looks like
> 	ulimit -s unlimited
> in the very top of the SGE execd script helped here.

I spoke too soon.  It looks like it ran once, but not the way I wanted.
After I restarted it correctly, we get the same problem.  I can confirm
the limits a job sees:

landman at scalable:~> qrsh ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 71680
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 71680
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

so we aren't hitting any resource limits.
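
For a quicker read of output like the above, here is a minimal sketch that keeps only the limits that are not "unlimited". The function name finite_limits is made up for illustration; it simply filters `ulimit -a` style text on stdin and assumes the value is the last field on each line.

```shell
# Print only the finite limits from `ulimit -a` style output on stdin.
# Assumption: the limit value is the last whitespace-separated field.
finite_limits() {
    awk '$NF != "unlimited"'
}

# Possible use (needs a working qrsh, so it is left commented out):
#   qrsh ulimit -a | finite_limits
```

On the listing above this would leave the pending signals, open files, pipe size, POSIX message queues, and max user processes lines.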

If I let SGE select the hosts and don't use a machinefile, the job
fails to advance.  If I specify the hosts by hand in a machinefile, the
job works.
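
One way to compare the two cases is to turn the host list SGE hands to the job into a machinefile and diff it against the hand-written one. The following is a sketch only: it assumes the job runs under a parallel environment so that SGE sets $PE_HOSTFILE (one "host slots queue processor-range" line per host), and the names pe_to_machinefile and my_app are hypothetical. The contents of run_script_SGE.bash are not shown here, so this is not a description of that script.

```shell
# Convert an SGE $PE_HOSTFILE (host, slots, queue, processor range per line)
# into Open MPI hostfile syntax: "host slots=N".
pe_to_machinefile() {
    awk '{ print $1 " slots=" $2 }' "$1"
}

# Possible use inside the job script (commented out; needs SGE and Open MPI):
#   pe_to_machinefile "$PE_HOSTFILE" > machines.$JOB_ID
#   mpirun -np "$NSLOTS" -machinefile machines.$JOB_ID ./my_app
```

Keeping the generated file around makes it easy to see which hosts and slot counts SGE actually picked when the job stalls.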

The job gets submitted with

	qsub -pe openmpi 128 -cwd ./run_script_SGE.bash


landman at scalable:~> qconf -sp openmpi
pe_name            openmpi
slots              128
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at
web  :
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

