AW: [GE users] non-advancing jobs in gridengine

dougalb dougal.lists at gmail.com
Tue Aug 25 11:47:14 BST 2009


Hi,

We also had some issues when running MPI jobs. We resolved them with
the following setting in the SGE configuration:

$ qconf -sconf
.
.
execd_params                 H_MEMORYLOCKED=48g
.
.
$

Restart your execd daemons afterwards. We preferred this method because it
requires no modification to the (S)GE scripts. We are now successfully running
OpenMPI jobs on up to 512 cores.
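
If you want to confirm the setting is in place after editing the global
configuration (for example with "qconf -mconf"), a small helper along these
lines may be useful. It is only a sketch: the function name get_memlocked is
made up for illustration, and it assumes that execd_params entries are
separated by whitespace or commas, as they are on our installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): read `qconf -sconf`-style text on
# stdin and print the value of H_MEMORYLOCKED from the execd_params line.
# Assumes entries are separated by whitespace and/or commas.
get_memlocked() {
    awk '$1 == "execd_params" {
        for (i = 2; i <= NF; i++) {
            n = split($i, parts, ",")
            for (j = 1; j <= n; j++) {
                # Each entry is expected to look like NAME=value
                if (split(parts[j], kv, "=") == 2 && kv[1] == "H_MEMORYLOCKED")
                    print kv[2]
            }
        }
    }'
}
```

Typical use would be "qconf -sconf | get_memlocked", which prints nothing if
the parameter is not set.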

-Dougal


On Mon, Aug 24, 2009 at 10:48 PM,
joelandman<landman at scalableinformatics.com> wrote:
> joelandman wrote:
>
>> It looks like
>>
>>       ulimit -s unlimited
>>
>> in the very top of the SGE execd script helped here.
>>
>
> I spoke too soon.  Looks like it ran once, but not the way I wanted.
> Restarted it correctly, and we get the same problem.  I can confirm
>
> landman at scalable:~> qrsh ulimit -a
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 71680
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 71680
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> so we aren't running out of limits.
>
> If I let SGE select the hosts, and don't use a machinefile, the job
> fails to advance.  If I force those by hand, the job works.
>
> job gets submitted with
>
>        qsub -pe openmpi 128 -cwd ./run_script_SGE.bash
>
> and
>
> landman at scalable:~> qconf -sp openmpi
> pe_name            openmpi
> slots              128
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
>
>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214040
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214162



