[GE users] Jobs being suspended incorrectly

reuti reuti at staff.uni-marburg.de
Sat Apr 17 13:38:25 BST 2010


On 16.04.2010 at 23:23, opoplawski wrote:

> On 04/07/2010 08:29 AM, reuti wrote:
>>
>> can you please post the queue definitions.
>
> Okay, here we go:
>
> $ qstat -u \* | grep apapane
>   16680 0.56000 run_cora.c dombroski    S     04/16/2010 15:01:34 mpi@apapane.cora.nwra.com          8

Only the master slot is listed here. AFAICS the job got two times 4 slots; to list every slot of the parallel job, use:

$ qstat -g t

-- Reuti


>
> $ qstat -f | grep apapane
> admin.q@apapane.cora.nwra.com  BIPC  0/0/1          0.03     lx26-amd64
> ivm.q@apapane.cora.nwra.com    BIPC  0/0/4          0.03     lx26-amd64
> compute.q@apapane.cora.nwra.co BIPC  0/4/4          0.03     lx26-amd64    S
> mpi@apapane.cora.nwra.com      PC    0/4/4          0.03     lx26-amd64
>
> Why does compute.q@apapane show 4 slots used?
> Why is the job in state S when it is in the mpi queue?
> This looks like a bug to me.
>
> Here are the queue definitions:
>
> [orion@orca trunk]$ qconf -sq ivm.q
> qname                 ivm.q
> hostlist              apapane.cora.nwra.com
> seq_no                0
> load_thresholds       np_load_avg=1
> suspend_thresholds    np_load_avg=1.05
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             simple
> pe_list               make mpi mpirr smp
> rerun                 FALSE
> slots                 4
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            alexand
> user_lists            ivm orion
> xuser_lists           NONE
> subordinate_list      compute.q=3, mpi=1
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
> [orion@orca trunk]$ qconf -sq compute.q
> qname                 compute.q
> hostlist              @compute
> seq_no                5
> load_thresholds       NONE,[@interactive=np_load_short=1]
> suspend_thresholds    NONE,[@interactive=np_load_short=1.05]
> nsuspend              1
> suspend_interval      00:03:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             simple
> pe_list               make mpi mpirr smp
> rerun                 TRUE
> slots                 4,[@dualproc=2],[@octproc=8]
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
> [orion@orca trunk]$ qconf -sq mpi
> qname                 mpi
> hostlist              @mpi
> seq_no                5,[@dualproc=9]
> load_thresholds       NONE,[@interactive=np_load_short=1]
> suspend_thresholds    NONE,[@interactive=np_load_short=1.05]
> nsuspend              1
> suspend_interval      00:03:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 NONE
> ckpt_list             simple
> pe_list               make mpi mpirr smp
> rerun                 TRUE
> slots                 4,[@dualproc=2],[@octproc=8]
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      all.q=1, compute.q=1
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
>
> -- 
> Orion Poplawski
> Technical Manager                     303-415-9701 x222
> NWRA/CoRA Division                    FAX: 303-415-9702
> 3380 Mitchell Lane                  orion at cora.nwra.com
> Boulder, CO 80301              http://www.cora.nwra.com
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253745
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
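
For reference, the subordinate_list entries quoted above can be checked against the slot counts from the `qstat -f` output with a small script. This is only a rough sketch, not Grid Engine's own logic. It assumes that the `qstat -f` column reads reserved/used/total and that an entry `queue=N` suspends that queue on the host once N or more slots of the superordinate queue are in use there. The function names are made up for illustration.

```python
# Used slots per queue instance on apapane, copied from the quoted
# `qstat -f` output (assumption: the column is reserved/used/total).
used_slots = {"admin.q": 0, "ivm.q": 0, "compute.q": 4, "mpi": 4}

# subordinate_list values copied from the quoted `qconf -sq` output.
subordinate_lists = {
    "ivm.q": "compute.q=3, mpi=1",
    "compute.q": "NONE",
    "mpi": "all.q=1, compute.q=1",
}


def parse_subordinate_list(value):
    """Return {queue_name: threshold}; threshold is None if no '=N' is given."""
    if value.strip().upper() == "NONE":
        return {}
    result = {}
    for entry in value.replace(",", " ").split():
        name, _, threshold = entry.partition("=")
        result[name] = int(threshold) if threshold else None
    return result


def suspended_by(used, sub_lists):
    """Map each queue that should be suspended to the queues that trigger it."""
    out = {}
    for superior, raw in sub_lists.items():
        for subordinate, threshold in parse_subordinate_list(raw).items():
            if threshold is None:
                # An entry without a threshold depends on the total slot
                # count of the superordinate queue; not modelled here.
                continue
            if used.get(superior, 0) >= threshold:
                out.setdefault(subordinate, []).append(superior)
    return out


result = suspended_by(used_slots, subordinate_lists)
print(result)  # {'all.q': ['mpi'], 'compute.q': ['mpi']}
```

Under these assumptions only compute.q (and all.q, if it exists on the host) would be suspended by the 4 used slots in mpi, while mpi itself would not be suspended by ivm.q, which has no slots in use.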

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253805

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


