[GE users] All queues dropped because of overload or full

Alexandre Racine Alexandre.Racine at mhicc.org
Wed Dec 12 18:05:24 GMT 2007



Yes, all slots were in use, but I did not get that message during other tests, which had something like 100,000 jobs pending. Why do I get the message this time?
Here is the qstat -f output:

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q at PAPRIKA                  BIP   14/14     14.20    lx24-amd64
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 2
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 3
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 4
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 5
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 8
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 12
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 13
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 16
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 20
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 21
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:31     1 2
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:31     1 3
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:47     1 7
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:44:56     1 8
----------------------------------------------------------------------------
all.q at oregano.statgen.local    BIP   8/8       8.95     lx24-amd64
    131 0.55500 All_RLS_Me asseling     r     12/11/2007 08:52:54     1
    132 0.55500 SIME_RLS_M asseling     r     12/11/2007 08:53:25     1
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 7
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 11
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 15
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 19
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 23
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:32     1 5
----------------------------------------------------------------------------
all.q at wasabi01.statgen.local   BIP   8/8       8.12     lx24-amd64
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 6
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 10
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 14
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 18
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 22
    139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 24
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:31     1 4
    140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:32     1 6

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    140 0.55500 rls-pbat35 asseling     qw    12/11/2007 14:25:31     1 9-24:1
    141 0.55500 rls-pbat35 asseling     qw    12/11/2007 14:31:11     1 1-17:8
    142 0.55500 pprd-sw_sn asseling     qw    12/12/2007 09:36:21     1 3
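
For reference, the used/tot. column above already shows why every queue instance is reported as full. A minimal sketch, assuming the older "used/tot." column layout shown in this output (newer releases print an extra reservation field) and the archive's " at " rendering of "@"; the names QSTAT_F, HEADER and free_slots are only illustrative:

```python
import re

# Queue-instance header lines copied from the `qstat -f` output above.
QSTAT_F = """\
all.q at PAPRIKA                  BIP   14/14     14.20    lx24-amd64
all.q at oregano.statgen.local    BIP   8/8       8.95     lx24-amd64
all.q at wasabi01.statgen.local   BIP   8/8       8.12     lx24-amd64
"""

# <queue instance> <qtype> <used>/<total>; accepts both "@" and the
# archive's " at " spelling. Assumes the two-number used/tot. format.
HEADER = re.compile(
    r"^(?P<queue>\S+(?: at |@)\S+)\s+(?P<qtype>[A-Z]+)\s+"
    r"(?P<used>\d+)/(?P<total>\d+)\s"
)

def free_slots(text):
    """Return {queue instance: total - used} for each header line found."""
    free = {}
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            free[m.group("queue")] = int(m.group("total")) - int(m.group("used"))
    return free

slots = free_slots(QSTAT_F)
for queue, n in slots.items():
    print(f"{queue}: {n} free slot(s)")
```

With no free slot on any of the three queue instances, the remaining tasks of jobs 140 to 142 have nowhere to be dispatched until a running task finishes.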





Alexandre Racine
Projets spéciaux
514-461-1300 poste 3304
alexandre.racine at mhicc.org



-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Wed 2007-12-12 11:36
To: users at gridengine.sunsource.net
Subject: Re: [GE users] All queues dropped because of overload or full
 
Hi,

Am 12.12.2007 um 16:36 schrieb Alexandre Racine:

> Looking at "top", the processors are busy and there is a lot of memory
> available, and qhost looks all right, so I don't see why I get this.
> The only thing that looks impossible is maybe the mem field in
> "qstat -j". Or is that the total amount of memory that has been used
> over time (used and freed)? The only references I found in the
> archives are from 2004. Thanks.

What does `qstat -f` say? What does the queue configuration look like?
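
On the mem field in the quoted `qstat -j` output: as far as I know, the Grid Engine accounting documentation describes `mem` as integral memory usage in GB multiplied by CPU seconds, not as a peak or current size, so large values are expected for long-running tasks. A rough sketch under that assumption, using the task 2 numbers from the quoted output; cpu_seconds and avg_mem_mb are hypothetical helper names, not Grid Engine tools:

```python
def cpu_seconds(cpu):
    """Convert a cpu field such as '20:06:52' (h:m:s) or 'd:h:m:s' to seconds."""
    weights = [1, 60, 3600, 86400]  # seconds, minutes, hours, days
    return sum(int(part) * w for part, w in zip(reversed(cpu.split(":")), weights))

def avg_mem_mb(mem_gb_seconds, cpu):
    """Average memory in MB, assuming mem is an integral in GB * CPU seconds."""
    return mem_gb_seconds / cpu_seconds(cpu) * 1024

# Task 2 of job 139: cpu=20:06:52, mem=3467.62004 GBs
task2_avg = avg_mem_mb(3467.62004, "20:06:52")
print(f"average memory for task 2: about {task2_avg:.0f} MB")
```

Read that way, the value works out to a few tens of MB on average per task, which is the same order of magnitude as the reported vmem of roughly 60M.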

-- Reuti


>
> More details:
>
> $ qhost
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> server1                 lx24-amd64     16 14.26   30.4G    2.6G    1.9G     0.0
> server2                 lx24-amd64      8  8.39   15.7G    6.8G    1.9G     0.0
> server3                 lx24-amd64      8  8.12   14.6G  577.7M    2.0G     0.0
>
> $ qstat -j 139
> [...]
> script_file:                script.sh
> job-array tasks:            1-24:1
> usage    2:                 cpu=20:06:52, mem=3467.62004 GBs, io=0.00000, vmem=60.770M, maxvmem=62.227M
> usage    3:                 cpu=07:04:53, mem=1250.25426 GBs, io=0.00000, vmem=62.016M, maxvmem=63.266M
> usage    4:                 cpu=07:02:46, mem=1247.70159 GBs, io=0.00000, vmem=62.156M, maxvmem=63.492M
> usage    5:                 cpu=07:04:38, mem=1249.53348 GBs, io=0.00000, vmem=62.008M, maxvmem=63.316M
> usage    6:                 cpu=16:03:12, mem=2834.15624 GBs, io=0.00000, vmem=62.113M, maxvmem=63.023M
> usage    7:                 cpu=15:17:48, mem=2707.94392 GBs, io=0.00000, vmem=62.156M, maxvmem=62.578M
> usage    8:                 cpu=07:02:46, mem=1247.34336 GBs, io=0.00000, vmem=62.148M, maxvmem=63.484M
> usage   10:                 cpu=20:09:24, mem=3475.70453 GBs, io=0.00000, vmem=60.832M, maxvmem=62.266M
> usage   11:                 cpu=14:32:42, mem=2568.06738 GBs, io=0.00000, vmem=62.016M, maxvmem=63.016M
> usage   12:                 cpu=07:14:50, mem=1283.31948 GBs, io=0.00000, vmem=62.156M, maxvmem=63.504M
> usage   13:                 cpu=07:15:51, mem=1282.46496 GBs, io=0.00000, vmem=62.012M, maxvmem=63.359M
> usage   14:                 cpu=15:56:08, mem=2813.61103 GBs, io=0.00000, vmem=62.125M, maxvmem=63.430M
> usage   15:                 cpu=14:38:33, mem=2592.12483 GBs, io=0.00000, vmem=62.156M, maxvmem=63.312M
> usage   16:                 cpu=07:17:23, mem=1290.37961 GBs, io=0.00000, vmem=62.137M, maxvmem=63.574M
> usage   18:                 cpu=20:09:23, mem=3482.93681 GBs, io=0.00000, vmem=60.832M, maxvmem=62.289M
> usage   19:                 cpu=14:26:19, mem=2549.31135 GBs, io=0.00000, vmem=62.016M, maxvmem=63.324M
> usage   20:                 cpu=07:22:26, mem=1305.89071 GBs, io=0.00000, vmem=62.160M, maxvmem=63.617M
> usage   21:                 cpu=07:23:30, mem=1304.96487 GBs, io=0.00000, vmem=62.004M, maxvmem=63.328M
> usage   22:                 cpu=15:08:08, mem=2672.33798 GBs, io=0.00000, vmem=62.117M, maxvmem=63.551M
> usage   23:                 cpu=14:23:55, mem=2548.95621 GBs, io=0.00000, vmem=62.148M, maxvmem=63.609M
> usage   24:                 cpu=15:04:51, mem=2669.49002 GBs, io=0.00000, vmem=62.246M, maxvmem=63.523M
> scheduling info:            queue instance "all.q at oregano.statgen.local" dropped because it is full
>                             queue instance "all.q at wasabi01.statgen.local" dropped because it is full
>                             queue instance "all.q at PAPRIKA" dropped because it is full
>                             All queues dropped because of overload or full
>
>
>
>
> Alexandre Racine
> Projets spéciaux
> 514-461-1300 poste 3304
> alexandre.racine at mhicc.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net



