[GE users] All queues dropped because of overload or full

Alexandre Racine Alexandre.Racine at mhicc.org
Wed Dec 12 19:28:16 GMT 2007



Mmmm, do you mean the scheduler tuning profile chosen during installation? It is set to normal.

Q1- So what you are actually saying is that everything is fine, and that SGE is simply reporting that there are no slots left?

Q2- Also, looking in top, I had some programs that were in the state 'D' (uninterruptible sleep). Would this be related?
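[For reference: state 'D' in top means uninterruptible sleep, which almost always means the process is blocked on I/O (local disk or NFS). A generic Linux sketch, not SGE-specific, for listing such processes:

```shell
# List processes currently in uninterruptible sleep ('D' state),
# typically blocked on disk or NFS I/O; NR > 1 skips the header line.
ps -eo stat,pid,comm | awk 'NR > 1 && $1 ~ /^D/'
```

Note that 'D'-state processes still count toward the Linux load average, so they can inflate load_avg even when the CPUs themselves are not busy.]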


Thanks.




Alexandre Racine
Projets spéciaux
514-461-1300 poste 3304
alexandre.racine at mhicc.org



-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Wed 2007-12-12 13:40
To: users at gridengine.sunsource.net
Subject: Re: [GE users] All queues dropped because of overload or full
 
On 12.12.2007 at 19:05, Alexandre Racine wrote:

> Yes, all slots were used, but I did not get that message while
> running another test which had around 100,000 jobs pending. Why do
> I get that error message this time?
> Here is the qstat -f output:

Maybe the scheduler info wasn't turned on the last time? - Reuti
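[Reuti's hint refers to the scheduler's job-info setting: in SGE the detailed "scheduling info" messages shown by `qstat -j` are controlled by the `schedd_job_info` parameter of the scheduler configuration (shown with `qconf -ssconf`, edited with `qconf -msconf`). A minimal sketch of the relevant line:

```
# scheduler configuration (qconf -ssconf / qconf -msconf):
schedd_job_info                   true
```

With `schedd_job_info false`, the "queue instance ... dropped" lines never appear, which would explain why an earlier test with ~100,000 pending jobs showed no such message.]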

> $ qstat -f
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@PAPRIKA                  BIP   14/14     14.20    lx24-amd64
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 2
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 3
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 4
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 5
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 8
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 12
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 13
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 16
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 20
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 21
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:31     1 2
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:31     1 3
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:47     1 7
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:44:56     1 8
> ----------------------------------------------------------------------------
> all.q@oregano.statgen.local    BIP   8/8       8.95     lx24-amd64
>     131 0.55500 All_RLS_Me asseling     r     12/11/2007 08:52:54     1
>     132 0.55500 SIME_RLS_M asseling     r     12/11/2007 08:53:25     1
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 7
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 11
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 15
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 19
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 23
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:32     1 5
> ----------------------------------------------------------------------------
> all.q@wasabi01.statgen.local   BIP   8/8       8.12     lx24-amd64
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 6
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 10
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 14
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:43     1 18
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 22
>     139 0.55500 rls-pbat35 asseling     r     12/11/2007 14:23:44     1 24
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:31     1 4
>     140 0.55500 rls-pbat35 asseling     r     12/11/2007 14:25:32     1 6
>
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>     140 0.55500 rls-pbat35 asseling     qw    12/11/2007 14:25:31     1 9-24:1
>     141 0.55500 rls-pbat35 asseling     qw    12/11/2007 14:31:11     1 1-17:8
>     142 0.55500 pprd-sw_sn asseling     qw    12/12/2007 09:36:21     1 3
>
>
>
>
>
> Alexandre Racine
> Projets spéciaux
> 514-461-1300 poste 3304
> alexandre.racine at mhicc.org
>
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Wed 2007-12-12 11:36
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] All queues dropped because of overload or full
>
> Hi,
>
> On 12.12.2007 at 16:36, Alexandre Racine wrote:
>
>> Looking with "top", the processors are working and there is a lot
>> of memory available... qhost seems alright... I don't see why I
>> get this. The only thing that maybe sounds impossible is the mem
>> field in "qstat -j". Or is this the total amount of memory that
>> has been used (used and freed)? The only references that I found
>> in the archives are from 2004... Thanks.
>
> What does `qstat -f` say? What does the queue configuration look like?
>
> -- Reuti
>
>
>>
>> More details:
>>
>> $ qhost
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> server1                 lx24-amd64     16 14.26   30.4G    2.6G    1.9G     0.0
>> server2                 lx24-amd64      8  8.39   15.7G    6.8G    1.9G     0.0
>> server3                 lx24-amd64      8  8.12   14.6G  577.7M    2.0G     0.0
>>
>> $ qstat -j 139
>> [...]
>> script_file:                script.sh
>> job-array tasks:            1-24:1
>> usage    2:                 cpu=20:06:52, mem=3467.62004 GBs, io=0.00000, vmem=60.770M, maxvmem=62.227M
>> usage    3:                 cpu=07:04:53, mem=1250.25426 GBs, io=0.00000, vmem=62.016M, maxvmem=63.266M
>> usage    4:                 cpu=07:02:46, mem=1247.70159 GBs, io=0.00000, vmem=62.156M, maxvmem=63.492M
>> usage    5:                 cpu=07:04:38, mem=1249.53348 GBs, io=0.00000, vmem=62.008M, maxvmem=63.316M
>> usage    6:                 cpu=16:03:12, mem=2834.15624 GBs, io=0.00000, vmem=62.113M, maxvmem=63.023M
>> usage    7:                 cpu=15:17:48, mem=2707.94392 GBs, io=0.00000, vmem=62.156M, maxvmem=62.578M
>> usage    8:                 cpu=07:02:46, mem=1247.34336 GBs, io=0.00000, vmem=62.148M, maxvmem=63.484M
>> usage   10:                 cpu=20:09:24, mem=3475.70453 GBs, io=0.00000, vmem=60.832M, maxvmem=62.266M
>> usage   11:                 cpu=14:32:42, mem=2568.06738 GBs, io=0.00000, vmem=62.016M, maxvmem=63.016M
>> usage   12:                 cpu=07:14:50, mem=1283.31948 GBs, io=0.00000, vmem=62.156M, maxvmem=63.504M
>> usage   13:                 cpu=07:15:51, mem=1282.46496 GBs, io=0.00000, vmem=62.012M, maxvmem=63.359M
>> usage   14:                 cpu=15:56:08, mem=2813.61103 GBs, io=0.00000, vmem=62.125M, maxvmem=63.430M
>> usage   15:                 cpu=14:38:33, mem=2592.12483 GBs, io=0.00000, vmem=62.156M, maxvmem=63.312M
>> usage   16:                 cpu=07:17:23, mem=1290.37961 GBs, io=0.00000, vmem=62.137M, maxvmem=63.574M
>> usage   18:                 cpu=20:09:23, mem=3482.93681 GBs, io=0.00000, vmem=60.832M, maxvmem=62.289M
>> usage   19:                 cpu=14:26:19, mem=2549.31135 GBs, io=0.00000, vmem=62.016M, maxvmem=63.324M
>> usage   20:                 cpu=07:22:26, mem=1305.89071 GBs, io=0.00000, vmem=62.160M, maxvmem=63.617M
>> usage   21:                 cpu=07:23:30, mem=1304.96487 GBs, io=0.00000, vmem=62.004M, maxvmem=63.328M
>> usage   22:                 cpu=15:08:08, mem=2672.33798 GBs, io=0.00000, vmem=62.117M, maxvmem=63.551M
>> usage   23:                 cpu=14:23:55, mem=2548.95621 GBs, io=0.00000, vmem=62.148M, maxvmem=63.609M
>> usage   24:                 cpu=15:04:51, mem=2669.49002 GBs, io=0.00000, vmem=62.246M, maxvmem=63.523M
>> scheduling info:            queue instance "all.q@oregano.statgen.local" dropped because it is full
>>                             queue instance "all.q@wasabi01.statgen.local" dropped because it is full
>>                             queue instance "all.q@PAPRIKA" dropped because it is full
>>                             All queues dropped because of overload or full
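[A note on the `mem` figures above, since they were called "impossible" earlier in the thread: as I understand SGE's accounting, `mem` is a memory-time integral in GB * CPU-seconds, not an instantaneous footprint. A quick sanity check against task 2's numbers (my own arithmetic, not from the thread):

```python
# SGE reports mem as a time integral in GB * CPU-seconds; dividing by
# the CPU time recovers an average footprint, which should sit below
# the reported maxvmem.
cpu_seconds = 20 * 3600 + 6 * 60 + 52   # cpu=20:06:52 -> 72412 s
mem_gb_seconds = 3467.62004             # reported "mem=3467.62004 GBs"
avg_mb = mem_gb_seconds / cpu_seconds * 1024
print(f"average footprint: {avg_mb:.1f} MB")  # comfortably under maxvmem=62.227M
```

So a few thousand "GBs" is entirely consistent with a ~60 MB process that ran for many CPU-hours.]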
>>
>>
>>
>>
>> Alexandre Racine
>> Projets spéciaux
>> 514-461-1300 poste 3304
>> alexandre.racine at mhicc.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
>
>









