[GE users] Script to show pending reasons in SGE

pto pto at linuxbog.dk
Fri May 29 20:52:48 BST 2009


On Tue, 26 May 2009, pto wrote:

It seems that a good start is this simple script:

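#!/bin/bash
# Print each pending job ID with the last line of its "qstat -j" output.

# Job IDs of all pending jobs; the first two lines of qstat are the header.
PENDING=$(qstat -s p | awk 'NR>2 {print $1}')

if [ -n "$PENDING" ]
then
  for i in $PENDING
  do
    # Keep only the last line and strip its leading whitespace.
    REASON=$(qstat -j "$i" | sed '$!d;s/^[[:space:]]*//')
    echo "$i : $REASON"
  done
fi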
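
A possible next step, only as a rough sketch: rather than keeping just the last line, count how often each "dropped because ..." reason occurs in the scheduling info, so that thousands of queue instances collapse into a few lines. The function name summarize_reasons is made up for illustration and is not part of SGE. The sketch assumes every reason starts with the words "dropped because", as in the qstat -j output quoted below, and that grep supports -o (GNU and BSD grep do).

```shell
#!/bin/bash
# Sketch: read "qstat -j <jobid>" output on stdin and print one line per
# distinct "dropped because ..." reason, most frequent reason first.
# summarize_reasons is a hypothetical helper name, not an SGE command.
summarize_reasons() {
  # Keep only the reason text that follows "dropped because".
  grep -o 'dropped because .*' |
    sed 's/^dropped because //; s/[[:space:]]*$//' |
    # Count identical reasons and sort by count, highest first.
    sort | uniq -c | sort -rn |
    # Turn "  15 it is full" into "15 queue instance(s): it is full".
    awk '{n=$1; $1=""; sub(/^ /, ""); print n " queue instance(s): " $0}'
}
```

It would be used as "qstat -j 307472 | summarize_reasons" and prints lines of the form "<count> queue instance(s): <reason>". Reasons that are not phrased as "dropped because ..." (for example license or access problems) are ignored by this sketch and would need their own patterns.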

I hope you will send me improvements and review comments...

Best

Peter

> Dear all
>
> One of the annoying things about SGE 6.x is the way it reports the
> reasons for pending jobs in an SGE queue.
> Assume that job 307472 is not running, and I want to know why. Then I use qstat:
>
> $ qstat -j 307472
> ==============================================================
> job_number:                 307472
> exec_file:                  job_scripts/307472
> submission_time:            Mon May 25 13:33:58 2009
> owner:                      joe
> uid:                        38
> group:                      users
> <cut one billion irrelevant lines>
> script_file:                netbatch/joe_sim001_28743/9/nb_sim_worker.sh
> context:                    JOB_NAME=foo
> usage    1:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
> scheduling info:            queue instance "rush.q@moo165.bar.org" dropped because it is temporarily not available
>                             queue instance "batch.q@moo165.bar.org" dropped because it is temporarily not available
>                             queue instance "batch.q@moo167.bar.org" dropped because it is disabled
>                             queue instance "interactive.q@moo099.bar.org" dropped because it is disabled
>                             queue instance "interactive.q@moo100.bar.org" dropped because it is disabled
>                             queue instance "batch.q@moo102.bar.org" dropped because it is full
>                             queue instance "batch.q@moo107.bar.org" dropped because it is full
>                             queue instance "batch.q@moo109.bar.org" dropped because it is full
>                             queue instance "batch.q@moo112.bar.org" dropped because it is full
>                             queue instance "batch.q@moo113.bar.org" dropped because it is full
>                             queue instance "batch.q@moo116.bar.org" dropped because it is full
>                             queue instance "batch.q@moo118.bar.org" dropped because it is full
>                             queue instance "batch.q@moo126.bar.org" dropped because it is full
>                             queue instance "batch.q@moo128.bar.org" dropped because it is full
>                             queue instance "batch.q@moo141.bar.org" dropped because it is full
>                             queue instance "batch.q@moo143.bar.org" dropped because it is full
>                             queue instance "batch.q@moo149.bar.org" dropped because it is full
>                             queue instance "batch.q@moo152.bar.org" dropped because it is full
>                             queue instance "batch.q@moo153.bar.org" dropped because it is full
>                             queue instance "batch.q@moo157.bar.org" dropped because it is full
>                             <and it continues....>
>
> With 10000 CPUs this is a horrible interface :-(
>
> Has any of you written a script that filters this output and gives
> clear messages such as
> * The grid is fully loaded - no free CPU resources
> * Your job is not running since you require license foo=1 and the
> available number is zero
> * You are not allowed to run
>
> I.e. something MUCH simpler. Before writing such a parser myself, I guess
> most of you have faced the same problem, so it has most likely been
> solved by some of you already.
> Am I right?
>
> Best
>
> -- 
> Peter Toft <pto at linuxbog.dk>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=198904
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

-- 
Peter Toft, Ph.D. [pto at linuxbog.dk] http://petertoft.dk
I blog at http://www.version2.dk/blogs/petertoft

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199748