[GE users] Script to show pending reasons in SGE

pto pto at linuxbog.dk
Tue May 26 07:26:26 BST 2009


Dear all

One of the annoying things about SGE 6.x is the way it reports the
reasons why a job is pending in a queue.
Assume that job 307472 is not running and I want to know why. Then I use qstat:

$ qstat -j 307472
==============================================================
job_number:                 307472
exec_file:                  job_scripts/307472
submission_time:            Mon May 25 13:33:58 2009
owner:                      joe
uid:                        38
group:                      users
<cut one billion irrelevant lines>
script_file:                netbatch/joe_sim001_28743/9/nb_sim_worker.sh
context:                    JOB_NAME=foo
usage    1:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
scheduling info:            queue instance "rush.q@moo165.bar.org" dropped because it is temporarily not available
                            queue instance "batch.q@moo165.bar.org" dropped because it is temporarily not available
                            queue instance "batch.q@moo167.bar.org" dropped because it is disabled
                            queue instance "interactive.q@moo099.bar.org" dropped because it is disabled
                            queue instance "interactive.q@moo100.bar.org" dropped because it is disabled
                            queue instance "batch.q@moo102.bar.org" dropped because it is full
                            queue instance "batch.q@moo107.bar.org" dropped because it is full
                            queue instance "batch.q@moo109.bar.org" dropped because it is full
                            queue instance "batch.q@moo112.bar.org" dropped because it is full
                            queue instance "batch.q@moo113.bar.org" dropped because it is full
                            queue instance "batch.q@moo116.bar.org" dropped because it is full
                            queue instance "batch.q@moo118.bar.org" dropped because it is full
                            queue instance "batch.q@moo126.bar.org" dropped because it is full
                            queue instance "batch.q@moo128.bar.org" dropped because it is full
                            queue instance "batch.q@moo141.bar.org" dropped because it is full
                            queue instance "batch.q@moo143.bar.org" dropped because it is full
                            queue instance "batch.q@moo149.bar.org" dropped because it is full
                            queue instance "batch.q@moo152.bar.org" dropped because it is full
                            queue instance "batch.q@moo153.bar.org" dropped because it is full
                            queue instance "batch.q@moo157.bar.org" dropped because it is full
                            <and it continues....>

With 10000 CPUs this is a horrible interface :-(

Has any of you written a script that filters this output and gives
clear messages such as
* The grid is fully loaded - no free CPU resources
* Your job is not running since you require license foo=1 and the
available number is zero
* You are not allowed to run

I.e. something MUCH simpler. Before I write such a parser myself: I guess
most of you have faced the same problem, so it has most likely been
solved by some of you already.
Am I right?
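
A minimal sketch of such a filter, in Python, might look like the
following. It assumes that every relevant scheduling info entry follows
the pattern seen above (queue instance "..." dropped because it is
<reason>) and simply counts the queue instances per reason. The names
count_reasons and summarize are made up for illustration, and messages
about licenses or permissions would need their own patterns.

```python
import re
from collections import Counter

# Pattern taken from the qstat -j output quoted in this post (an assumption
# about the general format). \s+ between the words tolerates entries that a
# mail client has wrapped over two lines; the reason itself is assumed to
# stay on one line.
DROPPED = re.compile(
    r'queue\s+instance\s+"([^"]+)"\s+'
    r'dropped\s+because\s+it\s+is\s+([^\n]+)'
)


def count_reasons(qstat_text):
    """Count dropped queue instances per reason (e.g. 'full', 'disabled')."""
    return Counter(m.group(2).strip() for m in DROPPED.finditer(qstat_text))


def summarize(qstat_text):
    """Return a short text summary, most frequent reason first."""
    reasons = count_reasons(qstat_text)
    if not reasons:
        return "No dropped queue instances found in the scheduling info."
    total = sum(reasons.values())
    lines = [f"{total} queue instances were dropped:"]
    for reason, count in reasons.most_common():
        lines.append(f"  {count:6d}  {reason}")
    return "\n".join(lines)


# Small excerpt of the output above, used only to demonstrate the functions.
SAMPLE = '''scheduling info:            queue instance "rush.q@moo165.bar.org"
dropped because it is temporarily not available
                            queue instance "batch.q@moo167.bar.org" dropped because it is disabled
                            queue instance "batch.q@moo102.bar.org" dropped because it is full
                            queue instance "batch.q@moo107.bar.org" dropped because it is full
'''

print(summarize(SAMPLE))
```

To use it on a real job, the text passed to summarize could be read from
standard input (sys.stdin.read()) with the output of qstat -j piped in.
Turning the counts into messages like the ones listed above would be a
further step on top of this.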

Best

-- 
Peter Toft <pto at linuxbog.dk>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=198904
