[GE users] Jobs status doesn't change from "qw" ?

umanga aumanga at biggjapan.com
Thu Nov 5 09:46:11 GMT 2009



Hi reuti,

I figured out the problem: it was my script. I forgot the line
#$ -S /bin/sh

After adding that, everything worked fine.
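For anyone hitting the same symptom, here is a minimal sketch of the corrected job script. A likely reason the directive mattered: in Grid Engine's posix_compliant shell_start_mode the job is started with the queue's configured shell rather than the script's #! line, so a missing -S can make the job fail on the exec host. That reading is an assumption, since the queue configuration is not shown in this thread. The scratch path in `job` is hypothetical, and the qsub call is commented out because it needs a working Grid Engine client.

```shell
# Write the corrected job script to a scratch location (hypothetical path).
job="${TMPDIR:-/tmp}/simple.sh"
cat > "$job" <<'EOF'
#!/bin/bash
#$ -S /bin/sh
date
sleep 4
date
echo MAY THE FORCE BE WITH YOU >> /SGE6/out.txt
EOF

# Syntax-check the script with the shell named by -S, without executing it.
/bin/sh -n "$job" && echo "syntax ok"

# Submit it (requires Grid Engine; run as a non-root user):
# qsub "$job"
```

The same shell can also be requested at submit time with qsub -S /bin/sh instead of embedding the directive in the script.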

Thanks for your help.
Regards,
umanga

reuti wrote:

Are you doing all tests as root? Root is always a special case: it
might get squashed on NFS mounts (and so can't write), and it has a
distinct home (/root) on each node.

I think the error can be checked in
$SGE_ROOT/default/spool/biggjapan01.biggjapan.co.jp/messages

Can you please clear the error and submit as a normal user from that
user's /home/...?
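The steps above could be sketched roughly as follows. The helper name execd_messages_path and the fallback /opt/sge for SGE_ROOT are hypothetical, the cell name "default" and the spool directory name are taken from the path quoted above, and the qmod call is the one already used elsewhere in this thread. The qmod step is skipped when the command is not installed.

```shell
# Hypothetical helper: build the execd messages path mentioned above.
# $1 = name of the exec host's spool directory; the cell is assumed to be "default".
execd_messages_path() {
    printf '%s/default/spool/%s/messages\n' "${SGE_ROOT:-/opt/sge}" "$1"
}

msgs=$(execd_messages_path biggjapan01.biggjapan.co.jp)

# Show the most recent entries if the file is readable from this machine.
if [ -r "$msgs" ]; then
    tail -n 20 "$msgs"
else
    echo "not readable: $msgs"
fi

# Clear the error state on all queue instances (needs manager or operator rights).
if command -v qmod >/dev/null 2>&1; then
    qmod -c '*'
fi

# Then resubmit as a normal user from that user's home directory:
# qsub /SGE6/simple.sh
```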

-- Reuti

On 05.11.2009 at 06:29, umanga wrote:



I tried to fix the issue by restarting the execution hosts and qmaster and
by running "qmod -c '*'", but nothing solved the problem.
Then I reinstalled a sample cluster, and still
#qstat -f -explain E gives:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@biggjapan01.biggjapan.co BIP   0/0/2          0.00     lx24-amd64    E
        queue all.q marked QERROR as result of job 1's failure at host biggjapan01.biggjapan.com
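On a cluster with many queue instances, output like this can be filtered down to the instances in error. The sketch below assumes the six-column layout shown above, where the states column is simply absent for a healthy instance. The function name error_queues is hypothetical, and the second sample row is an invented healthy instance added only to show the filter.

```shell
# Print the names of queue instances whose states column contains "E".
# Reads `qstat -f` style output on stdin.
error_queues() {
    awk 'NF == 6 && $1 ~ /@/ && $6 ~ /E/ { print $1 }'
}

# Sample input: the first data row is from this thread, the second is hypothetical.
sample='queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@biggjapan01.biggjapan.co BIP   0/0/2          0.00     lx24-amd64    E
all.q@healthyhost              BIP   0/0/2          0.16     lx24-amd64    '

printf '%s\n' "$sample" | error_queues
# prints: all.q@biggjapan01.biggjapan.co
```

Against a live cluster the function would be fed from the command itself, as in qstat -f | error_queues.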

The script I ran was:

#!/bin/bash
date
sleep 4
date
echo MAY THE FORCE BE WITH YOU >> /SGE6/out.txt

Here /SGE6 is an NFS folder.

Any tips?

Thanks in advance

sgexav wrote:


All your queues are in the Error state, so at least you now know
why your job is not starting. You can try qmod -c queue_name. You had
better check man qmod first, because in the latest version -c has been
replaced by -cq (queues) and -cj (jobs), I think.

X.

umanga wrote:


Hi, all queue instances show the state 'E'. qstat -F shows:
umanga:~# qstat -F
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@biggjapan01.biggjapan.co BIP   0/0/2          0.01     lx24-amd64    E
        hl:arch=lx24-amd64
        hl:num_proc=2
        hl:mem_total=3.874G
        hl:swap_total=11.352G
        hl:virtual_total=15.226G
        hl:load_avg=0.010000
        hl:load_short=0.010000
        hl:load_medium=0.010000
        hl:load_long=0.000000
        hl:mem_free=3.200G
        hl:swap_free=11.325G
        hl:virtual_free=14.525G
        hl:mem_used=690.691M
        hl:swap_used=27.461M
        hl:virtual_used=718.152M
        hl:cpu=0.400000
        hl:np_load_avg=0.005000
        hl:np_load_short=0.005000
        hl:np_load_medium=0.005000
        hl:np_load_long=0.000000
        qf:qname=all.q
        qf:hostname=biggjapan01.biggjapan.com
        qc:slots=2
        qf:tmpdir=/tmp
        qf:seq_no=0
        qf:rerun=0.000000
        qf:calendar=NONE
        qf:s_rt=infinity
        qf:h_rt=infinity
        qf:s_cpu=infinity
        qf:h_cpu=infinity
        qf:s_fsize=infinity
        qf:h_fsize=infinity
        qf:s_data=infinity
        qf:h_data=infinity
        qf:s_stack=infinity
        qf:h_stack=infinity
        qf:s_core=infinity
        qf:h_core=infinity
        qf:s_rss=infinity
        qf:h_rss=infinity
        qf:s_vmem=infinity
        qf:h_vmem=infinity
        qf:min_cpu_interval=00:05:00
---------------------------------------------------------------------------------
all.q@umanga                   BIP   0/0/2          0.16     lx24-amd64    E
        hl:arch=lx24-amd64
        hl:num_proc=2
        hl:mem_total=3.874G
        hl:swap_total=11.353G
        hl:virtual_total=15.227G
        hl:load_avg=0.160000
        hl:load_short=0.090000
        hl:load_medium=0.160000
        hl:load_long=0.130000
        hl:mem_free=2.017G
        hl:swap_free=11.352G
        hl:virtual_free=13.369G
        hl:mem_used=1.857G
        hl:swap_used=868.000K
        hl:virtual_used=1.858G
        hl:cpu=6.000000
        hl:np_load_avg=0.080000
        hl:np_load_short=0.045000
        hl:np_load_medium=0.080000
        hl:np_load_long=0.065000
        qf:qname=all.q
        qf:hostname=umanga
        qc:slots=2
        qf:tmpdir=/tmp
        qf:seq_no=0
        qf:rerun=0.000000
        qf:calendar=NONE
        qf:s_rt=infinity
        qf:h_rt=infinity
        qf:s_cpu=infinity
        qf:h_cpu=infinity
        qf:s_fsize=infinity
        qf:h_fsize=infinity
        qf:s_data=infinity
        qf:h_data=infinity
        qf:s_stack=infinity
        qf:h_stack=infinity
        qf:s_core=infinity
        qf:h_core=infinity
        qf:s_rss=infinity
        qf:h_rss=infinity
        qf:s_vmem=infinity
        qf:h_vmem=infinity
        qf:min_cpu_interval=00:05:00

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     10 0.55500 simple.sh  root         qw    11/04/2009 19:25:32     1

sgexav wrote:


What is the state of your queue? au? If yes, you have to (re)start
the sgeexecd daemon on your nodes. What does qstat -f say?
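As a rough aid for reading the states column, the advice given in this thread can be written as a small lookup. The function name advise_state is hypothetical and the mapping covers only the two cases discussed here ("u" for an execd that is not reporting, "E" for a queue in error); every other flag is left to man qstat.

```shell
# Map a qstat states string to the next step suggested in this thread.
advise_state() {
    case "$1" in
        *E*) echo "error state: read the execd messages file, then clear with qmod -c" ;;
        *u*) echo "execd not reporting: (re)start sgeexecd on that node" ;;
        "")  echo "no state flags: queue instance looks healthy" ;;
        *)   echo "other state: see man qstat" ;;
    esac
}

advise_state au
advise_state E
```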
umanga wrote:


Greetings all,

When I submit jobs with "#qsub /SGE6/simple.sh" (or using Java DRMAA),
the jobs stay in the "qw" status and never get executed. All the
submitted jobs are queued in the job queue and never get dispatched.
What could be the issue?

Thanks in advance,
umanga
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224970

To unsubscribe from this discussion, e-mail: [users-unsubscribe@gridengine.sunsource.net].






