[GE users] Job status doesn't change from "qw"?

reuti reuti at staff.uni-marburg.de
Thu Nov 5 09:43:40 GMT 2009



Are you doing all tests as root? Root is always a special case: it may get
squashed on NFS mounts (and therefore cannot write there), and it also has
a distinct home directory (/root) on each node.
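For instance, a quick write test on the NFS share shows whether root is
squashed (a sketch; /SGE6 is the NFS path from the report below):

# Run as root on an execution node. With root_squash on the export,
# root is mapped to nobody and the write fails with "Permission denied".
touch /SGE6/root_write_test \
    && echo "root can write" \
    || echo "root is squashed (or the mount is read-only)"
rm -f /SGE6/root_write_test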

The error can be checked in
$SGE_ROOT/default/spool/biggjapan01.biggjapan.co.jp/messages, I think.
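For example (a sketch, assuming the default cell name "default" and a
local spool directory):

# Show the latest execd log entries for the failing host:
tail -n 50 $SGE_ROOT/default/spool/biggjapan01.biggjapan.co.jp/messages

# Or search the log for the failed job's id:
grep "job 1" $SGE_ROOT/default/spool/biggjapan01.biggjapan.co.jp/messages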

Can you please clear the error state and submit the job again as a normal
user from its /home/...?
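For instance (a sketch; "someuser" is a placeholder account, not a name
from this thread):

# Clear the error state on every queue instance (older -c syntax;
# newer qmod versions split this up, see the note further down the thread):
qmod -c '*'

# Resubmit the test script as an unprivileged user from its home directory:
su - someuser -c 'cd $HOME && qsub /SGE6/simple.sh'

# Watch whether the job leaves the "qw" state:
qstat -f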

-- Reuti

On 05.11.2009, at 06:29, umanga wrote:

> I tried to fix the issue by restarting the execution hosts and the
> qmaster, and by using "qmod -c '*'", but nothing solved the problem.
> Then I reinstalled a sample cluster, and "# qstat -f -explain E" still
> gives:
>
> queuename                      qtype resv/used/tot. load_avg arch          states
> ----------------------------------------------------------------------------------
> all.q@biggjapan01.biggjapan.co BIP   0/0/2          0.00     lx24-amd64    E
>         queue all.q marked QERROR as result of job 1's failure at
>         host biggjapan01.biggjapan.com
>
> The script I ran was:
>
> #!/bin/bash
> date
> sleep 4
> date
> echo MAY THE FORCE BE WITH YOU >> /SGE6/out.txt
>
> Here, /SGE6 is an NFS folder.
>
> Any tips?
>
> Thanks in advance
>
> sgexav wrote:
>>
>> While all your queues are in the Error state, at least you now know
>> why your job is not starting. You can try "qmod -c queue_name", but
>> you had better check "man qmod" first, because -c has been replaced
>> in the latest version with -cj?? X.
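As a rough illustration of the two syntaxes (exact options depend on the
SGE release; check "man qmod" on your system):

# SGE 5.x-style: clear the error state of a queue:
qmod -c all.q

# SGE 6.x splits this by object type (queue vs. job):
qmod -cq all.q   # clear the E state on the queue instance
qmod -cj 10      # clear the error state of job 10 (the pending job below)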
>> umanga wrote:
>>>
>>> Hi, all queue instances show the state 'E'. qstat -F shows:
>>>
>>> umanga:~# qstat -F
>>> queuename                      qtype resv/used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------------
>>> all.q@biggjapan01.biggjapan.co BIP   0/0/2          0.01     lx24-amd64    E
>>>     hl:arch=lx24-amd64
>>>     hl:num_proc=2
>>>     hl:mem_total=3.874G
>>>     hl:swap_total=11.352G
>>>     hl:virtual_total=15.226G
>>>     hl:load_avg=0.010000
>>>     hl:load_short=0.010000
>>>     hl:load_medium=0.010000
>>>     hl:load_long=0.000000
>>>     hl:mem_free=3.200G
>>>     hl:swap_free=11.325G
>>>     hl:virtual_free=14.525G
>>>     hl:mem_used=690.691M
>>>     hl:swap_used=27.461M
>>>     hl:virtual_used=718.152M
>>>     hl:cpu=0.400000
>>>     hl:np_load_avg=0.005000
>>>     hl:np_load_short=0.005000
>>>     hl:np_load_medium=0.005000
>>>     hl:np_load_long=0.000000
>>>     qf:qname=all.q
>>>     qf:hostname=biggjapan01.biggjapan.com
>>>     qc:slots=2
>>>     qf:tmpdir=/tmp
>>>     qf:seq_no=0
>>>     qf:rerun=0.000000
>>>     qf:calendar=NONE
>>>     qf:s_rt=infinity
>>>     qf:h_rt=infinity
>>>     qf:s_cpu=infinity
>>>     qf:h_cpu=infinity
>>>     qf:s_fsize=infinity
>>>     qf:h_fsize=infinity
>>>     qf:s_data=infinity
>>>     qf:h_data=infinity
>>>     qf:s_stack=infinity
>>>     qf:h_stack=infinity
>>>     qf:s_core=infinity
>>>     qf:h_core=infinity
>>>     qf:s_rss=infinity
>>>     qf:h_rss=infinity
>>>     qf:s_vmem=infinity
>>>     qf:h_vmem=infinity
>>>     qf:min_cpu_interval=00:05:00
>>> ----------------------------------------------------------------------------------
>>> all.q@umanga                   BIP   0/0/2          0.16     lx24-amd64    E
>>>     hl:arch=lx24-amd64
>>>     hl:num_proc=2
>>>     hl:mem_total=3.874G
>>>     hl:swap_total=11.353G
>>>     hl:virtual_total=15.227G
>>>     hl:load_avg=0.160000
>>>     hl:load_short=0.090000
>>>     hl:load_medium=0.160000
>>>     hl:load_long=0.130000
>>>     hl:mem_free=2.017G
>>>     hl:swap_free=11.352G
>>>     hl:virtual_free=13.369G
>>>     hl:mem_used=1.857G
>>>     hl:swap_used=868.000K
>>>     hl:virtual_used=1.858G
>>>     hl:cpu=6.000000
>>>     hl:np_load_avg=0.080000
>>>     hl:np_load_short=0.045000
>>>     hl:np_load_medium=0.080000
>>>     hl:np_load_long=0.065000
>>>     qf:qname=all.q
>>>     qf:hostname=umanga
>>>     qc:slots=2
>>>     qf:tmpdir=/tmp
>>>     qf:seq_no=0
>>>     qf:rerun=0.000000
>>>     qf:calendar=NONE
>>>     qf:s_rt=infinity
>>>     qf:h_rt=infinity
>>>     qf:s_cpu=infinity
>>>     qf:h_cpu=infinity
>>>     qf:s_fsize=infinity
>>>     qf:h_fsize=infinity
>>>     qf:s_data=infinity
>>>     qf:h_data=infinity
>>>     qf:s_stack=infinity
>>>     qf:h_stack=infinity
>>>     qf:s_core=infinity
>>>     qf:h_core=infinity
>>>     qf:s_rss=infinity
>>>     qf:h_rss=infinity
>>>     qf:s_vmem=infinity
>>>     qf:h_vmem=infinity
>>>     qf:min_cpu_interval=00:05:00
>>> ############################################################################
>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>> ############################################################################
>>>      10 0.55500 simple.sh  root   qw    11/04/2009 19:25:32     1
>>> sgexav wrote:
>>>>
>>>> What is the state of your queue? "au"? If so, you have to (re)start
>>>> the sgeexecd daemon on your nodes... What does qstat -f say?
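A minimal sketch of that restart, assuming the standard SGE 6.x init
script name and path (both can differ between installations):

# On the execution node: restart the execution daemon.
/etc/init.d/sgeexecd stop
/etc/init.d/sgeexecd start

# The host should then report a load value again (instead of "-") in:
qhost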
>>>> umanga wrote:
>>>>>
>>>>> Greetings all,
>>>>> When I submit jobs with "# qsub /SGE6/simple.sh" (or using Java
>>>>> DRMAA), the jobs stay in the "qw" state and never get executed.
>>>>> All the submitted jobs sit queued in the job queue and are never
>>>>> dispatched. What could be the issue?
>>>>> Thanks in advance,
>>>>> umanga
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225156

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


