[GE users] qsh

pvdmeer pvdmeer at gmail.com
Fri Jun 26 08:52:54 BST 2009



Dear SGE users,

Things seemed fine yesterday, but SGE now fails to schedule qsh after a few attempts. The first few attempts work, then it fails and does not seem to recover. The logs show nothing this time, and qstat -f shows no queues in an error state.

Qacct came up with this:

celaeno:~# qacct -j 2730
==============================================================
qname        all_8max.q
hostname     celaeno.sron.nl
group        eos
owner        pieterm
project      NONE
department   defaultdepartment
jobname      INTERACTIVE
jobnumber    2730
taskid       undefined
account      sge
priority     0
qsub_time    Fri Jun 26 09:38:46 2009
start_time   Fri Jun 26 09:38:46 2009
end_time     Fri Jun 26 09:38:46 2009
granted_pe   NONE
slots        1
failed       0
exit_status  1
ru_wallclock 0
ru_utime     0.000
ru_stime     0.004
ru_maxrss    0
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    810
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   8
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     24
ru_nivcsw    0
cpu          0.004
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000
arid         undefined
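
To check several jobs for the same pattern, the fields above can be pulled out of the qacct text with a short script. This is only a minimal sketch, assuming the whitespace-separated "field value" layout shown above; parse_qacct and looks_like_instant_failure are hypothetical helper names, not part of SGE, and the sample is cut down to a few fields from job 2730.

```python
def parse_qacct(text):
    """Parse one `qacct -j` record into a dict of field name -> string value."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "=====" separator line.
        if not line or set(line) == {"="}:
            continue
        parts = line.split(None, 1)
        record[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return record


def looks_like_instant_failure(record):
    """True when `failed` is 0 but the job exited non-zero with zero wallclock.

    This is the pattern seen for job 2730; the rule itself is an assumption
    made for illustration, not an official SGE diagnostic.
    """
    failed_code = record.get("failed", "").split()[0:1]
    return (
        failed_code == ["0"]
        and record.get("exit_status", "0") != "0"
        and record.get("ru_wallclock") == "0"
    )


# A few fields copied from the qacct output for job 2730.
sample = """\
==============================================================
qname        all_8max.q
jobname      INTERACTIVE
jobnumber    2730
failed       0
exit_status  1
ru_wallclock 0
"""

job = parse_qacct(sample)
print(job["jobnumber"], looks_like_instant_failure(job))
```

In practice the text would come from saving the output of qacct -j for each job number; the check only flags jobs that match the pattern and says nothing about why qsh exited.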

Qstat output:

pleione [~]% qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all_8max.q@celaeno.sron.nl     BIP   0/0/8          0.00     lx26-amd64
---------------------------------------------------------------------------------
all_8max.q@merope.sron.nl      BIP   0/3/8          3.01     lx26-amd64
---------------------------------------------------------------------------------
all_8max.q@pleione.sron.nl     BIP   0/0/8          4.00     lx26-amd64
---------------------------------------------------------------------------------
all_8max.q@taygeta.sron.nl     BIP   0/1/8          1.94     lx26-amd64

There seem to be plenty of slots left on the machines.
What could cause qsh to fail after a couple of attempts? Any help would be appreciated.

With kind regards,

Pieter van der Meer




On Thu, Jun 25, 2009 at 6:59 PM, Pieter van der Meer <pvdmeer at gmail.com> wrote:
Hey,

I seem to be making progress. I just installed xterm on all machines, which "qsh" requires. Checking the logs helped.
I can run qsh now; it opens an xterm window running csh, so everything's fine.

With kind regards,

Pieter




