[GE users] Fwd: subnode with empty slots but jobs in queue

jlforrest jlforrest at berkeley.edu
Mon Dec 6 18:14:38 GMT 2010


On 12/6/2010 10:04 AM, reuti wrote:

>> Right now compute-0-8 is down, although qstat still shows
>> some jobs for it. (Why would this happen?)
>
> SGE assumes a network problem. You will have to use `qdel -f ...` to get rid of these jobs.

I've now done that.
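For a node that is down, the job IDs to force-delete can be extracted from plain `qstat` output. The following is a minimal sketch, assuming qstat's default column layout: the job-ID is in the first column and each running job's queue instance is printed as `queue@host`. The function names and the sample listing are hypothetical. The script only builds the `qdel -f` commands and does not run them.

```python
from __future__ import annotations

# Hypothetical qstat listing used for illustration only.
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    137 0.55500 sleep      jon          r     12/06/2010 09:58:12 all.q@compute-0-8.local            1
    138 0.55500 sleep      jon          r     12/06/2010 09:58:12 all.q@compute-0-7.local            1
    139 0.55500 sleep      jon          qw    12/06/2010 09:59:01                                    1
"""


def jobs_on_host(qstat_text: str, host: str) -> list[str]:
    """Return job IDs (in order, without duplicates) whose queue instance is on `host`.

    `host` may be a short name (compute-0-8) or the full name (compute-0-8.local).
    """
    ids: list[str] = []
    for line in qstat_text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header, and the dashed separator.
        if not tokens or not tokens[0].isdigit():
            continue
        # Pending jobs have no queue column, so look for a queue@host token.
        queue = next((t for t in tokens if "@" in t), None)
        if queue is None:
            continue
        qhost = queue.split("@", 1)[1]
        if host in (qhost, qhost.split(".")[0]) and tokens[0] not in ids:
            ids.append(tokens[0])
    return ids


def forced_qdel_commands(job_ids: list[str]) -> list[list[str]]:
    """Build argv lists for `qdel -f <job-ID>`; each could be passed to subprocess.run."""
    return [["qdel", "-f", job_id] for job_id in job_ids]


if __name__ == "__main__":
    for argv in forced_qdel_commands(jobs_on_host(SAMPLE, "compute-0-8")):
        print(" ".join(argv))  # prints: qdel -f 137
```

Building argv lists instead of shell strings means the commands can be reviewed first and then executed without going through a shell.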

>> The qstat output for compute-0-7 shows
>>
>> all.q@compute-0-7.local        BIP   0/48/48        29.05    lx26-amd64
>
> So, all 48 out of 48 seem to be used up.
>
>> and then it shows 48 serial jobs underneath! Yet, ssh-ing to
>> compute-0-7 and running ps clearly only shows 29 jobs running
>
> What is `qstat -g t -l h=compute-0-7.local -s r` showing?

It shows nothing. But it also shows nothing for the
nodes that are working correctly. For example, consider
compute-0-0, whose status is shown as
compute-0-0    lx26-amd64    4  4.97    7.8G  831.5M   11.7G   75.7M

Running 'qstat -g t -l h=compute-0-0 -s' results in
no output. Is this correct?
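The mismatch on compute-0-7 can be checked by comparing the used-slot count with the load average. There, qstat reports 0/48/48 slots while ps shows only 29 jobs. The following is a rough sketch. It assumes the `qstat -f` column order `queuename qtype resv/used/tot. load_avg arch` and the default qhost columns `HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS`. The helper names are hypothetical. The gap it computes is only a heuristic, since a load average is not an exact count of running jobs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueueInstance:
    name: str            # queue@host, e.g. all.q@compute-0-7.local
    qtype: str           # e.g. BIP (batch, interactive, parallel)
    reserved: int
    used: int
    total: int
    load: float | None   # None when qstat prints -NA- (host not reporting)


def parse_queue_line(line: str) -> QueueInstance:
    """Parse a `qstat -f` queue line: queuename qtype resv/used/tot. load_avg arch [states]."""
    tokens = line.split()
    counts = [int(n) for n in tokens[2].split("/")]
    if len(counts) == 2:          # assumed older layout: used/tot. only
        counts.insert(0, 0)
    reserved, used, total = counts
    try:
        load = float(tokens[3])
    except ValueError:            # e.g. -NA-
        load = None
    return QueueInstance(tokens[0], tokens[1], reserved, used, total, load)


def parse_qhost_line(line: str) -> dict:
    """Parse a qhost host line: HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS.

    Memory and swap columns are kept as the raw strings qhost prints.
    """
    host, arch, ncpu, load, *mem = line.split()
    return {
        "host": host,
        "arch": arch,
        "ncpu": None if ncpu == "-" else int(ncpu),
        "load": None if load == "-" else float(load),
        "mem_swap": mem,
    }


def slot_load_gap(q: QueueInstance) -> float | None:
    """Used slots minus load average. A large positive gap only suggests that
    qstat is counting jobs that are no longer running; it is not proof."""
    return None if q.load is None else q.used - q.load


if __name__ == "__main__":
    q = parse_queue_line("all.q@compute-0-7.local        BIP   0/48/48        29.05    lx26-amd64")
    print(q.used, q.total, round(slot_load_gap(q), 2))  # prints: 48 48 18.95
    h = parse_qhost_line("compute-0-0    lx26-amd64    4  4.97    7.8G  831.5M   11.7G   75.7M")
    print(h["ncpu"], h["load"])                          # prints: 4 4.97
```

On compute-0-7, the gap of about 19 slots is consistent with the report that ps shows far fewer jobs than qstat. On compute-0-0, a load of 4.97 on 4 CPUs is roughly what a fully used node would show.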

Cordially,

-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302522