[GE users] qsub acting up and using 99% CPU.

Andy Schwierskott andy.schwierskott at sun.com
Mon Apr 11 16:36:38 BST 2005


Hi,

I see two possible explanations for the qsub behavior. I think it could be a
commlib problem that causes the high CPU usage; this could be debugged by
running the qsub command with an SGE debug level set (e.g. set the variable
"SGE_DEBUG_LEVEL" to "3 0 0 0 0 0 0 0" and/or set SGE_COMMLIB_DEBUG to "3"
or "4").
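A minimal sketch of those debug settings, written as a helper for the
`submit` wrapper, because the hang only shows up when qsub is launched from
the application. The function name `enable_sge_debug` and the log path are
hypothetical; the variable names and values are the ones given above. It
assumes the debug text is written to stderr, so stderr goes to a separate
log and the job-ID output captured in `$qsubout` is left alone.

```shell
#!/bin/sh
# enable_sge_debug: export the SGE debug variables suggested above.
# Hypothetical helper name; the optional argument is the commlib debug
# level ("3" or "4"), defaulting to 3.
enable_sge_debug() {
    SGE_DEBUG_LEVEL="3 0 0 0 0 0 0 0"
    SGE_COMMLIB_DEBUG="${1:-3}"
    export SGE_DEBUG_LEVEL SGE_COMMLIB_DEBUG
}

# Usage inside the submit wrapper, just before the existing qsub line
# (assumes debug output goes to stderr; the log path is only an example):
#
#   enable_sge_debug 4
#   $QSUB -S /bin/sh $* $script > $qsubout 2> /tmp/qsub_debug.$$
```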

1. The problem could be related to an unusual setting of the open file
descriptors when qsub is issued.

2. There may be SGE variables set or unset in the qsub environment, when
qsub is called from the application, that differ from the command-line
environment. This could be verified with some debug output in the wrapper.
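Both checks can be made with a little debug output in the `submit` wrapper.
The sketch below is Linux-specific (it reads `/proc/<pid>/fd`), and the
function name `dump_submit_debug` and the output paths are hypothetical. It
records which file descriptors the wrapper's shell has open, whether stdin
is a terminal, and the `SGE_*` variables, sorted so that the same dump taken
from an interactive shell can be compared with `diff`.

```shell
#!/bin/sh
# dump_submit_debug: write the open file descriptors and the SGE_*
# variables of the calling shell to the file named in $1.
# Hypothetical helper; Linux-specific because it relies on /proc/<pid>/fd.
dump_submit_debug() {
    {
        echo "== open file descriptors of PID $$ =="
        if [ -d "/proc/$$/fd" ]; then
            # Each entry is a symlink "N -> target"; N is the fd number.
            ls -l "/proc/$$/fd"
        else
            echo "(no /proc/$$/fd on this system)"
        fi
        # -t 0 is true only when fd 0 is an open terminal.
        if [ -t 0 ]; then
            echo "stdin: terminal"
        else
            echo "stdin: not a terminal"
        fi
        echo "== SGE_* environment (sorted) =="
        env | grep '^SGE_' | sort
    } > "$1"
}
```

In the wrapper, call it just before the `$QSUB` line, e.g.
`dump_submit_debug /tmp/submit_debug.wrapper`; run the same function from an
interactive shell into `/tmp/submit_debug.cmdline` and `diff` the two files.
The fd listing also shows descriptors the shell opens for the redirection
itself, so the difference between the two dumps is what matters.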

Andy

> Ok, so it wasn't a standard issue then. dang..
>
>
> This is what I've done:
>
> 1. Added an entry for my cluster in /opt/Schrodinger/schrodinger.hosts:
> name: test28
> tmpdir: /usr/tmp
> host: master
> queue: SGE
> processors: 28
>
> 2. Configured the files in /opt/Schrodinger/queues/SGE/:
> QPATH=/opt/az/hpc/SunONEGridEngine/current/bin/lx24-x86/
> QSUB=qsub
> QDEL=qdel
> QSTAT=qstat
>
> etc..
>
> 3. Started the job with:
> # $SCHRODINGER/para_glide -i molsdock.inp -n 2 -HOST test28
> Which in turn calls a wrapper script called submit which ...:
>
> # ...
> # Submit job
> $QSUB -S /bin/sh $* $script > $qsubout
> # ...
>
> And this is the result:
>
> # top (on the master):
> PID   USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 31590 me   25   0  1496 1496  1264  R    96.9  0.0   0:29   0 qsub
>
> # ps -ef | grep qsub
> me    2552  2544  0 12:47 pts/7    00:00:00 qsub -S /bin/sh
> /home/me/.schrodinger/.jobdb/master-0-425a55d7.batch
>
> (With an strace as described below.)
>
> If I kill off this job and resubmit it manually with:
>
> # qsub -S /bin/sh /home/me/.schrodinger/.jobdb/master-0-425a55d7.batch
> It starts ok on a node!
>
> I guess this is a problem with the inner workings of this application and
> not a SGE issue.
>
>
> /Stefan
>
>> -----Original Message-----
>> From: Stephan Grell - Sun Germany - SSG - Software Engineer
>> [mailto:stephan.grell at sun.com]
>> Sent: 11 April 2005 11:06
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qsub acting up and using 99% CPU.
>>
>>
>> Hi,
>>
>> more information would be nice. Which version of SGE? Which OS? What was
>> the qsub command?
>>
>> Stephan
>>
>> Stefan.O.Nordlander at astrazeneca.com wrote:
>>
>>> Hi,
>>>
>>> I'm hoping this is a no brainer like "erum, well you forgot to turn on the
>>> main auxiliary power relay.."
>>>
>>> I'm trying to run para_glide through SGE and I think I have everything set
>>> up ok. But when I submit a job, qsub starts and uses 99.9% CPU. An strace
>>> shows me this:
>>>
>>> select(1, [0], [0], NULL, {1, 0})       = 1 (out [0], left {1, 0})
>>> gettimeofday({1113205289, 87082}, NULL) = 0
>>> gettimeofday({1113205289, 87121}, NULL) = 0
>>> gettimeofday({1113205289, 87159}, NULL) = 0
>>> gettimeofday({1113205289, 87197}, NULL) = 0
>>> select(1, [0], [0], NULL, {1, 0})       = 1 (out [0], left {1, 0})
>>> gettimeofday({1113205289, 87547}, NULL) = 0
>>> gettimeofday({1113205289, 87585}, NULL) = 0
>>> gettimeofday({1113205289, 87624}, NULL) = 0
>>> gettimeofday({1113205289, 87662}, NULL) = 0
>>> select(1, [0], [0], NULL, {1, 0})       = 1 (out [0], left {1, 0})
>>> gettimeofday({1113205289, 88012}, NULL) = 0
>>> gettimeofday({1113205289, 88051}, NULL) = 0
>>> gettimeofday({1113205289, 88089}, NULL) = 0
>>> gettimeofday({1113205289, 88127}, NULL) = 0
>>> ...
>>> ...
>>> ...
>>>
>>> What's going on?
>>> (More details about the job are available if necessary.)
>>>
>>>
>>> Stefan Nordlander - Linux System Manager
>>> ________________________________________________
>>> AstraZeneca R&D Mölndal
>>> Pepparedsleden 1
>>> S-431 83 Mölndal, Sweden
>>> Phone:    +46 (0)31 706 49 14
>>> Email:    Stefan.O.Nordlander at astrazeneca.com
>>> ________________________________________________
>>> Unix _is_ user friendly, it's just picky about who its friends are...

