[GE users] qsub acting up and using 99% CPU. (SOLVED!)

Stefan.O.Nordlander at astrazeneca.com Stefan.O.Nordlander at astrazeneca.com
Wed May 4 10:59:02 BST 2005



Just thought I'd mention that we worked around this problem and are now
running Schrödinger para_glide 3.5 fine with SGE 6.0.x. The fix was to not
run para_glide from the SGE qmaster. Don't ask me why, but this
configuration works fine now:

# cat /opt/Schrodinger/schrodinger.hosts

name:        localhost
schrodinger: /software/Schrodinger/
tmpdir:      /tmp

name: test28
tmpdir: /usr/tmp
host: node1          # <----- This works. "master" does not. See problem below.
queue: SGE
processors: 28
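
A quick way to check an existing schrodinger.hosts for this pitfall is sketched below. It is only an illustration: the function name check_hosts_file is made up, it assumes the simple "key: value" layout shown above, and it expects you to pass in the qmaster's host name yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or Schrodinger): warn about SGE
# entries in a schrodinger.hosts-style file whose "host:" field names
# the qmaster.
# Usage: check_hosts_file FILE QMASTER_HOSTNAME
check_hosts_file() {
    awk -v master="$2" '
        # Print a warning if the entry just finished points at the qmaster.
        function flush_entry() {
            if (name != "" && queue == "SGE" && host == master)
                printf "WARNING: entry %s submits via qmaster %s\n", name, host
        }
        { sub(/#.*/, "") }                     # strip trailing comments
        /^[ \t]*name:/  { flush_entry(); name = $2; host = ""; queue = "" }
        /^[ \t]*host:/  { host = $2 }
        /^[ \t]*queue:/ { queue = $2 }
        END { flush_entry() }
    ' "$1"
}
```

For the broken configuration quoted further down (host: master), calling check_hosts_file /opt/Schrodinger/schrodinger.hosts master would print a warning for the test28 entry; with host: node1 it prints nothing. On a standard SGE install the qmaster name is recorded in $SGE_ROOT/$SGE_CELL/common/act_qmaster, but it may be fully qualified there while the hosts file uses a short name, so compare with care.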

Schrödinger also recommended that we run glide with -HOST given as
host:no_nodes, like this:
# $SCHRODINGER/para_glide -i molsdock.inp -n 28 -HOST test28:28
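
Since the original symptom was a qsub process pinned near 99% CPU on the submit host, a small check like the following could be used to confirm whether a given configuration still triggers it. This is a rough sketch: find_spinning_qsub is a made-up name, and it assumes input in the three-column form produced by "ps -eo pid,pcpu,comm".

```shell
#!/bin/sh
# Hypothetical helper: read "PID %CPU COMMAND" lines on stdin and print the
# PIDs of qsub processes at or above a CPU threshold (default 90).
# The header line is skipped automatically because its third column is
# "COMMAND", not "qsub".
find_spinning_qsub() {
    awk -v limit="${1:-90}" '$3 == "qsub" && $2 + 0 >= limit + 0 { print $1 }'
}
```

Typical use would be: ps -eo pid,pcpu,comm | find_spinning_qsub 90. Note that the pcpu value reported by ps is usually averaged over the life of the process, so a qsub that only just started spinning may take a little while to cross the threshold; top gives a more immediate picture.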


Regards,
/Stefan



> -----Original Message-----
> From: Nordlander, Stefan O 
> Sent: den 11 april 2005 13:28
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] qsub acting up and using 99% CPU.
> 
> 
> Ok, so it wasn't a standard issue then. dang..
> 
> 
> This is what I've done:
> 
> 1. Added an entry for my cluster in 
> /opt/Schrodinger/schrodinger.hosts:
> name: test28
> tmpdir: /usr/tmp
> host: master
> queue: SGE
> processors: 28
> 
> 2. Configured the files in /opt/Schrodinger/queues/SGE/:
> QPATH=/opt/az/hpc/SunONEGridEngine/current/bin/lx24-x86/
> QSUB=qsub
> QDEL=qdel
> QSTAT=qstat
> 
> etc..
> 
> 3. Started the job with:
> # $SCHRODINGER/para_glide -i molsdock.inp -n 2 -HOST test28
> Which in turn calls a wrapper script called submit which ...:
> 
> # ...
> # Submit job
> $QSUB -S /bin/sh $* $script > $qsubout
> # ...
> 
> And this is the result:
> 
> # top (on the master):
> PID   USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 31590 me   25   0  1496 1496  1264  R    96.9  0.0   0:29   0 qsub
> 
> # ps -ef | grep qsub
> me    2552  2544  0 12:47 pts/7    00:00:00 qsub -S /bin/sh /home/me/.schrodinger/.jobdb/master-0-425a55d7.batch
> 
> (With an strace as described below.)
> 
> If I kill off this job and resubmit it manually with:
> 
> # qsub -S /bin/sh /home/me/.schrodinger/.jobdb/master-0-425a55d7.batch
> It starts ok on a node!
> 
> I guess this is a problem with the inner workings of this application and
> not an SGE issue.
> 
> 
> /Stefan
> 
> > -----Original Message-----
> > From: Stephan Grell - Sun Germany - SSG - Software Engineer
> > [mailto:stephan.grell at sun.com]
> > Sent: den 11 april 2005 11:06
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] qsub acting up and using 99% CPU.
> > 
> > 
> > Hi,
> > 
> > More information would be nice. Which version of SGE? Which OS? What was
> > the qsub command?
> > 
> > Stephan
> > 
> > Stefan.O.Nordlander at astrazeneca.com wrote:
> > 
> > >Hi,
> > >
> > >I'm hoping this is a no-brainer like "erum, well you forgot to turn on the
> > >main auxiliary power relay.."
> > >
> > >I'm trying to run para_glide through SGE and I think I have everything set
> > >up ok. But when I submit a job, qsub starts and uses 99.9% CPU. An strace
> > >shows me this:
> > >
> > >select(1, [0], [0], NULL, {1, 0})       = 1 (out [0], left {1, 0})
> > >gettimeofday({1113205289, 87082}, NULL) = 0
> > >gettimeofday({1113205289, 87121}, NULL) = 0
> > >gettimeofday({1113205289, 87159}, NULL) = 0
> > >gettimeofday({1113205289, 87197}, NULL) = 0
> > >select(1, [0], [0], NULL, {1, 0})       = 1 (out [0], left {1, 0})
> > >gettimeofday({1113205289, 87547}, NULL) = 0
> > >gettimeofday({1113205289, 87585}, NULL) = 0
> > >gettimeofday({1113205289, 87624}, NULL) = 0
> > >gettimeofday({1113205289, 87662}, NULL) = 0
> > >select(1, [0], [0], NULL, {1, 0})       = 1 (out [0], left {1, 0})
> > >gettimeofday({1113205289, 88012}, NULL) = 0
> > >gettimeofday({1113205289, 88051}, NULL) = 0
> > >gettimeofday({1113205289, 88089}, NULL) = 0
> > >gettimeofday({1113205289, 88127}, NULL) = 0
> > >...
> > >...
> > >...
> > >
> > >What's going on?
> > >(More details about the job are available if necessary.)
> > >
> > >
> > >Stefan Nordlander - Linux System Manager
> > >________________________________________________
> > >AstraZeneca R&D Mölndal
> > >Pepparedsleden 1
> > >S-431 83 Mölndal, Sweden
> > >Phone:    +46 (0)31 706 49 14
> > >Email:    Stefan.O.Nordlander at astrazeneca.com
> > >________________________________________________
> > >Unix _is_ user friendly, it's just picky about who its friends are...
> > >
> > >  
> > >
> > >------------------------------------------------------------------------
> > >
> > 
> >---------------------------------------------------------------------
> > >To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > >For additional commands, e-mail: users-help at gridengine.sunsource.net
> > >  
> > >
> > 
> > 
> > 
> > 
> 
> 


