[GE users] We get "GE maintrunk" when we start the daemons

Andy Schwierskott andy.schwierskott at sun.com
Fri Nov 30 14:25:16 GMT 2007


Hi Esteban,

first, congratulations on your successful compile! See my answers below.

> We have installed SGE 6.1u2 compiled from the source on a virtual machine
> with SL4 (Red Hat Enterprise 4), and we see some strange things, for
> example:
>
> - If we execute the command 'qstat -s z' as user root, we don't see the
> finished jobs; however, as the user who sent the job, 'qstat -s z' shows the
> finished jobs for this user.

This is the new 6.1 behavior. By default a user only sees his own jobs.
Add a '-u "*"' to the command line or add

    -u *

to the system-wide or local "sge_request/.sge_request" file to see all jobs
by default.
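
As a rough sketch of the second option: the helper below appends an option line to a request file only if it is not already there. The function name add_request_option is made up for this illustration and is not part of SGE; the per-user file is assumed to be $HOME/.sge_request and the cluster-wide one $SGE_ROOT/$SGE_CELL/common/sge_request.

```shell
# Append an option line to an SGE default-request file unless it is
# already present. add_request_option is a hypothetical helper name.
add_request_option() {
    file=$1
    option=$2
    # Create the file if it does not exist yet.
    touch "$file" || return 1
    # -x matches the whole line, -F treats the pattern as a fixed string
    # (so "*" is literal), and -- stops a leading "-" being read as a flag.
    if ! grep -qxF -- "$option" "$file"; then
        printf '%s\n' "$option" >> "$file"
    fi
}

# One-off alternative on the command line (needs an SGE installation):
#   qstat -s z -u "*"
# Persistent per-user default (assumed file location):
#   add_request_option "$HOME/.sge_request" '-u *'
```

Calling it twice with the same option leaves a single line in the file, so it should be safe to run from a setup script.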

> - When we execute qconf -help, we don't see the installed version; instead we
> obtain "GE maintrunk". We also obtain "GE maintrunk" when we start the daemons.
> -------------------------------------------------------------------------------------------------------------------------------------------------
> [root at sa3-ce ~]# /etc/init.d/sgemaster start
>  starting sge_qmaster
>  starting sge_schedd
> starting up GE maintrunk (lx26-x86)
> -------------------------------------------------------------------------------------------------------------------------------------------------

Did you use the source tarball (tar.gz) from the Document & files download page

    http://gridengine.sunsource.net/servlets/ProjectDocumentList

or did you check out the code yourself from CVS:

    cvs co -r V61u2_TAG
    cvs co -r V61u3_TAG   (released 2 days ago)

This is obviously not SGE 6.1u2 but code from the maintrunk (or there's a
small bug in the code which reports the wrong version number).

What does your

    CVS/Root

file contain? Or does it not exist at all?
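
A quick way to check this from the top of the source tree might look like the sketch below. The function name describe_checkout is made up for this illustration. It relies on standard CVS working-copy behavior: CVS/Root records the repository, and CVS/Tag is only present when a sticky tag or date is set, so a checkout without CVS/Tag follows the trunk.

```shell
# Report how a source directory was obtained, based on CVS metadata.
# describe_checkout is a hypothetical helper name, not an SGE or CVS tool.
describe_checkout() {
    dir=${1:-.}
    if [ ! -f "$dir/CVS/Root" ]; then
        echo "no CVS/Root: probably unpacked from a tarball"
        return 0
    fi
    echo "root: $(cat "$dir/CVS/Root")"
    if [ -f "$dir/CVS/Tag" ]; then
        # The first character is the tag type: T = branch tag,
        # N = non-branch tag, D = date.
        echo "sticky: $(cat "$dir/CVS/Tag")"
    else
        echo "sticky: none (trunk checkout)"
    fi
}
```

If it reports no sticky tag, the tree was most likely checked out without "-r", which would explain a "maintrunk" version string.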

> [root at sa3-ce ~]# qconf -help | head -n 4
> GE maintrunk
> usage: qconf [options]
>  [-aattr obj_nm attr_nm val obj_id_lst]   add to a list attribute of an object
>  [-Aattr obj_nm fname obj_id_lst]         add to a list attribute of an object
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
> - On the master node and on the worker node (wn) many copies of the same SGE
> daemons are running after we start the daemons:
> -------------------------------------------------------------------------------------------------------------------------------------------------
> [root at sa3-ce etc]# ps -ef | grep sge
> root     26512     1  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26518 26512  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26520 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26523 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26524 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26525 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26526 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26527 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26528 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26529 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26530 26518  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_qmaster
> root     26533     1  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_schedd
> root     26534 26533  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_schedd
> root     26535 26534  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_schedd
> root     26536 26534  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_schedd
> root     26537 26534  0 15:59 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_schedd
>
> -------------------------------------------------------------------------------------------------------------------------------------------------
> [root at sa3-wn001 ~]# ps -ef | grep sge
> root     28757     1  0 13:55 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_execd
> root     28758 28757  0 13:55 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_execd
> root     28759 28758  0 13:55 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_execd
> root     28760 28758  0 13:55 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_execd
> root     28761 28758  0 13:55 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_execd
> root     28762 28758  0 13:55 ?        00:00:00 /usr/local/sge/pro/bin/lx26-x86/sge_execd
> -------------------------------------------------------------------------------------------------------------------------------------------------

On older Linux systems without the NPTL threading library (and kernel
support), each thread appears as an individual process in "ps". I'm somewhat
surprised that RH4 is still using the old threading kernel/library. But it's
not a bug.
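
To see which threading implementation the C library reports, and how many "ps" entries each daemon produces, something like the following sketch may help. Both function names are made up for this illustration. "getconf GNU_LIBPTHREAD_VERSION" is a glibc feature and typically prints a string such as "NPTL 2.x" or "linuxthreads-0.10"; on other C libraries it fails, so the sketch falls back to "unknown".

```shell
# Print the threading implementation reported by glibc, or "unknown"
# when getconf does not support the variable. Hypothetical helper name.
threading_library() {
    getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || echo "unknown"
}

# Count "ps -ef" lines per SGE daemon. Reads ps output on stdin and uses
# the basename of the last field. Hypothetical helper name.
count_sge_entries() {
    awk '{
             n = split($NF, parts, "/")
             name = parts[n]
             if (name ~ /^sge_/) count[name]++
         }
         END { for (name in count) print name, count[name] }' | sort
}

# Usage on a live system:
#   threading_library
#   ps -ef | count_sge_entries
```

With the old threading library each count is the number of threads (plus a manager thread) rather than the number of separately started daemons.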

Andy

> On the other hand, the jobs sent with "qsub" are executed correctly.  Does 
> anybody know what can be happening?
>
>
> Thank you very much,
> Esteban
