[GE users] SGE6 does not backfill

Juha Jäykkä juhaj at iki.fi
Mon Apr 11 11:37:53 BST 2005



Of course, our SGE guy is on vacation... just after we upgraded!

A compiled tarball would be nice, although I already compiled it myself.
:) Just one question: how do I know whether I have flat-file or Berkeley DB
spooling at the moment? I would rather not change that.

> Question: in Rocks the $SGE_ROOT isn't shared? Otherwise you could just
> change some nodes to mount the new/alternative $SGE_ROOT. Why is Rocks
> reinstalling the nodes - just when it's in the mood to do so?!? - Reuti

Well, I'm not sure about Rocks' rationale, but I assume $SGE_ROOT is not
shared because Rocks supports clusters with heterogeneous architectures. It
takes less than 15 minutes to shut down SGE, install the new version,
propagate it to the nodes and restart SGE, so that part is not the
problem. Draining the queues is... (Rocks provides a tool called
"cluster-fork": it uses ssh to run its arguments as a command on all
nodes. Easy to do cluster-wide stuff with it. Unfortunately, it does not
parallelize...)
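
For illustration only, here is a minimal sketch of the same idea run in
parallel. This is not how cluster-fork is implemented; it merely assumes
passwordless ssh to every node. The function names (build_ssh_command,
run_on_nodes) and the node names in the usage comment are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(node, command):
    # BatchMode=yes makes ssh fail instead of waiting for a password prompt.
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_on_nodes(nodes, command, build=build_ssh_command, max_workers=8):
    """Run `command` on every node concurrently.

    Returns a list of (node, returncode, stdout) tuples in the same
    order as `nodes`.
    """
    def run_one(node):
        proc = subprocess.run(build(node, command),
                              capture_output=True, text=True)
        return node, proc.returncode, proc.stdout

    # Threads are sufficient here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, nodes))


# Usage (hypothetical node names):
#   for node, rc, out in run_on_nodes(["compute-0-0", "compute-0-1"], "uptime"):
#       print(node, rc, out.strip())
```

The `build` parameter exists only so that the ssh invocation can be swapped
out, for example to try the function without a cluster at hand.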

Rocks' philosophy regarding nodes is that they are expendable. Everything
that exists on the nodes (except jobs' runtime files) also exists in a
single directory tree on the "front end" (i.e. the submit host in SGE
language). If a node is ever shut down, it gets reinstalled automatically.
This is quite nice, actually, since it takes less than 5 minutes to install
a node. This means that for any non-hardware problem you have with a node,
you just execute "shoot-node <node-name>" and in 5 minutes you have a
working node. There are not many problems you can solve in 5 minutes... Of
course, this does not happen often: of our 12 nodes, just 2 have ever been
reinstalled. One was for testing and the other suffered a broken disk.

-- 
                 -----------------------------------------------
                | Juha Jäykkä, juolja at utu.fi                 |
                | Laboratory of Theoretical Physics             |
                | Department of Physics, University of Turku    |
                | home: http://www.utu.fi/~juolja/              |
                 -----------------------------------------------




