[GE users] SGE 6.2: jobs queued indefinitely

Lubomir Petrik Lubomir.Petrik at Sun.COM
Wed Sep 24 09:30:23 BST 2008

On 09/23/08 19:45, Bart Willems wrote:
> Hi Lubos,
> one more question. Our compute nodes still only have the lx26-amd64
> directory, not lx24-amd64. Does this mean I need to install SGE 6.2 on all
> nodes separately?
Yes. This means that you most likely use local binaries. Unfortunately, 
these were not overwritten by the upgrade procedure. I'll check and 
improve the documentation.

What you need to do:
Shut down all execds. Remove the old lx26-amd64 architecture directory 
and copy the new lx24-amd64 directory to each execd host. This most 
likely applies to both the bin and utilbin directories.
What else to copy depends on your setup (what is local and what is shared):
If SGE_CELL is shared, copy the whole SGE_ROOT except for the SGE_CELL 
directory.
If even SGE_CELL is local, then the whole SGE_ROOT is needed, because the 
new bootstrap file exists only in SGE_CELL/common/bootstrap on the 
master host.
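
The per-host steps above could be planned with something like the 
following sketch. It only builds the command strings and executes nothing. 
The host name and both paths are hypothetical, and it assumes that the 
execds are already shut down, that the script runs on the master host, 
and that ssh and scp work between the hosts.

```python
import shlex

OLD_ARCH = "lx26-amd64"            # directory left over from 6.1u4
NEW_ARCH = "lx24-amd64"            # directory shipped with 6.2
ARCH_PARENTS = ("bin", "utilbin")  # both most likely need the new arch


def plan_execd_update(host, master_root, local_root):
    """Return the shell commands for one execd host as strings.

    master_root: SGE_ROOT on the master host (already upgraded).
    local_root:  local SGE_ROOT on the execd host (assumed layout).
    Nothing is executed; shut down the execd before running the result.
    """
    cmds = []
    for parent in ARCH_PARENTS:
        old_dir = f"{local_root}/{parent}/{OLD_ARCH}"
        new_src = f"{master_root}/{parent}/{NEW_ARCH}"
        new_dst = f"{host}:{local_root}/{parent}/"
        # Remove the old architecture directory on the execd host.
        cmds.append(f"ssh {shlex.quote(host)} rm -rf {shlex.quote(old_dir)}")
        # Copy the new architecture directory from the master host.
        cmds.append(f"scp -r {shlex.quote(new_src)} {shlex.quote(new_dst)}")
    return cmds


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    for cmd in plan_execd_update("node01", "/opt/sge", "/local/sge"):
        print(cmd)
```

Review the printed commands before running any of them, since rm -rf on 
a wrong path is not recoverable.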

Once all hosts have lx24-amd64 and can access the new bootstrap file, 
you may start the execds and all commands will now work.
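
Before starting the execds, a quick check like the one below can confirm 
that a host really has the new architecture directories. This is only a 
sketch: the function name is made up, and it assumes the check is run 
locally on each execd host against that host's SGE_ROOT.

```python
from pathlib import Path


def missing_arch_dirs(sge_root, arch="lx24-amd64"):
    """List the expected arch directories that do not exist under sge_root.

    Checks bin/<arch> and utilbin/<arch>; an empty list means both exist.
    """
    root = Path(sge_root)
    expected = [root / parent / arch for parent in ("bin", "utilbin")]
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    import os

    # Falls back to a hypothetical path if SGE_ROOT is not set.
    missing = missing_arch_dirs(os.environ.get("SGE_ROOT", "/opt/sge"))
    print("missing:", missing if missing else "none")
```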

What exactly happened:
In your environment only SGE_CELL is shared.
The upgrade was started on the master host. This upgraded only the master 
host's local binaries.
(MISSING): You should've copied the new SGE_ROOT to each execd host.
The cluster was started.

At that point qmaster was 6.2, but all execds and all clients on 
non-master hosts were still 6.1u4.

As I already said, I'll improve the documentation regarding this case.

Please let me know whether your cluster works now.

