[GE users] SGE 6.2: jobs queued indefinitely

Lubomir Petrik Lubomir.Petrik at Sun.COM
Wed Sep 24 09:30:23 BST 2008


On 09/23/08 19:45, Bart Willems wrote:
> Hi Lubos,
>
> one more question. Our compute nodes still only have the lx26-amd64
> directory, not lx24-amd64. Does this mean I need to install SGE 6.2 on all
> nodes separately?
>   
Yes. This means you are most likely using local binaries, which the 
upgrade procedure unfortunately did not overwrite. I'll check and 
improve the documentation.

What you need to do:
Shut down all execds. Remove the old lx26-amd64 architecture directory 
and copy the new lx24-amd64 directory to each execd host. This most 
likely applies to both the bin and utilbin directories.
Depending on your setup (what is local and what is shared):
If SGE_CELL is shared, copy the whole SGE_ROOT except for the SGE_CELL 
directory.
If even SGE_CELL is local, then the whole SGE_ROOT is needed, because 
the new bootstrap file exists only in SGE_CELL/common/bootstrap on the 
master host.
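
The copy step above can be sketched roughly as follows. This is only an 
illustration, not part of the SGE upgrade tooling: it assumes rsync over 
ssh is available between the master and the execd hosts, and the 
function names (plan_sync, old_arch_dirs), the host name and the paths 
are hypothetical. It only builds the command lines so you can review 
them before running anything.

```python
def old_arch_dirs(sge_root, old_arch="lx26-amd64"):
    """Directories holding the old binaries that should be removed on a host."""
    root = sge_root.rstrip("/")
    return [f"{root}/bin/{old_arch}", f"{root}/utilbin/{old_arch}"]


def plan_sync(host, sge_root, sge_cell, cell_is_shared):
    """Build an rsync command that pushes SGE_ROOT from the master to `host`.

    If SGE_CELL is shared, it is excluded from the copy; if it is local,
    the whole SGE_ROOT (including the cell with the new bootstrap file)
    is copied. Assumption: SGE_ROOT has the same path on both machines.
    """
    src = sge_root.rstrip("/") + "/"
    cmd = ["rsync", "-a"]
    if cell_is_shared:
        # A leading "/" anchors the pattern at the top of the transfer.
        cmd.append(f"--exclude=/{sge_cell}/")
    cmd += [src, f"{host}:{src}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print("remove on node01:", old_arch_dirs("/opt/sge"))
    print(" ".join(plan_sync("node01", "/opt/sge", "default", True)))
```

Run the resulting commands only after all execds have been shut down.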

Once all hosts have lx24-amd64 and can access the new bootstrap file, 
you can start the execds, and all commands should work again.
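
Before starting the execds, a quick check like the one below could be 
run on each host to confirm these preconditions. It is a minimal sketch 
under assumptions: the function name missing_paths is hypothetical, and 
the cell name "default" is only a guess at a typical setup, so pass your 
own SGE_ROOT and SGE_CELL values.

```python
import os


def missing_paths(sge_root, sge_cell="default", arch="lx24-amd64"):
    """Return the required paths that are absent or unreadable on this host."""
    required_dirs = [
        os.path.join(sge_root, "bin", arch),
        os.path.join(sge_root, "utilbin", arch),
    ]
    bootstrap = os.path.join(sge_root, sge_cell, "common", "bootstrap")

    missing = [d for d in required_dirs if not os.path.isdir(d)]
    # The bootstrap file must exist and be readable by the current user.
    if not (os.path.isfile(bootstrap) and os.access(bootstrap, os.R_OK)):
        missing.append(bootstrap)
    return missing


if __name__ == "__main__":
    # Falls back to a hypothetical /opt/sge if SGE_ROOT is not set.
    root = os.environ.get("SGE_ROOT", "/opt/sge")
    cell = os.environ.get("SGE_CELL", "default")
    problems = missing_paths(root, cell)
    print("ready" if not problems else "missing: " + ", ".join(problems))
```

An empty result means the host has the new architecture directories and 
can read the bootstrap file.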

What exactly happened:
In your environment, only SGE_CELL is shared.
The upgrade was run on the master host, so it upgraded only the master 
host's SGE_ROOT.
(MISSING): The new SGE_ROOT should have been copied to each execd host.
The cluster was then started.

As a result, qmaster was 6.2, but all execds and all clients on 
non-master hosts were still 6.1u4.

As I said, I'll improve the documentation for this case.

Please let me know whether your cluster finally works.

Lubos.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



