[GE users] SGE 6.2: jobs queued indefinitely
Lubomir.Petrik at Sun.COM
Tue Sep 23 17:32:42 BST 2008
See my inline comments.
On 09/23/08 18:15, Bart Willems wrote:
> Hi Lubos,
>> qstat -j 46003
>> error: can't unpack gdi request
>> error: error unpacking gdi request: bad argument
>> failed receiving gdi request
> Yes, I can reproduce this error every time.
>> it suggests that you might be using incompatible versions (client,
>> qmaster). Maybe a mix of 6.1u4 and 6.2 binaries?
> It seems like it: see below.
>> Also you may try to restart the qmaster or just the scheduler thread via
>> qconf -kt scheduler ; qconf -at scheduler.
> I get an error message when I try this:
> #qconf -kt scheduler
> error: "-kt" is not a valid option 2
> GE 6.1u4
> So this seems to refer to 6.1u4. If I do
> # ls -l /opt/gridengine/bin/
> total 12
> drwxr-xr-x 2 root root 4096 Jul 23 04:44 lx24-amd64
> drwxr-xr-x 2 root root 4096 May 30 16:09 lx26-amd64
> -rwxr-xr-x 1 root root 54 Apr 28 21:43 rocks-qlogin.sh
> the lx24-amd64 directory was not there before the upgrade. Is this ok?
Yes. We only ship lx24-* binaries, and they work on lx26 as well! So it
seems that you added the lx26-amd64 directory for 6.1u4 yourself.
A simple test is to run:
/opt/gridengine/bin/lx24-amd64/qconf -help | head -1
/opt/gridengine/bin/lx26-amd64/qconf -help | head -1
I'm sure that the first one will report 6.2 and the second one 6.1u4.
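To check every architecture directory at once, the test above could be
wrapped in a small loop. This is only a sketch: the function name
report_qconf_versions is made up for illustration, and the default path
is simply the one used in this thread.

```shell
#!/bin/sh
# Print the first line of "qconf -help" for every architecture directory.
# report_qconf_versions is a hypothetical helper, not part of Grid Engine.
report_qconf_versions() {
    # Default to the install path mentioned in this thread
    bin_dir=${1:-/opt/gridengine/bin}
    for arch_dir in "$bin_dir"/*/; do
        qconf_bin=${arch_dir}qconf
        # Skip directories that do not contain an executable qconf
        [ -x "$qconf_bin" ] || continue
        # Same test as above: the first help line carries the version
        printf '%s: %s\n' "$(basename "$arch_dir")" \
            "$("$qconf_bin" -help | head -1)"
    done
}
```

Calling report_qconf_versions with no argument inspects
/opt/gridengine/bin; if two directories report different versions, the
installation is mixed.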
If that's the case, I'd suggest shutting down the whole cluster,
removing (or renaming) all lx26-amd64 directories (under bin and
utilbin), and starting the cluster again. You don't need to restart it
if you can verify that the qmaster and all execd processes that are now
running are 6.2.
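The rename step could look roughly like the sketch below. It assumes
the cluster is already shut down and that the install root is
/opt/gridengine as in the listing above; the function name
disable_lx26_dirs and the ".6.1u4-disabled" suffix are arbitrary
choices for illustration.

```shell
#!/bin/sh
# Rename the lx26-amd64 directories under bin and utilbin so that only
# the lx24-amd64 binaries can be picked up.
# disable_lx26_dirs and the suffix below are hypothetical names.
disable_lx26_dirs() {
    # Default to the install root mentioned in this thread
    sge_root=${1:-/opt/gridengine}
    for sub in bin utilbin; do
        dir=$sge_root/$sub/lx26-amd64
        # Only touch directories that actually exist
        if [ -d "$dir" ]; then
            mv "$dir" "$dir.6.1u4-disabled" && echo "renamed $dir"
        fi
    done
}
```

Renaming rather than deleting keeps the old binaries around in case
they are needed for comparison later.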
Once the cluster is started, only the 6.2 binaries should be used and
the cluster should work, provided that the upgrade was done correctly.
There is a slight possibility that 6.1u4 is still running or was used
to import the saved configuration. In that case you would need to run
the import command (load_sge_config.sh) again with the appropriate
arguments (see the log from the first upgrade).
You can't mix binaries from different versions. Also, we don't supply
an lx26-* architecture. Can you tell me how you got it? Did you compile
it on your own?