[GE users] SGE 6.2: jobs queued indefinitely

Lubomir Petrik Lubomir.Petrik at Sun.COM
Tue Sep 23 17:32:42 BST 2008



Hi Bart,
see my inline comments.

On 09/23/08 18:15, Bart Willems wrote:
> Hi Lubos,
>
>   
>> qstat -j 46003
>> error: can't unpack gdi request
>> error: error unpacking gdi request: bad argument
>> failed receiving gdi request
>>     
>
> Yes, I can reproduce this error every time.
>
>   
>> it suggests that you might be using incompatible versions (client,
>> qmaster). Maybe a mix of 6.1u4 and 6.2 binaries?
>>     
>
> It seems like it: see below.
>
>   
>> Also you may try to restart the qmaster or just the scheduler thread via
>> qconf -kt scheduler ; qconf -at scheduler.
>>     
>
> I get an error message when I try this:
>
> #qconf -kt scheduler
> error: "-kt" is not a valid option 2
> GE 6.1u4
>
> So this seems to refer to 6.1u4. If I do
>
> # ls -l /opt/gridengine/bin/
> total 12
> drwxr-xr-x 2 root root 4096 Jul 23 04:44 lx24-amd64
> drwxr-xr-x 2 root root 4096 May 30 16:09 lx26-amd64
> -rwxr-xr-x 1 root root   54 Apr 28 21:43 rocks-qlogin.sh
>
> the lx24-amd64 directory was not there before the upgrade. Is this ok?
>   
Yes. We only ship lx24-* binaries, and they work on lx26 as well. So it
seems that you added the lx26-amd64 directory yourself for 6.1u4.

A simple test is to run:

/opt/gridengine/bin/lx24-amd64/qconf -help | head -1
and
/opt/gridengine/bin/lx26-amd64/qconf -help | head -1


I'm sure that the first one will report 6.2, while the second one will
report 6.1u4. If that's the case, I'd suggest shutting down the whole
cluster, removing (or renaming) all lx26-amd64 directories (under bin
and utilbin), and starting the cluster again. You don't need to restart
it if you can verify that the qmaster and all execd processes that are
now running are 6.2.
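
The version check above can be extended to every architecture directory
at once. This is only a rough sketch: the function name
report_qconf_versions is made up for illustration, and the default path
is simply the one from this thread, so adjust it for your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): print the first line of
# "qconf -help" for each architecture directory under a bin root.
report_qconf_versions() {
    # Default to the install path mentioned in this thread (an assumption).
    bin_root="${1:-/opt/gridengine/bin}"
    for dir in "$bin_root"/*/; do
        qconf_bin="${dir}qconf"
        # Skip directories that do not contain an executable qconf.
        [ -x "$qconf_bin" ] || continue
        arch=$(basename "$dir")
        # Same check as the manual test: first line of the help output.
        banner=$("$qconf_bin" -help | head -1)
        printf '%s: %s\n' "$arch" "$banner"
    done
}
```

Calling report_qconf_versions with no argument checks
/opt/gridengine/bin; pass another directory as the first argument to
check a different location. If two directories report different
versions, you have the mix described above.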

Once the cluster is started, only the 6.2 binaries should be used and
the cluster should work, provided that the upgrade was done correctly.
There is a slight possibility that 6.1u4 is still running or was used
to import the saved configuration. In that case you would need to run
the import command (load_sge_config.sh) again with the appropriate
arguments (see the log from the first upgrade).

You can't mix binaries from different versions. Also, we don't supply
an lx26-* architecture. Can you tell me how you got it? Did you compile
it on your own?

Thanks,
  Lubos.
