[GE users] SGE 6.2: jobs queued indefinitely

Bart Willems b-willems at northwestern.edu
Tue Sep 23 18:45:26 BST 2008


Hi Lubos,

One more question. Our compute nodes still have only the lx26-amd64
directory, not lx24-amd64. Does this mean I need to install SGE 6.2 on all
nodes separately?

Thanks,
Bart

> Hi Bart,
> see my inline comments.
>
> On 09/23/08 18:15, Bart Willems wrote:
>> Hi Lubos,
>>
>>
>>> qstat -j 46003
>>> error: can't unpack gdi request
>>> error: error unpacking gdi request: bad argument
>>> failed receiving gdi request
>>>
>>
>> Yes, I can reproduce this error every time.
>>
>>
>>> it suggests that you might be using incompatible versions (client,
>>> qmaster). Maybe a mix of 6.1u4 and 6.2 binaries?
>>>
>>
>> It seems like it: see below.
>>
>>
>>> Also you may try to restart the qmaster or just the scheduler thread
>>> via
>>> qconf -kt scheduler ; qconf -at scheduler.
>>>
>>
>> I get an error message when I try this:
>>
>> #qconf -kt scheduler
>> error: "-kt" is not a valid option 2
>> GE 6.1u4
>>
>> So this seems to refer to 6.1u4. If I do
>>
>> # ls -l /opt/gridengine/bin/
>> total 12
>> drwxr-xr-x 2 root root 4096 Jul 23 04:44 lx24-amd64
>> drwxr-xr-x 2 root root 4096 May 30 16:09 lx26-amd64
>> -rwxr-xr-x 1 root root   54 Apr 28 21:43 rocks-qlogin.sh
>>
>> the lx24-amd64 directory was not there before the upgrade. Is this ok?
>>
> Yes. We only have lx24-* binaries that work on lx26 as well! So it seems
> that you yourself added the lx26-amd64 for 6.1u4.
>
> Simple test is to do:
>
> /opt/gridengine/bin/lx24-amd64/qconf -help | head -1
> and
> /opt/gridengine/bin/lx26-amd64/qconf -help | head -1
>
>
> I'm sure that the first one will report 6.2, while the second one will
> report 6.1u4. If that's the case, I'd suggest shutting down the whole
> cluster, removing (or renaming) all lx26-amd64 directories (under bin
> and utilbin), and starting the cluster again. You don't need to restart
> it if you can verify that the qmaster and all execd processes that are
> now running are 6.2.
>
> Once the cluster is started, only the 6.2 binaries should be used and the
> cluster should work, provided that the upgrade was done correctly. There
> is a slight possibility that 6.1u4 is still running or was used to
> import the saved configuration. In that case you would need to run the
> import command (load_sge_config.sh) again with the appropriate arguments
> (see the log from the first upgrade).
>
> You can't mix binaries from different versions. Also, we don't supply an
> lx26-* architecture. Can you tell me how you got it? Did you compile
> it on your own?
>
> Thanks,
>   Lubos.
>
>
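
For anyone following along, the version check suggested above (running
qconf -help | head -1 from each architecture directory) could be scripted
roughly as follows. This is only a sketch: the function name
report_qconf_versions is made up for illustration, and the default path is
simply the /opt/gridengine/bin location mentioned in this thread.

```shell
#!/bin/sh
# Sketch: print the first line of `qconf -help` for every architecture
# directory under an SGE bin root, so mixed versions are easy to spot.
# report_qconf_versions is a hypothetical helper name, not part of SGE.
report_qconf_versions() {
    # Default taken from the path used in this thread; pass another root if needed.
    bin_root=${1:-/opt/gridengine/bin}
    for arch_dir in "$bin_root"/*/; do
        qconf_bin=${arch_dir}qconf
        # Skip directories that do not contain an executable qconf.
        [ -x "$qconf_bin" ] || continue
        # As in the thread, the version banner is the first line of the help text.
        printf '%s: %s\n' "$(basename "$arch_dir")" \
            "$("$qconf_bin" -help 2>&1 | head -n 1)"
    done
}
```

Calling report_qconf_versions with no argument checks /opt/gridengine/bin
and prints one line per architecture directory that has a qconf binary.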

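The renaming step described in the quoted reply (moving the lx26-amd64
directories under bin and utilbin out of the way) might look something
like the sketch below. The function name retire_arch_dirs and the
".disabled" suffix are assumptions chosen for illustration, and the default
root is the /opt/gridengine path from this thread. Per the advice above, it
is meant to be run while the cluster is shut down.

```shell
#!/bin/sh
# Sketch: rename a stale architecture directory under bin and utilbin so
# that only the remaining architecture's binaries can be picked up.
# retire_arch_dirs and the ".disabled" suffix are hypothetical choices.
retire_arch_dirs() {
    sge_root=${1:-/opt/gridengine}   # install root as seen in this thread
    old_arch=${2:-lx26-amd64}        # directory holding the old binaries
    suffix=${3:-.disabled}           # appended to the directory name
    for sub in bin utilbin; do
        dir=$sge_root/$sub/$old_arch
        # Nothing to do if this subdirectory has no such architecture.
        [ -d "$dir" ] || continue
        # Refuse to overwrite or nest into an existing renamed copy.
        if [ -e "$dir$suffix" ]; then
            echo "skipped $dir ($dir$suffix already exists)" >&2
            continue
        fi
        mv "$dir" "$dir$suffix" && echo "renamed $dir -> $dir$suffix"
    done
}
```

Renaming rather than deleting keeps the old binaries around in case they
are needed again; the directories can be removed later once the cluster is
confirmed to be running 6.2 only.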

