[GE users] Clearing errors: qmod -cq / qmod -cj issues

Reuti reuti at staff.uni-marburg.de
Tue Jan 24 15:58:48 GMT 2006


Hi,

On 24.01.2006 at 15:04, Sebastian Stark wrote:

>
> On 24.01.2006 at 13:51, Reuti wrote:
>
>> Hi,
>>
>> On 24.01.2006 at 12:39, Sebastian Stark wrote:
>>
>>> Sorry, I forgot to mention that I use SGE 6.0u7 on Linux.
>>>
>>>
>>> -Sebastian
>>>
>>> On 24.01.2006 at 12:26, Sebastian Stark wrote:
>>>
>>>>
>>>> neckar ~ % qstat -f | awk '$6 ~ /E/ {print $1}'
>>>> all.q@node120
>>>> all.q@node132
>>>> all.q@node135
>>>> all.q@node138
>>>> neckar ~ % qmod -cq all.q@node120 all.q@node132 all.q@node135 all.q@node138
>>>> invalid queue "all.q@node120"
>>>> invalid queue "all.q@node132"
>>>> invalid queue "all.q@node135"
>>>> invalid queue "all.q@node138"
>>>>
>>>> What am I doing wrong?
>>>>
>>>> The help page seems confused about this:
>>>>
>>>> neckar ~ % qmod -help | grep error
>>>>    [-c job_wc_queue_list]  clear error state
>>>>    [-cj wc_queue_list]     clear job error state
>>>>    [-cq job_list]          clear queue error state
>>>>
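The listing and clearing steps above can be combined into one shell function. This is a minimal sketch, not a fix for the "invalid queue" errors in this thread: it assumes the SGE 6.0 `qstat -f` column layout (queue name first, state flags in the sixth field), that `qmod -cq` takes queue instances as its name says (only the `-help` text shows the arguments of `-cq` and `-cj` swapped), and that the names `qstat` prints are names `qmod` accepts back. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: clear the error state (E) of every queue instance listed by `qstat -f`.
# Assumes SGE 6.0 `qstat -f` columns: queuename qtype used/tot. load_avg arch states

list_error_queues() {
    # Queue-instance lines carry their state flags in field 6. Job lines under
    # a queue have a date in field 6, and separator lines have fewer fields.
    awk 'NF >= 6 && $6 ~ /E/ { print $1 }'
}

clear_error_queues() {
    queues=$(qstat -f | list_error_queues)
    if [ -z "$queues" ]; then
        echo "no queue instances in error state"
        return 0
    fi
    # $queues is left unquoted on purpose: word splitting on the newlines
    # passes each queue instance to qmod as its own argument.
    qmod -cq $queues
}
```

Calling `clear_error_queues` issues a single `qmod -cq` for all instances in state E, so it fails the same way as the manual command while the hostname problem discussed below persists.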
>>
>> Yes, the entries are swapped; this is already a known issue. Do you
>> get the same error for queues that are not in error state, and when
>> specifying just one queue? - Reuti
>
> Yes, I get the same error for queue instances that are not in error  
> state:
>
> neckar ~ % qmod -cq all.q@node143
> invalid queue "all.q@node143"
>
> And now I find that checking whether this queue instance has an
> error does not work at all:
>
> neckar ~ % qstat -f -q all.q@node143
> neckar ~ %
>
> This should return the status of this queue instance, right? This
> works, however:
>
> neckar ~ % qstat -f | grep node143
> all.q@node143                     BP    2/2       14.97    lx24-amd64
> parallel.q@node143                BIP   13/16     14.97    lx24-amd64
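While `qstat -f -q` returns nothing, the state of a single queue instance can be read from the full listing, as the grep above does. A minimal sketch, assuming the same `qstat -f` layout as the awk filter earlier in the thread (state flags in the sixth field, absent when the instance has none); `queue_instance_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the state flags of one queue instance from the full `qstat -f`
# listing, as a workaround when `qstat -f -q <queue>` prints nothing.
# Assumes SGE 6.0 `qstat -f` columns: queuename qtype used/tot. load_avg arch states

queue_instance_state() {
    # $1: queue instance name exactly as qstat prints it, e.g. all.q@node143
    # Prints the states field, or "-" if the instance has no state flags.
    qstat -f | awk -v q="$1" '$1 == q { print (NF >= 6 ? $6 : "-") }'
}
```

The match is exact on the first column, so if SGE starts printing FQDNs the short name will no longer match, which is itself a quick hint that host naming has changed.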
>
> It seems that something seriously broke in my SGE installation when I
> upgraded to 6.0u7. It might also be a problem that I switched from
> /etc/hosts to DNS-based hostname resolution. Could SGE be confused
> because IP lookups now return the FQDN of the hosts instead of just
> the host part?
>

Yes, there might be an issue, as the internal name known to SGE is
only the short one. What do "qhost" and "qconf -sel" show? Maybe you
have to add the machines again (you will then also see the FQDN in
qstat for each queue instance).
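To see whether the switch to DNS changed the names SGE now encounters, each execution host listed by `qconf -sel` can be compared with the canonical name the resolver returns. A minimal sketch, assuming `getent hosts` is available (as with glibc on Linux) and that its canonical-name column approximates what SGE resolves; SGE performs its own lookups, so treat a mismatch as a hint rather than proof. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the execution host names stored in SGE with the canonical
# names the system resolver (/etc/hosts or DNS, per nsswitch.conf) returns now.

canonical_name() {
    # getent hosts prints: <address> <canonical name> [aliases...]
    getent hosts "$1" | awk 'NR == 1 { print $2 }'
}

check_exec_hosts() {
    # qconf -sel lists one execution host per line
    qconf -sel | while read -r host; do
        resolved=$(canonical_name "$host")
        if [ -z "$resolved" ]; then
            echo "$host: does not resolve"
        elif [ "$resolved" != "$host" ]; then
            echo "$host: resolves to $resolved (name mismatch)"
        else
            echo "$host: ok"
        fi
    done
}
```

A line such as `node120: resolves to node120.<domain> (name mismatch)` would match the situation described here: SGE knows the short name while the resolver now hands back the FQDN.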

-- Reuti

> Has anybody successfully switched from /etc/hosts-based to DNS-based
> name resolution with an existing SGE installation?
>
>
> -Sebastian
>
>>>> It looks as if the two options -cj and -cq are swapped. But it
>>>> still does not work if I swap them:
>>>>
>>>> neckar ~ % qmod -cj all.q@node120 all.q@node132 all.q@node135 all.q@node138
>>>> invalid queue or job "all.q@node120"
>>>> invalid queue or job "all.q@node132"
>>>> invalid queue or job "all.q@node135"
>>>> invalid queue or job "all.q@node138"
>>>>
>>>> Legacy mode does not work either:
>>>>
>>>> neckar ~ % qmod -c all.q@node120 all.q@node132 all.q@node135 all.q@node138
>>>> invalid queue "all.q@node120"
>>>> invalid queue "all.q@node132"
>>>> invalid queue "all.q@node135"
>>>> invalid queue "all.q@node138"
>>>>
>>>>
>>>> So, how do I clear the error state?
>>>>
>>>>
>>>>
>>>> -Sebastian
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>