[GE users] Clearing errors: qmod -cq / qmod -cj issues

Sebastian Stark stark at tuebingen.mpg.de
Tue Jan 24 14:04:00 GMT 2006


On 24.01.2006 at 13:51, Reuti wrote:

> Hi,
>
> On 24.01.2006 at 12:39, Sebastian Stark wrote:
>
>> Sorry, I forgot to mention that I use SGE 6.0u7 on Linux.
>>
>>
>> -Sebastian
>>
>> On 24.01.2006 at 12:26, Sebastian Stark wrote:
>>
>>>
>>> neckar ~ % qstat -f | awk '$6 ~ /E/ {print $1}'
>>> all.q@node120
>>> all.q@node132
>>> all.q@node135
>>> all.q@node138
>>> neckar ~ % qmod -cq all.q@node120 all.q@node132 all.q@node135 all.q@node138
>>> invalid queue "all.q@node120"
>>> invalid queue "all.q@node132"
>>> invalid queue "all.q@node135"
>>> invalid queue "all.q@node138"
>>>
>>> What am I doing wrong?
>>>
>>> The help page seems confused about this:
>>>
>>> neckar ~ % qmod -help | grep error
>>>    [-c job_wc_queue_list]  clear error state
>>>    [-cj wc_queue_list]     clear job error state
>>>    [-cq job_list]          clear queue error state
>>>
>
> Yes, the entries are swapped; this is already a known issue. Do you
> get the same error for a queue that is not in an error state, and
> when specifying just one queue? - Reuti

Yes, I get the same error for queue instances that are not in error  
state:

neckar ~ % qmod -cq all.q@node143
invalid queue "all.q@node143"

I also find that querying this queue instance directly, to check
whether it has an error, returns nothing at all:

neckar ~ % qstat -f -q all.q@node143
neckar ~ %

This should return the status of this queue instance, right?
Filtering the full listing with grep works, however:

neckar ~ % qstat -f | grep node143
all.q@node143                  BP    2/2       14.97    lx24-amd64
parallel.q@node143             BIP   13/16     14.97    lx24-amd64
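
A plain grep matches the node name anywhere on the line and says nothing
about the state. A minimal sketch of a stricter filter follows, assuming
the `qstat -f` layout shown in this thread: the queue instance in the
first column and the state flags in the sixth, which is missing when an
instance has no state flags. The function name queues_in_state is
hypothetical, not part of SGE.

```shell
# List queue instances whose state column contains a given flag (default: E).
# Reads `qstat -f` output on stdin and prints one queue instance per line.
# Assumption: queue lines look like
#   queue@host  qtype  used/tot.  load_avg  arch  [states]
queues_in_state() {
    awk -v flag="${1:-E}" '
        # Only queue-instance lines carry "@" in the first field, so the
        # header, separator rows and job lines are skipped. Lines with
        # fewer than six fields have no state column at all.
        $1 ~ /@/ && NF >= 6 && index($6, flag) > 0 { print $1 }
    '
}
```

It would be used as `qstat -f | queues_in_state E`. Checking the field
count first keeps instances without any state flags from being compared
against an empty column.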

Something seems to have broken badly in my SGE installation since I
upgraded to 6.0u7. It might also be related to my switch from
/etc/hosts to DNS-based hostname resolution. Could SGE be confused by
the fact that IP lookups now return the FQDN of the hosts instead of
just the host part?
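
One way to test that suspicion outside of SGE is to compare the name a
host is asked for with the canonical name the resolver hands back. The
following is only a sketch: it assumes glibc's `getent hosts` is
available on the machine, and the helper names short_name and check_host
are made up for illustration. It shows what the system resolver does,
not how SGE itself compares host names.

```shell
# Strip the domain part of a host name: "node143.example.org" -> "node143".
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Report whether the resolver's canonical name for a host differs from the
# name that was asked for (for example short name in, FQDN out).
# Assumption: `getent hosts` is available; it consults the sources
# configured in /etc/nsswitch.conf (files, DNS, ...).
check_host() {
    asked=$1
    # The second field of the first `getent hosts` line is the canonical name.
    canonical=$(getent hosts "$asked" | awk 'NR == 1 { print $2 }')
    if [ -z "$canonical" ]; then
        echo "$asked: does not resolve"
    elif [ "$canonical" = "$asked" ]; then
        echo "$asked: resolves to the same name"
    elif [ "$(short_name "$canonical")" = "$(short_name "$asked")" ]; then
        echo "$asked: resolves to $canonical (same host, different form)"
    else
        echo "$asked: resolves to $canonical (different host name)"
    fi
}
```

Running `check_host node143` on the qmaster and on the execution host
would show whether both see the same form of the name. A short name on
one side and an FQDN on the other would fit the symptoms described here.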

Has anybody successfully switched from /etc/hosts-based to DNS-based
name resolution with an existing SGE installation?


-Sebastian

>>> It looks as if the two options -cj and -cq are swapped. But it
>>> still does not work if I swap them back:
>>>
>>> neckar ~ % qmod -cj all.q@node120 all.q@node132 all.q@node135 all.q@node138
>>> invalid queue or job "all.q@node120"
>>> invalid queue or job "all.q@node132"
>>> invalid queue or job "all.q@node135"
>>> invalid queue or job "all.q@node138"
>>>
>>> Legacy mode does not work either:
>>>
>>> neckar ~ % qmod -c all.q@node120 all.q@node132 all.q@node135 all.q@node138
>>> invalid queue "all.q@node120"
>>> invalid queue "all.q@node132"
>>> invalid queue "all.q@node135"
>>> invalid queue "all.q@node138"
>>>
>>>
>>> So, how do I clear the error state?
>>>
>>>
>>>
>>> -Sebastian
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
