[GE users] SGE 6 - queues entering error state

Reuti reuti at staff.uni-marburg.de
Sun Aug 13 23:17:34 BST 2006


On 12.08.2006 at 00:58, Bevan C. Bennett wrote:

> Reuti wrote:
>> On 11.08.2006 at 00:56, Bevan C. Bennett wrote:
>>
>>> Reuti wrote:
>>>> On 10.08.2006 at 22:32, Bevan C. Bennett wrote:
>>>>
>>>>> Reuti wrote:
>>>>>> On 10.08.2006 at 20:24, Bevan C. Bennett wrote:
>>>>>>
>>>>>>>
>>>>>>>> Can you please post your queue, SGE and exechost configuration?
>>>>>>>
>>>>>>> Which parts of it?
>>>>>>
>>>>>> The first few lines from the SGE conf, where the spool
>>>>>> directories are defined. Maybe you also have a local
>>>>>> configuration for some of your hosts (qconf -sconfl)?
>>>>>> And is a prolog/epilog defined in any of them?
>>>>>
>>>>> It's pretty basic...
>>>>> [bevan@alexander ~]$ qconf -sconf
>>>>> global:
>>>>> execd_spool_dir              /usr/local/grid-6.0/default/spool
>>>>
>>>> But this is different from the path mentioned below:
>>>>
>>>> /mnt/local/common/grid-test/default/spool/cobalt/active_jobs/2313.1/pid
>>> Actually it's not different.
>>> /usr/local -> /mnt/local/OS
>>> /mnt/local/OS/grid-6.0 -> /mnt/local/common/grid-test
>>
>> Is this all inside one mount point, i.e. is /mnt/local/OS/grid-6.0
>> pointing to another export, or only inside itself?
>
> /mnt/local is the mount point. OS is a series of subdirectories.
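
The two questions in the quoted exchange (do both spool paths end up in the same directory, and do they stay on one filesystem?) can be checked directly on an execution host. The following is only a sketch: it assumes GNU coreutils (`readlink -f`, `stat -c`), and the helper names `same_target` and `same_filesystem` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, assuming GNU coreutils is installed.

# same_target A B: succeed if both paths canonicalize to the same location
# once every symlink along the way has been resolved.
same_target() {
    a=$(readlink -f -- "$1") || return 2
    b=$(readlink -f -- "$2") || return 2
    [ "$a" = "$b" ]
}

# same_filesystem A B: succeed if both paths (after following symlinks)
# report the same device number, i.e. sit on the same mounted filesystem.
same_filesystem() {
    a=$(stat -L -c %d -- "$1") || return 2
    b=$(stat -L -c %d -- "$2") || return 2
    [ "$a" = "$b" ]
}
```

With the paths from this thread, `same_target /usr/local/grid-6.0/default/spool /mnt/local/common/grid-test/default/spool && echo same` would confirm the symlink chain, and `same_filesystem /mnt/local/OS /mnt/local/common && echo "one mount"` would confirm that nothing points to a separate export.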

Okay. As the original problem was the dying ssh daemon:

- is there any hint in the /var/log files of the machines?
- any really low memory limits set in the queue definition on some node?

-- Reuti
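
One possible way to follow up on the memory-limit question is to filter the queue configuration for limits that are not `INFINITY`. This is a sketch only: `finite_limits` is a made-up helper name, and it assumes the usual "attribute value" layout printed by `qconf -sq`.

```shell
#!/bin/sh
# Hypothetical helper: reads a queue configuration on stdin (the text
# printed by `qconf -sq <queue>`) and prints every soft/hard resource
# limit whose value is not plain INFINITY. Host-specific overrides such
# as "INFINITY,[host=...]" are printed too, since they may be the culprit.
finite_limits() {
    awk '$1 ~ /^[sh]_(rt|cpu|fsize|data|stack|core|rss|vmem)$/ &&
         $2 != "INFINITY" { print $1, $2 }'
}
```

It could be run over all queues with `for q in $(qconf -sql); do echo "$q:"; qconf -sq "$q" | finite_limits; done`. For the first question, searching the system logs on the affected node for `sshd` entries is the obvious start; the exact log file under /var/log depends on the distribution.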

