[GE users] qstat don't list all users
dominik.kuehne at tu-berlin.de
Tue Mar 23 16:29:06 GMT 2010
On 23.03.2010, at 16:36, reuti wrote:
> On 23.03.2010, at 15:41, dominik wrote:
>> On 23.03.2010, at 15:31, reuti wrote:
>>> On 23.03.2010, at 15:22, dominik wrote:
>>>> On 23.03.2010, at 12:25, reuti wrote:
>>>>> On 23.03.2010, at 10:31, dominik wrote:
>>>>>> There are a lot of qlogins. I can see the Unix processes, but I can't see the jobs via qstat. Could it be a permission problem, or something else with the qmaster?
>>>>> And are they all children of an execd and a shepherd on the exechosts? Are you contacting the right qmaster? In principle it's possible to have more than one qmaster (which might overload the exechosts), and as they don't know anything about the other instances, you can only see the jobs of the one you are contacting. An indication would be that there is more than one execd on the nodes.
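A quick way to check for a second execd on a node might look like the sketch below. It assumes the daemon shows up in `ps` under the command name `sge_execd`; `count_execd` is just a made-up helper name, not something shipped with Grid Engine.

```shell
# Sketch only: count sge_execd processes in ps-style output read from stdin.
# Assumption: the last field of each line is the command name (or its path).
count_execd() {
    awk '$NF ~ /(^|\/)sge_execd$/ { n++ } END { print n + 0 }'
}

# Run on an exec host. A result greater than 1 hints at a second installation.
ps -e -o pid= -o comm= | count_execd
```

A count of 1 on every node would suggest there is only one cluster instance involved.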
>>>> I figured out that it isn't a qstat problem. The shepherd died; the only log message is
>>>> 03/23/2010 14:59:31|worker|vnc|W|job 37529.1 failed on host XX assumedly after job because: job 37529.1 died through signal KILL (9)
>>>> in the master's messages logfile. But I can't trace the shepherd process; it dies too fast.
>>> You can set:
>>> $ qconf -mconf
>>> execd_params ENABLE_ADDGRP_KILL=TRUE
>>> to also kill the processes which jump out of the process tree (otherwise the default is a `kill -9 -- -pgrp`). Then the qlogins should also be gone.
>>> But the qlogins start up fine?
>> Yes, I get a shell and I don't notice the shepherd dying.
>> The problem occurs only with interactive sessions; batch jobs run without problems.
> Which startup method do you use for qlogin: builtin, rsh, ssh or tight ssh? I.e. what is the relevant entry in `qconf -sconf` for qlogin (there may be a global configuration and local ones for each node)?
We use ssh, configured globally:
qlogin_daemon /usr/sbin/sshd -i
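
To rule out a local configuration overriding the global one, the interactive-session entries could be listed for every host with something like the sketch below. `filter_interactive` is a hypothetical helper name, and the list of entry names it matches is my assumption of what is relevant here; `qconf -sconf` and `qconf -sconfl` are the only Grid Engine commands used.

```shell
# Hypothetical helper (not part of Grid Engine): keep only the configuration
# entries that control interactive sessions, plus execd_params.
filter_interactive() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]|^execd_params[[:space:]]'
}

# Only query the cluster when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    echo "== global =="
    qconf -sconf | filter_interactive
    # qconf -sconfl lists the hosts for which a configuration exists.
    for host in $(qconf -sconfl); do
        # Skip the global entry in case it is listed; it was printed above.
        [ "$host" = "global" ] && continue
        echo "== $host =="
        qconf -sconf "$host" | filter_interactive
    done
fi
```

If a host section prints its own qlogin_daemon or qlogin_command line, that node is not using the global ssh setting.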