[GE users] qstat doesn't list all users

reuti reuti at staff.uni-marburg.de
Tue Mar 23 14:31:34 GMT 2010


On 23.03.2010 at 15:22, dominik wrote:

> On 23.03.2010, at 12:25, reuti wrote:
> 
>> On 23.03.2010 at 10:31, dominik wrote:
>> 
>>> There are a lot of qlogins; I can see the Unix processes, but I can't see the jobs via qstat. Can it be a permission problem or something else with the qmaster?
>> 
>> And are they all children of an execd and shepherd on the exechosts? Are you contacting the right qmaster? In principle it's possible to have more than one qmaster (which might overload the exechosts), and as they don't know anything about other instances, you can only see the jobs of the one you are contacting. An indication would be that there is more than one execd running on the nodes.
> 
> I figured it out: it isn't a qstat problem. The shepherd died; the only log message is
> 03/23/2010 14:59:31|worker|vnc|W|job 37529.1 failed on host XX assumedly after job because: job 37529.1 died through signal KILL (9)
> in the master messages logfile. But I can't trace the shepherd process; it dies too fast.

You can set:

$ qconf -mconf
...
execd_params                 ENABLE_ADDGRP_KILL=TRUE

to also kill the processes which jump out of the process tree (otherwise the default is a `kill -9 -- -pgrp`, which only reaches the job's process group). Then the qlogins should also be gone.
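
To check afterwards whether the setting is in place, something like the following might help. It is only a sketch: the function name `has_addgrp_kill` is made up for illustration, it assumes the `execd_params` entry sits on a single line of the `qconf -sconf` output (a wrapped entry would not be matched), and it only looks for the value TRUE.

```shell
# Hypothetical helper (not part of Grid Engine): reads configuration text
# in the format printed by `qconf -sconf` on stdin and succeeds if the
# execd_params line contains ENABLE_ADDGRP_KILL=TRUE (case-insensitive).
has_addgrp_kill() {
    grep -E '^execd_params[[:space:]]' | grep -Eqi 'ENABLE_ADDGRP_KILL=TRUE'
}
```

It could be used as `qconf -sconf | has_addgrp_kill && echo "ENABLE_ADDGRP_KILL is set"`.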

But do the qlogins start up fine?

-- Reuti


> Cheers, Dominik
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=250828

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


