[GE users] can't get password entry for user XXXX ...

craffi dag at sonsorol.org
Fri Sep 25 19:43:55 BST 2009


Is this a Mac OS X system?

-Chris
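
The quoted error suggests a failed password-database lookup for the job owner on the execution host. A minimal sketch for checking that lookup directly on the host follows. It assumes the message corresponds to a getpwnam-style call, which this thread does not confirm. The helper names `lookup_user` and `describe` are hypothetical.

```python
import pwd
import sys


def lookup_user(username):
    """Return the passwd entry for username, or None if the lookup fails."""
    try:
        # pwd.getpwnam goes through the system name service (files, NIS, LDAP, ...)
        return pwd.getpwnam(username)
    except KeyError:
        # Raised when no entry can be found for this name
        return None


def describe(username):
    """Build a one-line summary of the lookup result for username."""
    entry = lookup_user(username)
    if entry is None:
        return f"{username}: no password entry (user missing or name service lookup failed)"
    return (f"{username}: uid={entry.pw_uid} gid={entry.pw_gid} "
            f"home={entry.pw_dir} shell={entry.pw_shell}")


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_user.py USERNAME [USERNAME ...]
    for name in sys.argv[1:]:
        print(describe(name))
```

Running this on the affected host, ideally from the same environment the execution daemon runs in, may show whether the lookup itself fails even when an interactive login succeeds.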




On Sep 25, 2009, at 6:41 AM, txema_heredia wrote:

> Hi folks!
>
> I have a problem with my sge6.1u4 cluster:
>
> Twice this week, two of my hosts have started putting every job
> submitted to them into Error state, reporting this:
>
> error reason    1:          09/25/2009 11:07:45 [0:30915]: can't get  
> password entry for user "XXXXXXXXXX". Either the user does not exist  
> or NIS error!
>
> After that, the queue instance went into Error state.
>
>
> I have searched for this problem and the only answers were "You have  
> a problem with your users/NIS/LDAP" or "restart sgeexecd in the host".
>
> The error message is not accurate. I ssh'd to that host as that
> user, and everything (user, password, home, ...) worked fine, so I
> tried the other option. I stopped and restarted sgeexecd on that
> host, and now the jobs no longer enter Error state, but they finish
> unexpectedly for no apparent reason.
>
> This is the qacct output:
>
> failed       100 : assumedly after job
>
> and if I submit them with the "-m a" option, I get a mail like this:
>
> Job 740641 (med-19) Aborted
> Exit Status      = 134
> Signal           = ABRT
> User             = XXXXXXXXX
> Queue            = test2-med at compute-0-4.local
> Host             = compute-0-4.local
> Start Time       = 09/25/2009 11:27:31
> End Time         = 09/25/2009 11:27:31
> CPU              = NA
> Max vmem         = NA
> failed assumedly after job because:
> job 740641.1 died through signal ABRT (6)
>
>
> This same "can't get password" problem has happened several times in
> our cluster, but most occurrences resolved themselves "magically"
> after a short time. The last time it happened (last Tuesday),
> though, I had to remove the host from the host groups it belonged to
> and reinstall it (it is a Rocks cluster 5.0 host; just restarting
> the daemon didn't work) before it worked again.
>
> Any suggestions?
>
> Thanks in advance,
>
> Txema
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219021
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219077
