[GE users] can't get password entry for user

Andy Schwierskott andy.schwierskott at sun.com
Fri Mar 3 08:21:33 GMT 2006


Rayson,

> The code doesn't give up immediately, but it keeps on retrying...
> isn't it better if it waits a bit after each retry??

Good question. It would probably be better to wait a bit between retries.
Perhaps it wasn't implemented that way because sleep() only has one-second
granularity, and sub-second sleeps take additional effort to implement.
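
A minimal sketch of what waiting between retries could look like, assuming
POSIX nanosleep() is available for the sub-second pause. The names
getpwnam_retry() and sleep_ms() are hypothetical and are not the actual
Grid Engine code:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for nanosleep() under strict compilers */
#endif

#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;

    /* If a signal interrupts the sleep, continue with the remaining time. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Hypothetical wrapper: try getpwnam() up to max_tries times and pause
 * delay_ms milliseconds between attempts instead of retrying back to back.
 * Returns NULL if every attempt fails. */
struct passwd *getpwnam_retry(const char *name, int max_tries, long delay_ms)
{
    struct passwd *pw = NULL;
    int i;

    assert(name != NULL);

    for (i = 0; i < max_tries; i++) {
        pw = getpwnam(name);
        if (pw != NULL)
            break;
        if (i + 1 < max_tries)      /* no point sleeping after the last try */
            sleep_ms(delay_ms);
    }
    return pw;
}
```

Something like getpwnam_retry(user, 10, 200) would then spread ten attempts
over roughly two seconds, which gives a slow NIS server some time to respond.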

I agree, however, that the error handling is meager. It would be better if
errno were available to the calling functions to allow better error logging.
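
As a rough illustration of passing the error code up to the caller, here is
a sketch based on the reentrant getpwnam_r(), which returns the error number
directly. lookup_user() is a hypothetical name and not part of Grid Engine:

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>

/* Hypothetical wrapper: look up a user and hand the error code back to the
 * caller so it can be logged.
 *
 * Sets *found to 1 if the user exists, otherwise to 0. The return value is
 * whatever getpwnam_r() returned: 0 on success, or an errno value. Note that
 * some implementations report an unknown user with a nonzero code such as
 * ENOENT rather than 0, so the caller should treat the code as a hint. */
int lookup_user(const char *name, struct passwd *pwd,
                char *buf, size_t buflen, int *found)
{
    struct passwd *result = NULL;
    int err;

    assert(name != NULL && pwd != NULL && buf != NULL && found != NULL);

    err = getpwnam_r(name, pwd, buf, buflen, &result);
    *found = (err == 0 && result != NULL);
    return err;
}
```

The caller could then log strerror() of the returned code next to the user
name, which helps to tell a missing user from a lookup failure.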

Andy


>
> Rayson
>
>
>
> On 3/2/06, Andy Schwierskott <andy.schwierskott at sun.com> wrote:
>> As you can see from the sge_passwd() code, it doesn't give up immediately,
>> so the issue is really in the NIS area.
>>
>> Andy
>>
>>> Hi Jinal,
>>>
>>>       We have seen these messages occasionally in our environment.
>>> They typically indicate a very slow response from an NIS server, either
>>> because of other network traffic or because the server itself is
>>> overloaded for some other reason.
>>>
>>> HTH,
>>>
>>> Mac McCalla
>>>
>>> -----Original Message-----
>>> From: Jinal Jhaveri [mailto:jajhaveri at lbl.gov]
>>> Sent: Thursday, March 02, 2006 11:42 AM
>>> To: users at gridengine.sunsource.net
>>> Subject: [GE users] can't get password entry for user
>>>
>>> Hi All,
>>>
>>> For the past few days, we have been seeing the following errors for a few jobs.
>>>
>>> can't get password entry for user "kfelkins". Either the user does not
>>> exist or NIS error!
>>> error reason  453:          can't get password entry for user
>>> "kfelkins". Either the user does not exist or NIS error!
>>> error reason  454:          can't get password entry for user
>>> "kfelkins". Either the user does not exist or NIS error!
>>>
>>> This leaves the node in an error state.
>>>
>>> The error doesn't happen for one particular user but randomly for various
>>> users. I didn't get much information from the execd code except that
>>> sge_getpwnam, which in turn calls getpwnam, fails. sge_getpwnam doesn't
>>> report which error it received from getpwnam. I saw several emails in
>>> the group, but none of those situations applies to us. The user is
>>> definitely configured correctly on that node, and this error happens
>>> only randomly. Also, qconf -suserl does show the name of that user.
>>>
>>> Any help would be sincerely appreciated
>>>
>>> thank you
>>>
>>>
>>> --Jinal
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
>>
>




