[GE users] can't resolve group

SLIM H.A. h.a.slim at durham.ac.uk
Sat Sep 8 10:33:16 BST 2007


Hello
 
This has already been raised in the thread "cannot submit job because of error bug?"
 
It is tracked as issue #2249, which has been reopened, and the fix is slated for update 6.1u3. I also sent a comment to the developer list, as this causes serious problems for users. I have a workaround on my system, as explained below; otherwise you may have to roll back to a previous version, unless Rayson has another solution.
 
Henk
 
This is from earlier emails ....
 
Dmitry
 
Low-level library calls such as getgrgid() get their information via /etc/nsswitch.conf.
If that file has an entry like
group:      files nis
the lookup consults the local /etc/group file first, so a matching entry there is found before the network source is queried.
 
Best wishes
 
Henk
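
For readers following the buffer-size discussion quoted further down: one common way to avoid depending on the value returned by sysconf(_SC_GETGR_R_SIZE_MAX) is to treat it only as a starting size and to retry getgrgid_r() with a larger buffer whenever it reports ERANGE. The sketch below illustrates that idea. It is not the Grid Engine fix for issue #2249, and the helper name lookup_group_name, the 1024-byte fallback and the 16 MB cap are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Grid Engine): copy the name of the
 * group with id `gid` into `name`. Returns 0 on success, -1 if the group
 * is unknown, the name does not fit, or an error occurred.
 */
int lookup_group_name(gid_t gid, char *name, size_t name_len)
{
    const size_t max_len = 16u * 1024u * 1024u; /* arbitrary safety cap */
    size_t buf_len = 1024;                      /* assumed fallback size */
    char *buf = NULL;
    int result = -1;

    if (name == NULL || name_len == 0) {
        return -1;
    }

#ifdef _SC_GETGR_R_SIZE_MAX
    {
        /* Use the sysconf() value as a starting hint, not an upper bound. */
        long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
        if (hint > 0) {
            buf_len = (size_t)hint;
        }
    }
#endif

    while (buf_len <= max_len) {
        struct group grp;
        struct group *found = NULL;
        char *tmp = realloc(buf, buf_len);
        int rc;

        if (tmp == NULL) {
            break;                 /* out of memory */
        }
        buf = tmp;

        rc = getgrgid_r(gid, &grp, buf, buf_len, &found);
        if (rc == ERANGE) {
            buf_len *= 2;          /* entry did not fit: grow and retry */
            continue;
        }
        if (rc == 0 && found != NULL && strlen(found->gr_name) < name_len) {
            strcpy(name, found->gr_name);
            result = 0;
        }
        break;
    }

    free(buf);
    return result;
}
```

With a group entry longer than the initial buffer, the loop doubles the buffer until the entry fits, so the outcome no longer depends on what sysconf() happens to report.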


________________________________

	From: Dmitry Zhukovski [mailto:DZH at maerskoil.com] 
	Sent: 03 September 2007 08:41
	To: users at gridengine.sunsource.net
	Subject: RE: [GE users] RE: cannot submit job because of error bug?
	
	
	Hi Henk,
	 
	  I assume it won't work with LDAP users?
	 
	br,
	dmitry

________________________________

	From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk] 
	Sent: 01 September 2007 01:50
	To: users at gridengine.sunsource.net
	Subject: RE: [GE users] RE: cannot submit job because of error bug?
	
	
	Our system runs SUSE 10, kernel 2.6.18. I also ran into the 1024-character limit problem. 
	 
	However, we may have found a workaround: add the offending primary group to /etc/group with the user list split over two or more lines, each repeating the three initial fields, like
	 
	name:!:gid:additional_users
	name:!:gid:more_additional_users
	 
	According to A. Frisch, Essential System Administration, 3rd ed., p. 230, this is allowed, although the grpck command complains about duplicate entries. Because there may be side effects, we should still get a better solution than this. 
	 
	Henk
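
To find out which entries would need the split described above, the group file can be scanned for overly long lines. The following is only a rough sketch: the function name count_long_group_lines is hypothetical, the limit is passed in by the caller (1024 is the figure reported in this thread), and it reads a plain file in /etc/group format, so groups served by NIS or LDAP would first have to be dumped to a file, for example with getent group.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count the lines in a group-format file that are
 * longer than `limit` characters (newline excluded) and print the group
 * name and length of each one. Returns the count, or -1 if the file
 * cannot be opened.
 */
int count_long_group_lines(const char *path, size_t limit)
{
    FILE *fp = fopen(path, "r");
    char name[64];          /* first field of the current line (truncated) */
    size_t name_len = 0;
    size_t line_len = 0;
    int in_name = 1;        /* still reading the first field? */
    int count = 0;
    int c;

    if (fp == NULL) {
        return -1;
    }

    for (;;) {
        c = fgetc(fp);
        if (c == '\n' || c == EOF) {
            /* End of a line (or of the file): report it if too long. */
            if (line_len > limit) {
                name[name_len] = '\0';
                printf("%s: %zu characters\n", name, line_len);
                count++;
            }
            if (c == EOF) {
                break;
            }
            line_len = 0;
            name_len = 0;
            in_name = 1;
            continue;
        }

        line_len++;
        if (in_name) {
            if (c == ':') {
                in_name = 0;                 /* first field is complete */
            } else if (name_len < sizeof name - 1) {
                name[name_len++] = (char)c;
            }
        }
    }

    fclose(fp);
    return count;
}
```

Calling count_long_group_lines("/etc/group", 1024) from a small main() would list the candidate groups; a return value of 0 means no line in the file exceeds the limit.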
	 
________________________________

	From: Xinghong He [mailto:hexinghong at gmail.com]
	Sent: Fri 8/31/2007 9:21 PM
	To: users at gridengine.sunsource.net
	Subject: Re: [GE users] RE: cannot submit job because of error bug?
	
	

	It looks like a problem of the new version with Red Hat. All my RH systems
	have the same problem (RH on Intel and RH on AMD) and all SUSE work fine.
	Xinghong
	
	----- Original Message -----
	From: <Andreas.Haas at Sun.COM>
	To: <users at gridengine.sunsource.net>
	Sent: Friday, August 31, 2007 9:35 AM
	Subject: RE: [GE users] RE: cannot submit job because of error bug?
	
	
	> Hi Dmitry,
	>
	> On Fri, 31 Aug 2007, Dmitry Zhukovski wrote:
	>
	>> Hi all,
	>>
	>>  I moved a bit further. I found issue #2249 regarding overly long group
	>> entries. The fix replaces a hard-coded constant with the system
	>> call sysconf(_SC_GETGR_R_SIZE_MAX). That's fine, but in my case it returns
	>> 1024, which is less than my groups need in order to be resolved.
	>> Whenever I increase this value (and the allocated buffer) by, let's say,
	>> 10 times, qstat works!
	>
	> What a mess! It would have been better if sysconf(_SC_GETGR_R_SIZE_MAX)
	> had returned -1 instead of 1024; then our 20 KB buffer size would have
	> been in effect:
	>
	>    int get_group_buffer_size(void)
	>    {
	>       enum { buf_size = 20480 };  /* default is 20 KB */
	>
	>       int sz = buf_size;
	>
	>    #ifdef _SC_GETGR_R_SIZE_MAX
	>       if ((sz = (int)sysconf(_SC_GETGR_R_SIZE_MAX)) == -1) {
	>          sz = buf_size;
	>       }
	>    #endif
	>
	>       return sz;
	>    }
	>
	> According to POSIX, sysconf(_SC_GETGR_R_SIZE_MAX) is supposed to return the
	> "maximum size needed for this buffer"!
	>
	>>
	>>  Is it possible to increase that system value with sysctl or anything
	>> else? Or do I have to compile my own version of SGE?
	>
	> My expectation is that no manual tuning should be required for this, but
	> maybe I'm wrong.
	>
	> Actually what OS is this? Wasn't it Red Hat?
	>
	> Regards,
	> Andreas
	>
	>>
	>> Br,
	>> dmitry
	>>
	>> -----Original Message-----
	>> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com]
	>> Sent: 30. august 2007 13:22
	>> To: users at gridengine.sunsource.net
	>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>
	>> Hi all,
	>>
	>>  Henk gave me a good idea: find the limit on the number of users per
	>> group. An hour of adding and removing users from a test group gave me
	>> the number 120! qstat and qdel no longer complain about an unresolved
	>> group.
	>>
	>>  If there are any developers here: why is there such a limit, and is it
	>> possible to increase it? One of my primary groups contains more than 200
	>> users.
	>>
	>> Br,
	>> dmitry
	>>
	>> -----Original Message-----
	>> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
	>> Sent: 29. august 2007 17:14
	>> To: users at gridengine.sunsource.net
	>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>
	>>
	>> In my case there are actually two users (who both have a primary group)
	>> with this problem.
	>>
	>> Indeed the first user is in a secondary group that also contains users
	>> without primary group which our systems people should fix.
	>>
	>> The second user's primary group is also a secondary group but all users
	>> listed for that secondary group can be identified with the id command
	>> and all have a primary group.
	>>
	>> So I don't think users without a primary group listed in a secondary group
	>> are necessarily the problem.
	>> Also this problem was not in version 6.0u7 from which I upgraded to 6.1.
	>>
	>>
	>> Of course I don't want to ask the standard question "has anything been
	>> changed?" that every sysadmin is badgered with when something is
	>> suddenly not working anymore but if I shorten the list of users in the
	>> secondary group, the commands do work again.
	>>
	>> I compared the output from qstat v6.1 with that of v6.0u7 at debug level 10.
	>> There are 3 lines from qstat v6.1 that signal an error:
	>>    63  17496 47463950073600 --> sge_log() {
	>>    64  17496 47463950073600     sge_log: ctx is NULL
	>>    65  17496 47463950073600     ../libs/sgeobj/sge_answer.c 937 can't resolve group
	>>
	>> whereas v6.0u7 finishes with
	>>
	>> error: can't unpack gdi request
	>> error: error unpacking gdi request: bad argument
	>> failed receiving gdi request
	>>
	>> and with debug level 10 it prints the uid and gid of the user.
	>>
	>> Best wishes
	>>
	>> Henk
	>>
	>>> -----Original Message-----
	>>> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com]
	>>> Sent: 29 August 2007 14:46
	>>> To: users at gridengine.sunsource.net
	>>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>>
	>>> Hi all,
	>>>
	>>>   I have exactly the same output for one of my users: qstat,
	>>> qdel, qsub and others give 'can't resolve group'.
	>>>
	>>>   A little googling led me to this issue:
	>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1256
	>>> I checked and found that the user's primary group ID was not listed in
	>>> the LDAP set of groups.
	>>> So a search on that user gave me the list of all secondary groups he
	>>> belongs to, but not the primary one.
	>>>
	>>>   I added the primary group but still get the 'can't resolve group' message.
	>>> Question: can it be cached somewhere?
	>>>
	>>> Br,
	>>> dmitry
	>>>
	>>> -----Original Message-----
	>>> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
	>>> Sent: 29. august 2007 11:40
	>>> To: users at gridengine.sunsource.net
	>>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>>
	>>> Dear Daniel
	>>>
	>>> I have set dl 4 and attach the output. I had a look at the
	>>> source; is it possible to build qstat by itself for debugging purposes?
	>>>
	>>> Thanks
	>>>
	>>> Henk
	>>>
	>>>> -----Original Message-----
	>>>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
	>>>> Sent: 28 August 2007 19:07
	>>>> To: users at gridengine.sunsource.net
	>>>> Subject: Re: [GE users] RE: cannot submit job because of error bug?
	>>>>
	>>>> The debug levels aren't monotonic.  10 is actually less information
	>>>> than some lower levels.  4 might give you more info.  See:
	>>>>
	>>>> http://blogs.sun.com/templedf/entry/using_debugging_output
	>>>>
	>>>> Daniel
	>>>>
	>>>> SLIM H.A. wrote:
	>>>>> Further information on the failure of the sge commands for some unix
	>>>>> groups of users.
	>>>>>
	>>>>> Setting the debug level to 10 and running the qstat command gives for
	>>>>> the last few lines of stdout:
	>>>>>
	>>>>>     63  15359 47241863851776 --> sge_log() {
	>>>>>     64  15359 47241863851776     sge_log: ctx is NULL
	>>>>>     65  15359 47241863851776     ../libs/sgeobj/sge_answer.c 937 can't resolve group
	>>>>>
	>>>>> I attached the full debug output.
	>>>>>
	>>>>> Thanks
	>>>>>
	>>>>> Henk
	>>>>>
	>>>>>
	>>>>>> -----Original Message-----
	>>>>>> From: SLIM H.A.
	>>>>>> Sent: 28 August 2007 16:46
	>>>>>> To: SLIM H.A.
	>>>>>> Subject: cannot submit job because of error bug?
	>>>>>>
	>>>>>>
	>>>>>>
	>>>>>> Some users are unable to submit jobs under sge 6.1. The error message
	>>>>>> is this:
	>>>>>>
	>>>>>> % qsub
	>>>>>> Unable to initialize environment because of error: can't resolve group
	>>>>>> Exiting.
	>>>>>>
	>>>>>>
	>>>>>> It appears that a limit is hit by the grid engine commands when
	>>>>>> reading one of the secondary group entries in the /etc/group file. It
	>>>>>> seems the commands cannot process lines that have more than some
	>>>>>> small number of characters, probably 512.
	>>>>>> Any userid that has that particular offending secondary group as its
	>>>>>> primary group cannot submit jobs.
	>>>>>>
	>>>>>> When the number of userids for the offending secondary group is
	>>>>>> reduced, the userid is able to submit again.
	>>>>>>
	>>>>>> Is this a bug as 6.0u7 did not have this problem?
	>>>>>>
	>>>>>> Thanks for any advice
	>>>>>>
	>>>>>>
	>>>>>> Henk
	>>>>>>
	>>>>>>
	>>>>>>> -----Original Message-----
	>>>>>>> From: SLIM H.A.
	>>>>>>> Sent: 28 August 2007 11:34
	>>>>>>> To: 'users at gridengine.sunsource.net'
	>>>>>>> Subject: RE: [GE users] 6.1: critical error: can't resolve group
	>>>>>>>
	>>>>>>> Chris,
	>>>>>>>
	>>>>>>> I tried this, it seems to be ok:
	>>>>>>>
	>>>>>>> # grpck
	>>>>>>> Checking `/etc/group'
	>>>>>>>
	>>>>>>> is the only response I get
	>>>>>>>
	>>>>>>> Thanks
	>>>>>>>
	>>>>>>> Henk
	>>>>>>>
	>>>>>>>
	>>>>>>>
	>>>>>>>
	>>>>>>>> -----Original Message-----
	>>>>>>>> From: chris.harwell at novartis.com [mailto:chris.harwell at novartis.com]
	>>>>>>>> Sent: 28 August 2007 11:03
	>>>>>>>> To: users
	>>>>>>>> Subject: Re: [GE users] 6.1: critical error: can't resolve group
	>>>>>>>>
	>>>>>>>> Try running grpck as root.
	>>>>>>>>
	>>>>>>>>
	>>>>>>>>
	>>>>>>>> ----- Original Message -----
	>>>>>>>> From: "SLIM H.A." [h.a.slim at durham.ac.uk]
	>>>>>>>> Sent: 08/28/2007 04:56 AM
	>>>>>>>> To: <users at gridengine.sunsource.net>
	>>>>>>>> Subject: [GE users] 6.1: critical error: can't resolve group
	>>>>>>>>
	>>>>>>>>
	>>>>>>>> I just upgraded from 6.0u7 to 6.1 and have come across a problem. The
	>>>>>>>> Grid Engine commands now give for some users an error, for example
	>>>>>>>>
	>>>>>>>> %qstat
	>>>>>>>> critical error: can't resolve group
	>>>>>>>>
	>>>>>>>> Has anyone seen this before or have an idea why this now shows up?
	>>>>>>>>
	>>>>>>>> Thanks
	>>>>>>>>
	>>>>>>>> Henk
	>>>>>>>>
	>>>>>>>>
	>>>>>>>>
	>>>>>>>> ---------------------------------------------------------------------
	>>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
	>>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
	>>>
	>>> **********************************************************************
	>>> This e-mail and any files transmitted with it are
	>>> confidential and intended solely for the use of the
	>>> individual or entity to which they are addressed. If you have
	>>> received this e-mail in error please notify the system
	>>> manager at helpdesk at maerskoil.com.
	>>>
	>>> This e-mail and its contents do not constitute and shall not
	>>> be considered as a financial commitment of Maersk Olie og Gas
	>>> AS and its affiliates.
	>>> Maersk Olie og Gas AS expressly disclaims any responsibility
	>>> as to the accuracy and use of this e-mail and its contents.
	>>> **********************************************************************
	>
	> http://gridengine.info/
	>
	> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551
	> Kirchheim-Heimstetten
	> Amtsgericht Muenchen: HRB 161028
	> Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
	> Vorsitzender des Aufsichtsrates: Martin Haering
	>
________________________________

From: A listner [mailto:gg3796 at yahoo.com]
Sent: Fri 9/7/2007 10:37 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] can't resolve group



What is most strange is that I have another installation, 6.0u4 upgraded to u9, that is working fine on the same network.

-S



----- Original Message ----
From: A listner <gg3796 at yahoo.com>
To: users at gridengine.sunsource.net
Sent: Friday, September 7, 2007 2:35:32 PM
Subject: Re: [GE users] can't resolve group


Yes, I did. I can see all my group subscriptions. I have 12 subscriptions, all using NIS.

Thanks,
-S
 


----- Original Message ----
From: Rayson Ho <rayrayson at gmail.com>
To: users at gridengine.sunsource.net
Sent: Friday, September 7, 2007 11:45:01 AM
Subject: Re: [GE users] can't resolve group


Have you tried the "id" command??

Rayson



On 9/7/07, A listner <gg3796 at yahoo.com> wrote:
>
>
> I just installed 6_1u2 and I am getting the following error
>
> "critical error: can't resolve group"
>
> Is this a bug? Do we have a fix  for it?
>  I have 6.0u4 running without this problem.
>
>
> thanks