[GE users] RE: cannot submit job because of error bug?

SLIM H.A. h.a.slim at durham.ac.uk
Mon Sep 3 11:57:41 BST 2007


Dmitry
 
The low-level lookup functions such as getgrgid() get their information
from the sources listed in /etc/nsswitch.conf.
If that file has an entry like
group:      files nis
the local file (/etc/group) is consulted first and NIS after that.
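
For illustration, a minimal C sketch of such a lookup. getgrgid() resolves a numeric gid through whatever sources the group: line of /etc/nsswitch.conf names, so the caller does not see whether the answer came from /etc/group, NIS or LDAP. The helper name group_name_for_gid is made up for this example and is not part of Grid Engine.

```c
#include <assert.h>
#include <grp.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the name of group `gid` into `out`.
 * getgrgid() consults the sources listed on the "group:" line of
 * /etc/nsswitch.conf in order (for example files first, then nis).
 * Returns 0 on success, -1 if the gid cannot be resolved or the
 * name does not fit into `out`.
 */
int group_name_for_gid(gid_t gid, char *out, size_t outlen)
{
    struct group *gr;

    assert(out != NULL);
    gr = getgrgid(gid);   /* not thread-safe; good enough for a demo */
    if (gr == NULL || strlen(gr->gr_name) >= outlen) {
        return -1;
    }
    strcpy(out, gr->gr_name);
    return 0;
}
```

Calling group_name_for_gid(getgid(), buf, sizeof buf) for an affected user shows whether the primary group resolves at all outside of Grid Engine.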
 
Best wishes
 
Henk


________________________________

	From: Dmitry Zhukovski [mailto:DZH at maerskoil.com] 
	Sent: 03 September 2007 08:41
	To: users at gridengine.sunsource.net
	Subject: RE: [GE users] RE: cannot submit job because of error bug?
	
	
	Hi Henk,
	 
	  I assume it won't work with LDAP users?
	 
	br,
	dmitry

________________________________

	From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk] 
	Sent: 01 September 2007 01:50
	To: users at gridengine.sunsource.net
	Subject: RE: [GE users] RE: cannot submit job because of error bug?
	
	
	Our system runs SUSE 10, kernel 2.6.18. I also ran into the
	1024-character limit problem.
	 
	However, we may have found a workaround: add the offending
	primary group to /etc/group with the user list split over two or
	more lines, repeating the three initial fields on each line, like
	 
	name:!:gid:additional_users
	name:!:gid:more_additional_users
	 
	According to A. Frisch, Essential System Administration, 3rd ed.,
	p. 230, this is allowed, although the grpck command complains about
	duplicate entries. For that reason we should look for a better
	solution than this, as there may be side effects.
	 
	Henk
	 
________________________________

	From: Xinghong He [mailto:hexinghong at gmail.com]
	Sent: Fri 8/31/2007 9:21 PM
	To: users at gridengine.sunsource.net
	Subject: Re: [GE users] RE: cannot submit job because of error bug?
	
	

	It looks like a problem of the new version with Red Hat. All my
	RH systems have the same problem (RH on Intel and RH on AMD) and
	all SUSE systems work fine.
	Xinghong
	
	----- Original Message -----
	From: <Andreas.Haas at Sun.COM>
	To: <users at gridengine.sunsource.net>
	Sent: Friday, August 31, 2007 9:35 AM
	Subject: RE: [GE users] RE: cannot submit job because of error bug?
	
	
	> Hi Dmitry,
	>
	> On Fri, 31 Aug 2007, Dmitry Zhukovski wrote:
	>
	>> Hi all,
	>>
	>>  I got a bit further. I found issue #2249 regarding overly long
	>> group entries. The fix replaces a hard-coded constant with the
	>> system call sysconf(_SC_GETGR_R_SIZE_MAX). That's fine, but in my
	>> case it returns 1024, which is less than my groups need in order
	>> to be resolved. When I increase this value (and the allocated
	>> buffer) by, let's say, 10 times, qstat works!
	>
	> What a mess! sysconf(_SC_GETGR_R_SIZE_MAX) should have returned -1
	> instead of 1024; then our 20 KB buffer size would have been in effect:
	>
	>    int get_group_buffer_size(void)
	>    {
	>       enum { buf_size = 20480 };  /* default is 20 KB */
	>
	>       int sz = buf_size;
	>
	>    #ifdef _SC_GETGR_R_SIZE_MAX
	>       if ((sz = (int)sysconf(_SC_GETGR_R_SIZE_MAX)) == -1) {
	>          sz = buf_size;
	>       }
	>    #endif
	>
	>       return sz;
	>    }
	>
	> According to POSIX, sysconf(_SC_GETGR_R_SIZE_MAX) is supposed to
	> return the "maximum size needed for this buffer"!
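
One common way to cope with a value that turns out too small is to treat it only as a starting size and grow the buffer whenever getgrgid_r() reports ERANGE. The sketch below shows that pattern; it is not the Grid Engine fix, and the name lookup_group, the 1024-byte fallback and the 16 MB cap are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve `gid` with getgrgid_r(), using
 * sysconf(_SC_GETGR_R_SIZE_MAX) only as an initial size and doubling
 * the buffer whenever ERANGE is reported.
 *
 * On success returns 0, fills *grp and stores the malloc'ed buffer that
 * backs its strings in *bufp (the caller frees it).
 * Returns ENOENT if no such group exists, or another errno value.
 */
int lookup_group(gid_t gid, struct group *grp, char **bufp)
{
    const size_t max_size = 16u * 1024u * 1024u;  /* arbitrary safety cap */
    long hint = -1;
    size_t size;
    char *buf = NULL;

    assert(grp != NULL && bufp != NULL);
    *bufp = NULL;

#ifdef _SC_GETGR_R_SIZE_MAX
    hint = sysconf(_SC_GETGR_R_SIZE_MAX);
#endif
    size = (hint > 0) ? (size_t)hint : 1024;

    for (;;) {
        struct group *result = NULL;
        char *tmp = realloc(buf, size);
        int rc;

        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getgrgid_r(gid, grp, buf, size, &result);
        if (rc == 0 && result != NULL) {
            *bufp = buf;        /* grp points into buf; caller frees it */
            return 0;
        }
        if (rc == ERANGE && size < max_size) {
            size *= 2;          /* entry did not fit: retry with more room */
            continue;
        }
        free(buf);
        return (rc != 0) ? rc : ENOENT;
    }
}
```

With this approach a group with several hundred members is resolved after a few retries even if the system reports 1024 as the initial size.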
	>
	>>
	>>  Is it possible to increase that system value with sysctl or
	>> anything else? Or do I have to compile my own version of SGE?
	>
	> My expectation is that no manual tuning should be required for
	> this, but maybe I'm wrong.
	>
	> Actually what OS is this? Wasn't it Red Hat?
	>
	> Regards,
	> Andreas
	>
	>>
	>> Br,
	>> dmitry
	>>
	>> -----Original Message-----
	>> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com]
	>> Sent: 30. august 2007 13:22
	>> To: users at gridengine.sunsource.net
	>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>
	>> Hi all,
	>>
	>>  Henk gave me a good idea: find the limit on the number of users
	>> per group. An hour of adding users to and removing them from a
	>> test group gave me the number 120! qstat and qdel don't complain
	>> anymore about an unresolved group.
	>>
	>>  If there are any developers here: why is there such a limit, and
	>> is it possible to increase it? One of my primary groups contains
	>> more than 200 users.
	>>
	>> Br,
	>> dmitry
	>>
	>> -----Original Message-----
	>> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
	>> Sent: 29. august 2007 17:14
	>> To: users at gridengine.sunsource.net
	>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>
	>>
	>> In my case there are actually two users (who both have a primary
	>> group) with this problem.
	>>
	>> Indeed the first user is in a secondary group that also contains
	>> users without a primary group, which our systems people should fix.
	>>
	>> The second user's primary group is also a secondary group, but all
	>> users listed for that secondary group can be identified with the id
	>> command and all have a primary group.
	>>
	>> So I don't think users without a primary group listed in a
	>> secondary group are necessarily the problem.
	>> Also, this problem did not occur in version 6.0u7, from which I
	>> upgraded to 6.1.
	>>
	>>
	>> Of course I don't want to ask the standard question "has anything
	>> been changed?" that every sysadmin is badgered with when something
	>> is suddenly not working anymore, but if I shorten the list of users
	>> in the secondary group, the commands do work again.
	>>
	>> I compared the output from qstat v6.1 with that of v6.0u7 at debug
	>> level 10. There are 3 lines from qstat v6.1 that signal an error:
	>>    63  17496 47463950073600 --> sge_log() {
	>>    64  17496 47463950073600     sge_log: ctx is NULL
	>>    65  17496 47463950073600     ../libs/sgeobj/sge_answer.c 937 can't resolve group
	>>
	>> whereas v6.0u7 finishes with
	>>
	>> error: can't unpack gdi request
	>> error: error unpacking gdi request: bad argument
	>> failed receiving gdi request
	>>
	>> and with debug level 10 it prints the uid and gid of the user.
	>>
	>> Best wishes
	>>
	>> Henk
	>>
	>>> -----Original Message-----
	>>> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com]
	>>> Sent: 29 August 2007 14:46
	>>> To: users at gridengine.sunsource.net
	>>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>>
	>>> Hi all,
	>>>
	>>>   I have exactly the same output for one of my users: qstat,
	>>> qdel, qsub and others give 'can't resolve group'.
	>>>
	>>>   A little bit of googling led me to this issue:
	>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1256
	>>> I checked and found that the ID of the user's primary group was
	>>> not listed in the LDAP set of groups.
	>>> So a search on that user gave me the list of all secondary groups
	>>> he belongs to, but not the primary one.
	>>>
	>>>   I added the primary group but still get the 'can't resolve
	>>> group' message.
	>>> Question: can it be cached somewhere?
	>>>
	>>> Br,
	>>> dmitry
	>>>
	>>> -----Original Message-----
	>>> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
	>>> Sent: 29. august 2007 11:40
	>>> To: users at gridengine.sunsource.net
	>>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
	>>>
	>>> Dear Daniel
	>>>
	>>> I have set dl 4 and attach the output. I had a look at the
	>>> source; is it possible to build qstat by itself for debugging
	>>> purposes?
	>>>
	>>> Thanks
	>>>
	>>> Henk
	>>>
	>>>> -----Original Message-----
	>>>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
	>>>> Sent: 28 August 2007 19:07
	>>>> To: users at gridengine.sunsource.net
	>>>> Subject: Re: [GE users] RE: cannot submit job because of error bug?
	>>>>
	>>>> The debug levels aren't monotonic.  10 is actually less
	>>>> information than some lower levels.  4 might give you more info.
	>>>> See:
	>>>>
	>>>> http://blogs.sun.com/templedf/entry/using_debugging_output
	>>>>
	>>>> Daniel
	>>>>
	>>>> SLIM H.A. wrote:
	>>>>> Further information on the failure of the sge commands for some
	>>>>> unix groups of users.
	>>>>>
	>>>>> Setting the debug level to 10 and running the qstat command gives
	>>>>> for the last few lines of stdout:
	>>>>>
	>>>>>     63  15359 47241863851776 --> sge_log() {
	>>>>>     64  15359 47241863851776     sge_log: ctx is NULL
	>>>>>     65  15359 47241863851776     ../libs/sgeobj/sge_answer.c 937 can't resolve group
	>>>>>
	>>>>> I attached the full debug output.
	>>>>>
	>>>>> Thanks
	>>>>>
	>>>>> Henk
	>>>>>
	>>>>>
	>>>>>> -----Original Message-----
	>>>>>> From: SLIM H.A.
	>>>>>> Sent: 28 August 2007 16:46
	>>>>>> To: SLIM H.A.
	>>>>>> Subject: cannot submit job because of error bug?
	>>>>>>
	>>>>>>
	>>>>>>
	>>>>>> Some users are unable to submit jobs under sge 6.1. The error
	>>>>>> message is this:
	>>>>>>
	>>>>>> % qsub
	>>>>>> Unable to initialize environment because of error: can't resolve group
	>>>>>> Exiting.
	>>>>>>
	>>>>>>
	>>>>>> It appears that a limit is hit by the grid engine commands when
	>>>>>> reading one of the secondary group entries in the /etc/group
	>>>>>> file. It seems the commands cannot process lines that have more
	>>>>>> than some small number of characters, probably 512.
	>>>>>> Any userid that has that particular offending secondary group as
	>>>>>> its primary group cannot submit jobs.
	>>>>>>
	>>>>>> When the number of userids for the offending secondary group is
	>>>>>> reduced, the userid is able to submit again.
	>>>>>>
	>>>>>> Is this a bug as 6.0u7 did not have this problem?
	>>>>>>
	>>>>>> Thanks for any advice
	>>>>>>
	>>>>>>
	>>>>>> Henk
	>>>>>>
	>>>>>>
	>>>>>>> -----Original Message-----
	>>>>>>> From: SLIM H.A.
	>>>>>>> Sent: 28 August 2007 11:34
	>>>>>>> To: 'users at gridengine.sunsource.net'
	>>>>>>> Subject: RE: [GE users] 6.1: critical error: can't resolve group
	>>>>>>>
	>>>>>>> Chris,
	>>>>>>>
	>>>>>>> I tried this, it seems to be ok:
	>>>>>>>
	>>>>>>> # grpck
	>>>>>>> Checking `/etc/group'
	>>>>>>>
	>>>>>>> is the only response I get
	>>>>>>>
	>>>>>>> Thanks
	>>>>>>>
	>>>>>>> Henk
	>>>>>>>
	>>>>>>>
	>>>>>>>
	>>>>>>>
	>>>>>>>> -----Original Message-----
	>>>>>>>> From: chris.harwell at novartis.com [mailto:chris.harwell at novartis.com]
	>>>>>>>> Sent: 28 August 2007 11:03
	>>>>>>>> To: users
	>>>>>>>> Subject: Re: [GE users] 6.1: critical error: can't resolve group
	>>>>>>>>
	>>>>>>>> Try running grpck as root.
	>>>>>>>>
	>>>>>>>>
	>>>>>>>>
	>>>>>>>> ----- Original Message -----
	>>>>>>>> From: "SLIM H.A." [h.a.slim at durham.ac.uk]
	>>>>>>>> Sent: 08/28/2007 04:56 AM
	>>>>>>>> To: <users at gridengine.sunsource.net>
	>>>>>>>> Subject: [GE users] 6.1: critical error: can't resolve group
	>>>>>>>>
	>>>>>>>>
	>>>>>>>> I just upgraded from 6.0u7 to 6.1 and have come across a
	>>>>>>>> problem. The Grid Engine commands now give an error for some
	>>>>>>>> users, for example
	>>>>>>>>
	>>>>>>>> %qstat
	>>>>>>>> critical error: can't resolve group
	>>>>>>>>
	>>>>>>>> Has anyone seen this before or have an idea why this now
	>>>>>>>> shows up?
	>>>>>>>>
	>>>>>>>> Thanks
	>>>>>>>>
	>>>>>>>> Henk
	>>>>>>>>
	>>>>>>>>
	>>>>>>>>
	>>>>>>>> ---------------------------------------------------------------------
	>>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
	>>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
**********************************************************************
	>>> This e-mail and any files transmitted with it are
	>>> confidential and intended solely for the use of the
	>>> individual or entity to which they are addressed. If you
have
	>>> received this e-mail in error please notify the system
	>>> manager at helpdesk at maerskoil.com.
	>>>
	>>> This e-mail and its contents do not constitute and shall not
	>>> be considered as a financial commitment of Maersk Olie og
Gas
	>>> AS and its affiliates.
	>>> Maersk Olie og Gas AS expressly disclaims any responsibility
	>>> as to the accuracy and use of this e-mail and its contents.
	>>>
**********************************************************************
	>
	> http://gridengine.info/
	>
	> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
D-85551
	> Kirchheim-Heimstetten
	> Amtsgericht Muenchen: HRB 161028
	> Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr.
Roland Boemer
	> Vorsitzender des Aufsichtsrates: Martin Haering
	>

