[GE users] RE: cannot submit job because of error bug?

Dmitry Zhukovski DZH at maerskoil.com
Mon Sep 3 08:41:16 BST 2007


Hi Henk,
 
  I assume it won't work with LDAP users?
 
br,
dmitry

________________________________

From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk] 
Sent: 01 September 2007 01:50
To: users at gridengine.sunsource.net
Subject: RE: [GE users] RE: cannot submit job because of error bug?


Our system runs SUSE 10 with kernel 2.6.18. I also ran into the
1024-character limit problem.
 
However, we may have found a workaround: add the offending primary
group to /etc/group with the user list split over two or more lines,
repeating the three initial fields on each line, like
 
name:!:gid:additional_users
name:!:gid:more_additional_users
 
According to A. Frisch, Essential System Administration, 3rd ed., p. 230,
this is allowed, although the grpck command complains about duplicate
entries. For that reason we should look for a better solution than this,
as there may be side effects.
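
As an illustration of the split-line workaround only, here is a minimal
sketch in C that writes one group's member list as several lines, each
repeating the name, password and gid fields. The function name
write_split_group and the max_len parameter are hypothetical, and the
right length limit for a given system is an assumption you would have to
establish yourself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the members of one group as several
 * /etc/group-style lines, each at most max_len characters long (not
 * counting the newline), repeating the "name:!:gid:" prefix on every line.
 * Returns the number of lines written, or -1 if the prefix plus a single
 * member cannot fit within max_len (nothing is written in that case). */
int write_split_group(FILE *out, const char *name, unsigned gid,
                      const char *const *members, size_t count,
                      size_t max_len)
{
    char prefix[256];
    int plen;
    size_t i, used = 0;   /* characters on the current line; 0 = no open line */
    int lines = 0;

    assert(out != NULL && name != NULL);

    plen = snprintf(prefix, sizeof prefix, "%s:!:%u:", name, gid);
    if (plen < 0 || (size_t)plen >= sizeof prefix)
        return -1;

    /* Validate first so that a failure never leaves partial output. */
    for (i = 0; i < count; i++)
        if ((size_t)plen + strlen(members[i]) > max_len)
            return -1;

    for (i = 0; i < count; i++) {
        size_t mlen = strlen(members[i]);

        /* Close the current line if "," plus this member would not fit. */
        if (used != 0 && used + 1 + mlen > max_len) {
            fputc('\n', out);
            used = 0;
        }
        if (used == 0) {            /* start a new line with the prefix */
            fputs(prefix, out);
            used = (size_t)plen;
            lines++;
        } else {                    /* continue the current line */
            fputc(',', out);
            used++;
        }
        fputs(members[i], out);
        used += mlen;
    }
    if (used != 0)
        fputc('\n', out);
    return lines;
}
```

Called with the members alice, bob, carol and dave, the name staff, gid
100 and max_len 22, it writes the two lines "staff:!:100:alice,bob" and
"staff:!:100:carol,dave". grpck will still complain about the duplicate
entries.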
 
Henk
 
________________________________

From: Xinghong He [mailto:hexinghong at gmail.com]
Sent: Fri 8/31/2007 9:21 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] RE: cannot submit job because of error bug?



It looks like a problem of the new version on Red Hat. All my RH systems
(RH on Intel and RH on AMD) have the same problem, and all SUSE systems
work fine.
Xinghong

----- Original Message -----
From: <Andreas.Haas at Sun.COM>
To: <users at gridengine.sunsource.net>
Sent: Friday, August 31, 2007 9:35 AM
Subject: RE: [GE users] RE: cannot submit job because of error bug?


> Hi Dmitry,
>
> On Fri, 31 Aug 2007, Dmitry Zhukovski wrote:
>
>> Hi all,
>>
>>  I got a bit further. I found issue #2249 regarding overly long group
>> entries. The fix replaces a hard-coded constant with the system call
>> sysconf(_SC_GETGR_R_SIZE_MAX). That is fine, but in my case it returns
>> 1024, which is less than my groups need in order to be resolved.
>> Whenever I increase this value (and the allocated buffer) by, let's
>> say, 10 times, qstat works!
>
> What a mess! It would have been better if sysconf(_SC_GETGR_R_SIZE_MAX)
> had returned -1 instead of 1024; then our 20 KB buffer size would have
> been in effect:
>
>    int get_group_buffer_size(void)
>    {
>       enum { buf_size = 20480 };  /* default is 20 KB */
>
>       int sz = buf_size;
>
>    #ifdef _SC_GETGR_R_SIZE_MAX
>       if ((sz = (int)sysconf(_SC_GETGR_R_SIZE_MAX)) == -1) {
>          sz = buf_size;
>       }
>    #endif
>
>       return sz;
>    }
>
> According to POSIX, sysconf(_SC_GETGR_R_SIZE_MAX) is supposed to return
> the "maximum size needed for this buffer"!
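
For comparison, a caller can avoid depending on the sysconf() value being
large enough by treating it as a starting size only and retrying when
getgrgid_r() reports ERANGE. The following is a minimal sketch of that
idea, not the Grid Engine code; the helper name lookup_group, the 1024-byte
fallback and the 16 MB cap are assumptions made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: look up a group by gid, doubling the scratch
 * buffer until the entry fits. Returns 0 and fills *out on success (the
 * caller must then free(*bufp)), ENOENT if no such group exists, or
 * another errno value on failure. On failure *bufp is NULL. */
int lookup_group(gid_t gid, struct group *out, char **bufp)
{
    long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
    size_t size = (hint > 0) ? (size_t)hint : 1024; /* a hint, not a maximum */
    const size_t limit = 16u * 1024u * 1024u;       /* arbitrary safety cap */
    char *buf = NULL;

    assert(out != NULL && bufp != NULL);
    *bufp = NULL;

    for (;;) {
        struct group *result = NULL;
        int rc;
        char *tmp = realloc(buf, size);

        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getgrgid_r(gid, out, buf, size, &result);
        if (rc == 0 && result != NULL) {   /* found: hand buffer to caller */
            *bufp = buf;
            return 0;
        }
        if (rc == 0) {                     /* lookup worked, no such entry */
            free(buf);
            return ENOENT;
        }
        if (rc != ERANGE || size >= limit) {
            free(buf);
            return rc;                     /* real error, or cap reached */
        }
        size *= 2;                         /* buffer too small: try again */
    }
}
```

With a loop like this, a group entry longer than 1024 bytes costs a few
extra lookups instead of producing "can't resolve group".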
>
>>
>>  Is it possible to increase that system value with sysctl or anything
>> else? Or do I have to compile my own version of SGE?
>
> I would expect that no manual tuning is required for this, but maybe
> I'm wrong.
>
> Actually what OS is this? Wasn't it Red Hat?
>
> Regards,
> Andreas
>
>>
>> Br,
>> dmitry
>>
>> -----Original Message-----
>> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com]
>> Sent: 30. august 2007 13:22
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
>>
>> Hi all,
>>
>>  Henk gave me a good idea: find the limit on the number of users per
>> group. An hour of adding and removing users from a test group gave me
>> the number 120! qstat and qdel no longer complain about an unresolved
>> group.
>>
>>  If there are any developers here: why is there such a limit, and is it
>> possible to increase it? One of my primary groups contains more than
>> 200 users.
>>
>> Br,
>> dmitry
>>
>> -----Original Message-----
>> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
>> Sent: 29. august 2007 17:14
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
>>
>>
>> In my case there are actually two users (who both have a primary
>> group) with this problem.
>>
>> Indeed, the first user is in a secondary group that also contains
>> users without a primary group, which our systems people should fix.
>>
>> The second user's primary group is also a secondary group, but all
>> users listed for that secondary group can be identified with the id
>> command and all have a primary group.
>>
>> So I don't think users without a primary group listed in a secondary
>> group are necessarily the problem.
>> Also, this problem did not occur in version 6.0u7, from which I
>> upgraded to 6.1.
>>
>>
>> Of course I don't want to ask the standard question "has anything
>> been changed?" that every sysadmin is badgered with when something
>> suddenly stops working, but if I shorten the list of users in the
>> secondary group, the commands do work again.
>>
>> I compared the output from qstat v6.1 with that of v6.0u7 at debug
>> level 10. There are 3 lines from qstat v6.1 that signal an error:
>>    63  17496 47463950073600 --> sge_log() {
>>    64  17496 47463950073600     sge_log: ctx is NULL
>>    65  17496 47463950073600     ../libs/sgeobj/sge_answer.c 937 can't resolve group
>>
>> whereas v6.0u7 finishes with
>>
>> error: can't unpack gdi request
>> error: error unpacking gdi request: bad argument
>> failed receiving gdi request
>>
>> and with debug level 10 it prints the uid and gid of the user.
>>
>> Best wishes
>>
>> Henk
>>
>>> -----Original Message-----
>>> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com]
>>> Sent: 29 August 2007 14:46
>>> To: users at gridengine.sunsource.net
>>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
>>>
>>> Hi all,
>>>
>>>   I get exactly the same output for one of my users: qstat,
>>> qdel, qsub and the other commands give 'can't resolve group'.
>>>
>>>   A little bit of googling turned up this issue:
>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1256
>>> I checked and found that the ID of the user's primary group was not
>>> listed in the LDAP set of groups. A search on that user therefore
>>> returned all the secondary groups he belongs to, but not the primary
>>> one.
>>>
>>>   I added the primary group but still get the 'can't resolve group'
>>> message. Question: can it be cached somewhere?
>>>
>>> Br,
>>> dmitry
>>>
>>> -----Original Message-----
>>> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
>>> Sent: 29. august 2007 11:40
>>> To: users at gridengine.sunsource.net
>>> Subject: RE: [GE users] RE: cannot submit job because of error bug?
>>>
>>> Dear Daniel
>>>
>>> I have set dl 4 and attach the output. I had a look at the
>>> source; is it possible to build qstat by itself for debugging purposes?
>>>
>>> Thanks
>>>
>>> Henk
>>>
>>>> -----Original Message-----
>>>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
>>>> Sent: 28 August 2007 19:07
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] RE: cannot submit job because of error bug?
>>>>
>>>> The debug levels aren't monotonic.  10 is actually less information
>>>> than some lower levels.  4 might give you more info.  See:
>>>>
>>>> http://blogs.sun.com/templedf/entry/using_debugging_output
>>>>
>>>> Daniel
>>>>
>>>> SLIM H.A. wrote:
>>>>> Further information on the failure of the SGE commands for some
>>>>> Unix groups of users.
>>>>>
>>>>> Setting the debug level to 10 and running the qstat command gives
>>>>> the following as the last few lines of stdout:
>>>>>
>>>>>     63  15359 47241863851776 --> sge_log() {
>>>>>     64  15359 47241863851776     sge_log: ctx is NULL
>>>>>     65  15359 47241863851776     ../libs/sgeobj/sge_answer.c 937 can't resolve group
>>>>>
>>>>> I attached the full debug output.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Henk
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: SLIM H.A.
>>>>>> Sent: 28 August 2007 16:46
>>>>>> To: SLIM H.A.
>>>>>> Subject: cannot submit job because of error bug?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Some users are unable to submit jobs under SGE 6.1. The error
>>>>>> message is this:
>>>>>>
>>>>>> % qsub
>>>>>> Unable to initialize environment because of error: can't resolve group
>>>>>> Exiting.
>>>>>>
>>>>>> It appears that the Grid Engine commands hit a limit when reading
>>>>>> one of the secondary group entries in the /etc/group file. It seems
>>>>>> the commands cannot process lines that have more than some small
>>>>>> number of characters, probably 512.
>>>>>> Any userid that has that particular offending secondary group as its
>>>>>> primary group cannot submit jobs.
>>>>>>
>>>>>> When the number of userids for the offending secondary group is
>>>>>> reduced, the userid is able to submit again.
>>>>>>
>>>>>> Is this a bug, given that 6.0u7 did not have this problem?
>>>>>>
>>>>>> Thanks for any advice
>>>>>>
>>>>>>
>>>>>> Henk
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: SLIM H.A.
>>>>>>> Sent: 28 August 2007 11:34
>>>>>>> To: 'users at gridengine.sunsource.net'
>>>>>>> Subject: RE: [GE users] 6.1: critical error: can't resolve group
>>>>>>>
>>>>>>> Chris,
>>>>>>>
>>>>>>> I tried this, it seems to be ok:
>>>>>>>
>>>>>>> # grpck
>>>>>>> Checking `/etc/group'
>>>>>>>
>>>>>>> is the only response I get
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Henk
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: chris.harwell at novartis.com [mailto:chris.harwell at novartis.com]
>>>>>>>> Sent: 28 August 2007 11:03
>>>>>>>> To: users
>>>>>>>> Subject: Re: [GE users] 6.1: critical error: can't resolve group
>>>>>>>>
>>>>>>>> Try running grpck as root.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "SLIM H.A." [h.a.slim at durham.ac.uk]
>>>>>>>> Sent: 08/28/2007 04:56 AM
>>>>>>>> To: <users at gridengine.sunsource.net>
>>>>>>>> Subject: [GE users] 6.1: critical error: can't resolve group
>>>>>>>>
>>>>>>>>
>>>>>>>> I just upgraded from 6.0u7 to 6.1 and have come across a problem.
>>>>>>>> The Grid Engine commands now give an error for some users, for
>>>>>>>> example
>>>>>>>>
>>>>>>>> % qstat
>>>>>>>> critical error: can't resolve group
>>>>>>>>
>>>>>>>> Has anyone seen this before, or does anyone have an idea why this
>>>>>>>> now shows up?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Henk
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>> **********************************************************************
>>> This e-mail and any files transmitted with it are
>>> confidential and intended solely for the use of the
>>> individual or entity to which they are addressed. If you have
>>> received this e-mail in error please notify the system
>>> manager at helpdesk at maerskoil.com.
>>>
>>> This e-mail and its contents do not constitute and shall not
>>> be considered as a financial commitment of Maersk Olie og Gas
>>> AS and its affiliates.
>>> Maersk Olie og Gas AS expressly disclaims any responsibility
>>> as to the accuracy and use of this e-mail and its contents.
>>> **********************************************************************
>
> http://gridengine.info/
>
> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551
> Kirchheim-Heimstetten
> Munich District Court: HRB 161028
> Managing directors: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
> Chairman of the Supervisory Board: Martin Haering

More information about the gridengine-users mailing list