[GE users] RE: cannot submit job because of error bug?

SLIM H.A. h.a.slim at durham.ac.uk
Wed Aug 29 16:14:27 BST 2007


 
In my case there are actually two users (who both have a primary group)
with this problem.
 
Indeed, the first user is in a secondary group that also contains users
without a primary group, which our systems people should fix.

The second user's primary group is also a secondary group but all users
listed for that secondary group can be identified with the id command
and all have a primary group.

So I don't think that users without a primary group being listed in a
secondary group is necessarily the problem.
Also, this problem did not occur in version 6.0u7, from which I upgraded to 6.1.
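
To repeat the id check for every member of a group in one go, something
like the following Python sketch might help. It is only an illustration,
not anything taken from the Grid Engine sources: the function name
unresolved_members and its two lookup parameters are made up here. It
relies on the standard pwd and grp modules, so it sees whatever the
system's name service (files, LDAP, ...) returns. It reports members with
no passwd entry and members whose primary gid has no group entry, which is
the situation Dmitry describes in his message quoted below.

```python
import grp
import pwd


def unresolved_members(members, lookup_user=pwd.getpwnam, lookup_group=grp.getgrgid):
    """Return (name, reason) pairs for group members that do not resolve cleanly.

    lookup_user and lookup_group are hypothetical hooks (not part of any
    Grid Engine API); they default to the system lookups and must raise
    KeyError when an entry is missing, as pwd.getpwnam and grp.getgrgid do.
    """
    problems = []
    for name in members:
        try:
            user = lookup_user(name)
        except KeyError:
            # The member is listed in the group but has no passwd entry.
            problems.append((name, "no passwd entry"))
            continue
        try:
            lookup_group(user.pw_gid)
        except KeyError:
            # The user exists, but the primary gid maps to no group.
            problems.append((name, f"primary gid {user.pw_gid} has no group entry"))
    return problems
```

Calling unresolved_members(grp.getgrnam("somegroup").gr_mem), with
"somegroup" replaced by the real group name, returns an empty list when
every listed member resolves and has a resolvable primary group.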


Of course I don't want to ask the standard question "has anything been
changed?" that every sysadmin is badgered with when something is
suddenly not working anymore, but if I shorten the list of users in the
secondary group, the commands do work again.
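
To see which group entries might be running into a length limit, a rough
sketch like this lists the entries of a group file that are longer than a
given number of characters. The 512 default is only the guess from the
original message further down, not a value confirmed from the 6.1 source,
and the function name long_group_lines is made up for illustration. It
only looks at the text it is given (for example the contents of
/etc/group), so groups that come from LDAP would need to be dumped to text
first.

```python
def long_group_lines(text, limit=512):
    """Return (group_name, line_length, member_count) for long group entries.

    text is the content of a group file in the usual
    name:password:gid:member,member,... format. limit is an assumed
    threshold; 512 is a guess, not a documented Grid Engine value.
    """
    hits = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 4:
            continue  # malformed entry; grpck is the tool for those
        members = [m for m in fields[3].split(",") if m]
        if len(line) > limit:
            hits.append((fields[0], len(line), len(members)))
    return hits
```

For example, long_group_lines(open("/etc/group").read()) returns a list of
(group name, line length, member count) tuples, one per entry longer than
the limit.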

I compared the output from qstat v6.1 with that of v6.0u7 at debug level 10.
There are 3 lines from qstat v6.1 that signal an error:
    63  17496 47463950073600 --> sge_log() {
    64  17496 47463950073600     sge_log: ctx is NULL
    65  17496 47463950073600     ../libs/sgeobj/sge_answer.c 937 can't resolve group

whereas v6.0u7 finishes with

error: can't unpack gdi request
error: error unpacking gdi request: bad argument
failed receiving gdi request

and with debug level 10 it prints the uid and gid of the user.

Best wishes

Henk

> -----Original Message-----
> From: Dmitry Zhukovski [mailto:DZH at maerskoil.com] 
> Sent: 29 August 2007 14:46
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] RE: cannot submit job because of error bug?
> 
> Hi all,
> 
>   I have exactly the same output for one of my users: qstat,
> qdel, qsub and the other commands give 'can't resolve group'.
> 
>   A little bit of googling turned up the following issue:
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1256 .
> I checked and found that the user's primary group ID was not listed in
> the LDAP set of groups.
> So a search on that user gave me the list of all secondary groups he
> belongs to, but not the primary one.
> 
>   I added the primary group but still get the 'can't resolve group' message.
> Question: can it be cached somewhere?
> 
> Br,
> dmitry
> 
> -----Original Message-----
> From: SLIM H.A. [mailto:h.a.slim at durham.ac.uk]
> Sent: 29. august 2007 11:40
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] RE: cannot submit job because of error bug?
> 
> Dear Daniel
> 
> I have set dl 4 and attach the output. I had a look at the
> source; is it possible to build qstat by itself for debugging purposes?
> 
> Thanks
> 
> Henk
> 
> > -----Original Message-----
> > From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
> > Sent: 28 August 2007 19:07
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] RE: cannot submit job because of error bug?
> > 
> > The debug levels aren't monotonic.  10 is actually less information 
> > than some lower levels.  4 might give you more info.  See:
> > 
> > http://blogs.sun.com/templedf/entry/using_debugging_output
> > 
> > Daniel
> > 
> > SLIM H.A. wrote:
> > > Further information on the failure of the sge commands for some unix
> > > groups of users.
> > >  
> > > Setting the debug level to 10 and running the qstat command gives for
> > > the last few lines of stdout:
> > >
> > >     63  15359 47241863851776 --> sge_log() {
> > >     64  15359 47241863851776     sge_log: ctx is NULL
> > >     65  15359 47241863851776     ../libs/sgeobj/sge_answer.c 937 can't resolve group
> > >
> > > I attached the full debug output.
> > >
> > > Thanks
> > >
> > > Henk
> > >
> > >   
> > >> -----Original Message-----
> > >> From: SLIM H.A. 
> > >> Sent: 28 August 2007 16:46
> > >> To: SLIM H.A.
> > >> Subject: cannot submit job because of error bug?
> > >>
> > >>  
> > >>
> > >> Some users are unable to submit jobs under sge 6.1. The error message
> > >> is this:
> > >>
> > >> % qsub
> > >> Unable to initialize environment because of error: can't resolve group
> > >> Exiting.
> > >>
> > >>
> > >> It appears that a limit is hit by the grid engine commands when
> > >> reading one of the secondary group entries in the /etc/group file. It
> > >> seems the commands cannot process lines that have more than some small
> > >> number of characters, probably 512.
> > >> Any userid that has that particular offending secondary group as its
> > >> primary group cannot submit jobs.
> > >>
> > >> When the number of userids for the offending secondary group is 
> > >> reduced, the userid is able to submit again.
> > >>
> > >> Is this a bug as 6.0u7 did not have this problem?
> > >>
> > >> Thanks for any advice
> > >>
> > >>
> > >> Henk
> > >>
> > >>     
> > >>> -----Original Message-----
> > >>> From: SLIM H.A. 
> > >>> Sent: 28 August 2007 11:34
> > >>> To: 'users at gridengine.sunsource.net'
> > >>> Subject: RE: [GE users] 6.1: critical error: can't resolve group
> > >>>
> > >>> Chris,
> > >>>
> > >>> I tried this, it seems to be ok:
> > >>>
> > >>> # grpck
> > >>> Checking `/etc/group' 
> > >>>
> > >>> is the only response I get
> > >>>
> > >>> Thanks
> > >>>
> > >>> Henk
> > >>>
> > >>>
> > >>>
> > >>>       
> > >>>> -----Original Message-----
> > >>>> From: chris.harwell at novartis.com [mailto:chris.harwell at novartis.com]
> > >>>> Sent: 28 August 2007 11:03
> > >>>> To: users
> > >>>> Subject: Re: [GE users] 6.1: critical error: can't resolve group
> > >>>>
> > >>>> Try running grpck as root. 
> > >>>>
> > >>>>
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: "SLIM H.A." [h.a.slim at durham.ac.uk]
> > >>>> Sent: 08/28/2007 04:56 AM
> > >>>> To: <users at gridengine.sunsource.net>
> > >>>> Subject: [GE users] 6.1: critical error: can't resolve group
> > >>>>
> > >>>>
> > >>>> I just upgraded from 6.0u7 to 6.1 and have come across a problem. The
> > >>>> Grid Engine commands now give an error for some users, for example
> > >>>>
> > >>>> %qstat
> > >>>> critical error: can't resolve group
> > >>>>
> > >>>> Has anyone seen this before or have an idea why this now shows up?
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>> Henk
> > >>>>
> > >>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



