[GE users] SGE on latest Mac OS X Server 10.5.4 - help with non-root users

Chris Dagdigian dag at sonsorol.org
Wed Jul 9 00:20:48 BST 2008


We have found a workaround for this issue, though not yet the root cause.

Issue symptoms on Mac OS X Server 10.5.4:

  Error #1: "can't get password entry for user "<username>". Either the user does not exist or NIS error!"
  Error #2: "admin_user "<username>" does not exist"

Hypothesis: For reasons still unknown, the update to 10.5.4 Server has
made both the SystemStarter framework and the actual 'sgemaster' and
'sgeexecd' startup scripts unreliable.

Workaround Summary: Moving all SGE daemon start/stop procedures under
the new launchd framework seems to resolve the issue.

Tested on: Official SGE 6.1u4 x86 binaries for OS X

Scope: So far this problem has only appeared on the Server version of
OS X 10.5.4; the 'client' version of 10.5.4 seems fine.

Workaround: Integrate Grid Engine start/stop procedures via the new
OS X 'launchd' framework.

Details:

The launchd scripts I used are published here:
http://blog.bioteam.net/2008/03/04/apple-os-x-105-launchd-scripts-for-grid-engine/

I've also updated the launchd page on the SGE wiki:

http://wiki.gridengine.info/wiki/index.php/GridEngine_launchd
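For anyone who cannot reach those pages, here is a minimal sketch of what a launchd job for a single SGE daemon can look like. It is NOT the BioTeam script. The SGE_ROOT path, the cell name, the darwin-x86 architecture directory and the job label are all placeholder assumptions. The sketch also assumes that setting SGE_ND keeps the SGE daemons from detaching, since launchd wants to supervise a process that stays in the foreground; check the man pages for your SGE version. It uses Python's standard plistlib module to emit the property list.

```python
import plistlib

# Placeholder values (assumptions): adjust to your own installation.
# None of these are taken from the BioTeam launchd scripts.
SGE_ROOT = "/opt/sge"    # assumed install location
SGE_CELL = "default"     # assumed cell name
ARCH = "darwin-x86"      # assumed arch directory for the x86 OS X binaries


def sge_launchd_job(label, daemon):
    """Build a launchd job dictionary that runs one SGE daemon binary."""
    return {
        "Label": label,  # hypothetical label, any unique reverse-DNS name works
        "ProgramArguments": [f"{SGE_ROOT}/bin/{ARCH}/{daemon}"],
        "EnvironmentVariables": {
            "SGE_ROOT": SGE_ROOT,
            "SGE_CELL": SGE_CELL,
            # Assumed behavior: SGE_ND stops the daemon from detaching,
            # so launchd keeps track of the real process.
            "SGE_ND": "true",
        },
        "RunAtLoad": True,  # start the daemon when the job is loaded
        "KeepAlive": True,  # have launchd restart the daemon if it exits
    }


if __name__ == "__main__":
    job = sge_launchd_job("org.example.gridengine.execd", "sge_execd")
    print(plistlib.dumps(job).decode("utf-8"))
```

The output would be saved as a root-owned file under /Library/LaunchDaemons/ and loaded with `sudo launchctl load /Library/LaunchDaemons/<file>.plist`. On a 6.1 master host, sge_qmaster and the separate sge_schedd daemon would each need a similar job. Note that KeepAlive also restarts a daemon that was shut down deliberately (for example with qconf), so drop it if that is unwanted.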



###
Credit: Bill Van Etten from BioTeam was the first person to reproduce
the problem and then fix it by using launchd. I had spent days messing
with dozens of different SGE and OS configurations and would never
have thought of trying out the new daemon/service framework in OS X.
Bill, of course, is also the original author of the launchd
integration scripts published on the BioTeam blog back in March 2008. I
need to publicly thank Bill for saving my sanity, especially as this
problem was only reproducible intermittently.
###







On Jul 8, 2008, at 12:55 PM, Ian Levesque wrote:

> Hi Chris,
>
> On Tue, 8 Jul 2008 10:30:04 -0400, Chris Dagdigian <dag at sonsorol.org>
> wrote:
>
>> The good news is that people far smarter than me are taking a look at
>> it and I've made my server system accessible to a few people who are
>> looking into things.
>
> That is good news. Thanks for your detailed notes and contribution.
>
>
>> The bad news is I may have to migrate a new client cluster to Platform
>> LSF as not being able to get SGE to run for more than a week is pretty
>> embarrassing.
>
> I've got a site whose cluster is idle now because of this, as well. And
> they're getting a bit antsy. I didn't participate in the purchase decision
> but of course I'm the admin so I look bad when their users can't schedule
> jobs. :)
>
> I'm hoping a patch becomes available soon, so I don't have to deviate from
> our standard SGE configuration...
>
> Cheers,
> Ian
> -- 
> * * * *
> Ian Levesque
> Research Systems Architect
> Harvard Medical School
> Structural Biology Grid
> http://www.sbgrid.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net






More information about the gridengine-users mailing list