[GE users] SGE on latest Mac OS X Server 10.5.4 - help with non-root users

Ian Levesque ian at crystal.harvard.edu
Wed Jul 16 18:38:51 BST 2008


Hi Chris,

I've finally found the time to test the launchd scripts, and it does
seem that the problems are resolved. Many thanks for the solution! I'm
using the scripts with the 6.2b2 binaries on a cluster that is
currently running OS X Server 10.5.2.

Cheers,
Ian


On Jul 8, 2008, at 7:20 PM, Chris Dagdigian wrote:

>
> We have found a workaround for this issue. No root cause yet.
>
> Issue symptoms on Mac OS X Server 10.5.4:
>
> Error #1: "can't get password entry for user "<username>". Either  
> the user does not exist or NIS error!"
> Error #2:  "admin_user "<username>" does not exist"
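
Both errors say that a named account could not be found. One way to check whether an account resolves through the system password database is a small script like the one below. It is a hedged diagnostic sketch, not part of the reported workaround. The function name `user_resolves` and the usernames are hypothetical. Python's `pwd` module consults whatever sources the OS is configured to use, such as local accounts or directory services. The failure was reported as intermittent and tied to daemon startup, so a passing result from an interactive shell does not rule the problem out.

```python
import pwd
import sys


def user_resolves(username):
    """Return True if the password database can resolve `username`.

    This is the same kind of lookup the "can't get password entry" and
    "admin_user ... does not exist" messages refer to.
    """
    try:
        pwd.getpwnam(username)
    except KeyError:
        # getpwnam raises KeyError when no matching entry exists.
        return False
    return True


if __name__ == "__main__":
    # Usage: python check_users.py sgeadmin alice bob  (names are placeholders)
    for name in sys.argv[1:]:
        print(name, "OK" if user_resolves(name) else "NOT FOUND")
```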
>
> Hypothesis: The update to 10.5.4 Server has somehow made both the
> SystemStarter framework and the actual 'sgemaster' and 'sgeexecd'
> startup scripts unreliable.
>
> Workaround Summary: Moving all SGE daemon start/stop procedures
> under the new launchd framework seems to resolve the issue.
>
> Tested on: Official SGE 6.1u4 x86 binaries for OS X
>
> Scope: This problem has so far appeared only on 10.5.4 Server
> versions of OS X; the 'client' version of 10.5.4 seems fine.
>
> Workaround: Integrate Grid Engine start/stop procedures via the new  
> OS X 'launchd' framework.
>
> Details:
>
> The launchd scripts I used are published here:
> http://blog.bioteam.net/2008/03/04/apple-os-x-105-launchd-scripts-for-grid-engine/
>
> I've also updated the launchd page on the SGE wiki:
>
> http://wiki.gridengine.info/wiki/index.php/GridEngine_launchd
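
The published scripts are not reproduced in this thread. The sketch below shows the general shape of a launchd job for one SGE daemon, using Python's standard `plistlib` to emit the property list. Several values are hypothetical placeholders, not taken from the BioTeam scripts:

- the install path `/opt/sge`
- the arch directory `darwin-x86`
- the job label `org.example.sge.execd`
- the helper name `make_job`

`Label`, `ProgramArguments`, `EnvironmentVariables` and `RunAtLoad` are standard launchd.plist keys. The sketch does not address one known wrinkle. The SGE daemons normally detach into the background, while launchd expects to supervise a long-running process. Check how the published scripts handle this before relying on anything like it.

```python
import plistlib

# Hypothetical placeholders: adjust to the local installation.
SGE_ROOT = "/opt/sge"      # assumed install location
SGE_ARCH = "darwin-x86"    # assumed arch directory for the x86 OS X binaries
SGE_CELL = "default"


def make_job(label, daemon):
    """Return a launchd job dictionary that starts one SGE daemon binary.

    Caveat: sge_qmaster and sge_execd normally fork into the background;
    this sketch does not show how the published scripts deal with that.
    """
    return {
        "Label": label,
        "ProgramArguments": [f"{SGE_ROOT}/bin/{SGE_ARCH}/{daemon}"],
        "EnvironmentVariables": {"SGE_ROOT": SGE_ROOT, "SGE_CELL": SGE_CELL},
        "RunAtLoad": True,  # start the job when it is loaded (e.g. at boot)
    }


# Hypothetical reverse-DNS label, the usual style for launchd job labels.
execd_job = make_job("org.example.sge.execd", "sge_execd")
plist_bytes = plistlib.dumps(execd_job)  # XML property list as bytes
print(plist_bytes.decode("utf-8"))
```

A job like this would typically be saved under /Library/LaunchDaemons/ and loaded with `sudo launchctl load`, followed by the path to the .plist file.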
>
>
>
> ###
> Credit: Bill Van Etten from BioTeam was the first person to
> reproduce the problem and then fix it by using launchd. I had spent
> days messing with dozens of different SGE and OS configurations and
> never would have thought of trying out the new daemon/service
> framework in OS X. Bill, of course, is also the original author of
> the launchd integration scripts published on the BioTeam blog back
> in March 2008. I need to publicly thank Bill for saving my sanity,
> especially as this problem was only reproducible intermittently.
> ###
>
> On Jul 8, 2008, at 12:55 PM, Ian Levesque wrote:
>
>> Hi Chris,
>>
>> On Tue, 8 Jul 2008 10:30:04 -0400, Chris Dagdigian <dag at sonsorol.org>
>> wrote:
>>
>>> The good news is that people far smarter than me are taking a look at
>>> it and I've made my server system accessible to a few people who are
>>> looking into things.
>>
>> That is good news. Thanks for your detailed notes and contribution.
>>
>>
>>> The bad news is I may have to migrate a new client cluster to Platform
>>> LSF as not being able to get SGE to run for more than a week is pretty
>>> embarrassing.
>>
>> I've got a site whose cluster is idle now because of this, as well. And
>> they're getting a bit antsy. I didn't participate in the purchase decision
>> but of course I'm the admin so I look bad when their users can't schedule
>> jobs. :)
>>
>> I'm hoping a patch becomes available soon, so I don't have to deviate from
>> our standard SGE configuration...
>>
>> Cheers,
>> Ian
>> -- 
>> * * * *
>> Ian Levesque
>> Research Systems Architect
>> Harvard Medical School
>> Structural Biology Grid
>> http://www.sbgrid.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

