[GE users] sge_shepherd segfaults

kisielk kamil at zymeworks.com
Thu Jan 14 17:22:18 GMT 2010


> On 14.01.2010 at 01:03, kisielk wrote:
> 
> > I'm still trying to get SGE running on OpenSUSE 11.1. I'm using the  
> > CVS version to work around the LDAP user problems I was having  
> > earlier.
> >
> > I now have job submission working for all my users, however  
> > whenever a job begins to run it immediately causes sge_shepherd to  
> > segfault.
> >
> > From dmesg:
> > sge_shepherd[21485]: segfault at 7f7878000000 ip 00007f787a630bc7  
> > sp 00007fff833bc980 error 4 in libc-2.9.so[7f787a5bb000+14f000]
> >
> > From the qmaster messages file:
> > 01/13/2010 15:42:19|worker|demo|W|job 7.1 failed on host  
> > demo.lan.zymeworks.com assumedly after job because: job 7.1 died  
> > through signal SEGV (11)
> >
> >
> > I tried compiling with -no-opt and -debug but was still unable to  
> > get any more information than what's above.
> >
> > How can I go about debugging this problem?
> 
> Is there any file in /tmp with startup errors of the qmaster? This  
> location is used by SGE as a last resort.
> 
> -- Reuti
> 

No, there is nothing in /tmp. qmaster and execd seem to run fine and appear to start running the job, which then fails immediately because sge_shepherd segfaults.
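
For what it's worth, the dmesg line quoted above carries a bit more information than it first appears to. Below is a rough Python sketch (the function and variable names are just illustrative, not part of SGE) that splits the line apart, computes the offset of the faulting instruction inside libc-2.9.so, and decodes the x86 page-fault error code. It assumes the usual x86 kernel message layout, where the error code is printed in hex and bit 0 means "page was present", bit 1 means "write access" and bit 2 means "fault came from user mode".

```python
import re

# The kernel message from dmesg, joined back onto a single line.
LINE = ("sge_shepherd[21485]: segfault at 7f7878000000 ip 00007f787a630bc7 "
        "sp 00007fff833bc980 error 4 in libc-2.9.so[7f787a5bb000+14f000]")

# Assumed layout of an x86 "segfault at ..." kernel message.
PATTERN = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the interesting fields of a kernel segfault message."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")

    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)

    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        "ip_inside_mapping": base <= ip < base + size,
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    for key, value in info.items():
        if key in ("fault_address", "offset"):
            value = hex(value)
        print(f"{key}: {value}")
```

So error 4 would be a user-mode read of an unmapped page, and the instruction pointer sits at offset 0x75bc7 into the libc mapping. If debug symbols for libc are installed, that offset could probably be handed to something like `addr2line -e <path to libc-2.9.so> 0x75bc7` to see which libc function was executing; the library path depends on the system and is left out here. That still would not say which call in sge_shepherd passed the bad pointer, so a backtrace from a core file would be needed for that.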
