[GE issues] [Issue 3194] sge_shepherd segfault on OpenSuSE 11.2 (x86_64)

megware stephan.ebelt at megware.com
Fri Jan 22 14:21:32 GMT 2010


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3194






------- Additional comments from megware at sunsource.net Fri Jan 22 06:21:31 -0800 2010 -------
Nope. It gives nothing but the usual:

01/22/2010 15:13:21|worker|frontend1|W|job 13.1 failed on host node02.service assumedly after job because: job 13.1 died through signal ABRT (6)

Running it directly works:

$ ssh node02.service strace sleep 10
execve("/bin/sleep", ["sleep", "10"], [/* 50 vars */]) = 0
brk(0)                                  = 0x606000
[...]
clock_gettime(CLOCK_MONOTONIC, {2781295, 74039388}) = 0
nanosleep({10, 0}, NULL)                = 0
clock_gettime(CLOCK_MONOTONIC, {2781305, 74160613}) = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)

When watching the ps output on the node during a job, I never see a 'sleep' process. Only sge_shepherd is visible, in 'defunct'
state (though I might be too slow to catch it at the right moment).
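
To rule out the timing problem, the process table could be polled in a tight loop instead of checked by hand. The following is only a sketch: the function names (snapshot, match_procs, watch_procs) and the default count and interval are made up for illustration, and the fractional sleep interval assumes GNU sleep.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Grid Engine) for catching short-lived
# job processes on an execution host.

# Print one snapshot of the process table:
# pid, parent pid, state (Z = defunct/zombie), full command line.
snapshot() {
    ps -eo pid,ppid,stat,args
}

# Read ps output on stdin, skip the header line, and print every line
# matching the extended regex given as $1. Lines mentioning awk are
# dropped so the filter does not report itself.
match_procs() {
    awk -v pat="$1" 'NR > 1 && $0 ~ pat && $0 !~ /awk/'
}

# Take COUNT snapshots, INTERVAL seconds apart, and print the matches.
# The defaults (50 snapshots, 0.2 s) are arbitrary assumptions.
watch_procs() {
    pattern=$1
    count=${2:-50}
    interval=${3:-0.2}
    i=0
    while [ "$i" -lt "$count" ]; do
        snapshot | match_procs "$pattern"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Run on the node right after submitting the job, for example as: watch_procs 'sge_shepherd|sleep' 100 0.1. If the job's 'sleep' never shows up even at this rate, the shepherd is probably dying before it starts the job.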

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=240382



