[GE users] sge_shepherd : free(): invalid pointer crash for more than 1032 slots

reuti reuti at staff.uni-marburg.de
Fri May 14 10:29:57 BST 2010


On 13.05.2010 at 16:15, henk wrote:

> On our system the gridengine 6.2u5 shepherd crashes for a simple
> parallel job with 1040 slots. It is fine for 1032 slots (each server has
> 8 cores and I increment the job size by adding a server). I attach the
> error file with a memory map. MPI is OpenMPI 1.4.1 and OS is SLES 11.0.
> As a test I kept the 1032 slots on fixed servers and varied the server
> that supplied the additional 8 slots, all giving this problem.
> Is there some magical number beyond 1032 that causes a problem for the
> shepherd exe?

I don't know for sure, but 1032 looks like 8 slots on the master host of the parallel job plus 1024 slave slots, and 1024 is a commonly used value for a limit somewhere. Even so, hitting such a limit should produce an error, not a crash.
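
To make the arithmetic concrete, here is a rough Python sketch of the split I have in mind. The helper names (split_slots, open_file_soft_limit) are made up for illustration and are not part of Grid Engine. The second helper only reads the per-process open-file soft limit, which is one limit that commonly defaults to 1024 on Linux; whether that or any other limit is actually involved in the shepherd crash is an untested assumption.

```python
import resource  # Unix-only standard library module

SLOTS_PER_HOST = 8  # from the report: each server has 8 cores


def split_slots(total_slots, slots_per_host=SLOTS_PER_HOST):
    """Return (master_slots, slave_slots, host_count) for a job filling whole hosts.

    Hypothetical helper: assumes the master host contributes one full host
    worth of slots and every other slot counts as a slave slot.
    """
    if total_slots <= 0 or total_slots % slots_per_host != 0:
        raise ValueError("total_slots must be a positive multiple of slots_per_host")
    host_count = total_slots // slots_per_host
    master_slots = slots_per_host
    slave_slots = total_slots - master_slots
    return master_slots, slave_slots, host_count


def open_file_soft_limit():
    """Soft limit on open file descriptors for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    for total in (1032, 1040):
        master, slaves, hosts = split_slots(total)
        print(f"{total} slots: {master} master + {slaves} slave slots on {hosts} hosts")
    # Only a candidate to compare against 1024, not a diagnosis.
    print("soft open-file limit here:", open_file_soft_limit())
```

With 8 slots per host, 1032 slots gives exactly 1024 slave slots and 1040 is the first job size above that, which is why the number looks suspicious to me.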

-- Reuti

> Thanks
> Henk
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257181
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> <shepherd_crash_1040slots.e1284.txt>

