[GE users] Jobs getting rescheduled

amfortas n.gresham at manchester.ac.uk
Mon Aug 16 15:53:53 BST 2010


I keep running into an extremely annoying intermittent problem: *all* running jobs in the Grid Engine queue suddenly get rescheduled, for reasons that I do not understand.

The corresponding entries in the qmaster's 'reporting' log file look like this, for example:

281969802:job_log:1281969802:restart:426731:0:NONE:r:execution daemon:node020.xxxxx.xxxxx:0:1024:1281968095:JOB.sh:xxxxx:xxxxx::XXXXXXX:sge:job didn't get resources -> schedule it again
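For anyone wanting to pick such entries apart, here is a minimal Python sketch that splits a job_log record into named fields and counts restart events per timestamp and host, which might show whether every node is hit at the same instant. The field names are an assumption based on my reading of the reporting(5) man page and should be checked against the installed version; parse_job_log and count_restarts are hypothetical helper names, not part of Grid Engine.

```python
from collections import Counter

# Assumed field order for a job_log record, per my reading of reporting(5).
JOB_LOG_FIELDS = [
    "time", "type", "event_time", "event", "job_number", "task_number",
    "pe_taskid", "state", "user", "host", "state_time", "priority",
    "submission_time", "job_name", "owner", "group", "project",
    "department", "account", "message",
]

# The entry quoted above, copied verbatim.
SAMPLE = ("281969802:job_log:1281969802:restart:426731:0:NONE:r:"
          "execution daemon:node020.xxxxx.xxxxx:0:1024:1281968095:JOB.sh:"
          "xxxxx:xxxxx::XXXXXXX:sge:"
          "job didn't get resources -> schedule it again")


def parse_job_log(line):
    """Return a dict of fields for a job_log record, or None otherwise."""
    # Limit the split so any colons inside the trailing message are preserved.
    parts = line.rstrip("\n").split(":", len(JOB_LOG_FIELDS) - 1)
    if len(parts) != len(JOB_LOG_FIELDS) or parts[1] != "job_log":
        return None
    return dict(zip(JOB_LOG_FIELDS, parts))


def count_restarts(lines):
    """Count 'restart' events keyed by (event_time, host)."""
    restarts = Counter()
    for line in lines:
        rec = parse_job_log(line)
        if rec is not None and rec["event"] == "restart":
            restarts[(rec["event_time"], rec["host"])] += 1
    return restarts


if __name__ == "__main__":
    rec = parse_job_log(SAMPLE)
    print(rec["job_number"], rec["host"], rec["message"])
```

Feeding the lines of the reporting file to count_restarts would give one counter entry per (timestamp, host) pair; many hosts sharing one timestamp would suggest a single qmaster-side trigger rather than independent node failures.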

Just to summarise the set-up:

- small (32-node, 256-core) Beowulf cluster

- BerkeleyDB spooldb local on qmaster

- execd spools local on nodes

- executables on NFS share

I'd be very grateful for advice, or even just an explanation of the "job didn't get resources -> schedule it again" message.


