[GE users] Jobs getting rescheduled

reuti reuti at staff.uni-marburg.de
Mon Aug 16 16:33:19 BST 2010


Hi,

On 16.08.2010 at 16:53, amfortas wrote:

> I keep running up against an extremely annoying intermittent issue whereby *all* running jobs in the Grid Engine queue suddenly get rescheduled for reasons that I do not understand.

Were the jobs submitted with "-r y", and/or does the queue have the flag "rerun TRUE" set?

Was there any entry in the messages file of the qmaster (while "loglevel log_info" is set)?

Did someone issue `qmod -rj "*"` by accident?
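
As a rough sketch of how the first two checks could be scripted: the snippet below parses the "key value" text that `qconf -sq <queue>` and `qconf -sconf` print, and reads out the `rerun` and `loglevel` settings. The helper names are made up for illustration, the `qconf()` wrapper assumes the SGE client tools are on the PATH, and wrapped continuation lines in the qconf output are not handled.

```python
import subprocess


def parse_key_value_output(text):
    """Parse 'key   value' lines as printed by `qconf -sq` or `qconf -sconf`.

    Simplification: values wrapped over several lines with a trailing
    backslash are not joined; only the first line of each setting is kept.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and header lines such as '#global:'.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # Keep the first occurrence so stray continuation lines cannot
        # overwrite a real setting.
        settings.setdefault(key, value)
    return settings


def qconf(*args):
    """Hypothetical wrapper: run qconf and return its stdout (needs SGE installed)."""
    result = subprocess.run(
        ["qconf", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def rerun_enabled(queue_config_text):
    """True if the queue configuration text contains 'rerun TRUE'."""
    value = parse_key_value_output(queue_config_text).get("rerun", "")
    return value.upper() == "TRUE"


def loglevel(cluster_config_text):
    """Return the 'loglevel' value from cluster configuration text, or None."""
    return parse_key_value_output(cluster_config_text).get("loglevel")
```

On a submit or admin host this might be used as `rerun_enabled(qconf("-sq", "all.q"))` and `loglevel(qconf("-sconf"))`, where "all.q" is only an example queue name.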

-- Reuti


> The corresponding entries in the qmaster's 'reporting' log file are of the form e.g.
> 
> 281969802:job_log:1281969802:restart:426731:0:NONE:r:execution daemon:node020.xxxxx.xxxxx:0:1024:1281968095:JOB.sh:xxxxx:xxxxx::XXXXXXX:sge:job didn't get resources -> schedule it again
> 
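To see when and where such restarts cluster, the colon-separated record above can be split into named fields. This is a minimal sketch: the field names follow my reading of the job_log layout in the reporting(5) man page and should be checked against the installed version, and the function names are hypothetical.

```python
# Fields that follow the leading "<time>:job_log:" prefix of a record.
# Assumption: order as described for job_log entries in reporting(5).
JOB_LOG_FIELDS = [
    "event_time", "event", "job_number", "task_number", "pe_taskid",
    "state", "user", "host", "state_time", "priority", "submission_time",
    "job_name", "owner", "group", "project", "department", "account",
    "message",
]


def parse_job_log(line):
    """Return a dict for a job_log record, or None for any other line."""
    # Limit the number of splits so colons inside the trailing message survive.
    parts = line.rstrip("\n").split(":", len(JOB_LOG_FIELDS) + 1)
    if len(parts) != len(JOB_LOG_FIELDS) + 2 or parts[1] != "job_log":
        return None
    return dict(zip(JOB_LOG_FIELDS, parts[2:]))


def restart_events(lines):
    """Collect all job_log records whose event is 'restart'."""
    events = []
    for line in lines:
        record = parse_job_log(line)
        if record is not None and record["event"] == "restart":
            events.append(record)
    return events


# The record quoted in this thread, unchanged.
SAMPLE = (
    "281969802:job_log:1281969802:restart:426731:0:NONE:r:execution daemon:"
    "node020.xxxxx.xxxxx:0:1024:1281968095:JOB.sh:xxxxx:xxxxx::XXXXXXX:sge:"
    "job didn't get resources -> schedule it again"
)

for event in restart_events([SAMPLE]):
    print(event["event_time"], event["job_number"], event["host"], event["message"])
```

Running `restart_events()` over the whole reporting file and grouping the results by `event_time` or `host` would show whether the restarts all originate at the same moment or from particular nodes.
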
> Just to summarise the set-up:
> 
> - small (32 node, 256 core) Beowulf. 
> 
> - BerkeleyDB spooldb local on qmaster
> 
> - execd spools local on nodes
> 
> - executables on NFS share.
> 
> I'd be very grateful for advice, or just an elucidation of the "job didn't get resources -> schedule it again" message.
> 
> Regards
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274732
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274746