[GE issues] [Issue 3257] execd 'job exceeds job hard limit' message should include task id as well as job id.

ccaamad m.c.dixon at leeds.ac.uk
Wed Mar 31 11:58:15 BST 2010


------- Additional comments from ccaamad at sunsource.net Wed Mar 31 03:58:14 -0700 2010 -------
That's right. The h_rt/s_rt messages already include the task id; they are defined at lines 219 and 220 of msg_execd.h:

#define MSG_EXECD_EXCEEDHWALLCLOCK_UU _MESSAGE(29128, _("job "sge_U32CFormat"."sge_U32CFormat" exceeded hard wallclock time - initiate terminate method"))
#define MSG_EXECD_EXCEEDSWALLCLOCK_UU _MESSAGE(29129, _("job "sge_U32CFormat"."sge_U32CFormat" exceeded soft wallclock time - initiate soft notify method"))

They are used at lines 455 and 474 of execd_ck_to_do.c.

We just need the 'exceeds job (hard|soft) limit' message to include the task id as well. I'd include a simple patch, but I'm not
yet set up to rebuild Grid Engine and I don't want to offer something that isn't tested.
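For illustration only, here is a minimal, untested sketch of what a task-aware hard-limit message could look like if it followed the same pattern as the wallclock messages above, where adjacent string literals concatenate into a single "job <job>.<task> ..." format string. Assumptions: `U32_FMT` is a stand-in for `sge_U32CFormat` (not its real definition), and the macro name `MSG_EXEC_HLIMIT_TASK`, the helper `format_hlimit_msg` and the wording after the ids are hypothetical, not the existing Grid Engine text.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for Grid Engine's sge_U32CFormat (assumption, not the real header). */
#define U32_FMT "%" PRIu32

/* Hypothetical task-aware message. Adjacent string literals are joined at
 * compile time, giving: job <job>.<task> exceeds job hard limit "<resource>" */
#define MSG_EXEC_HLIMIT_TASK \
    "job " U32_FMT "." U32_FMT " exceeds job hard limit \"%s\""

/* Hypothetical helper: format the message for one task into buf.
 * Returns snprintf's result (the untruncated length, or < 0 on error). */
int format_hlimit_msg(char *buf, size_t len, uint32_t job_id,
                      uint32_t task_id, const char *resource)
{
    return snprintf(buf, len, MSG_EXEC_HLIMIT_TASK, job_id, task_id, resource);
}
```

With job 12345, task 7 and resource h_vmem this yields `job 12345.7 exceeds job hard limit "h_vmem"`, i.e. the same job.task form the h_rt/s_rt messages already use.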

This would be a big help: some of our users submit array jobs where >95% of tasks need <1G of memory and <5% need >4G. It improves
throughput to ask them to request 1G for the job and then resubmit the tasks that fail. Changing the message would make it easier to
identify which tasks failed and why.
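As a sketch of how the changed message would help with resubmission, the code below scans execd message lines and collects the task ids of one job that hit a hard limit. It assumes the hypothetical "job <job>.<task> exceeds job hard limit" wording (the current message has no task id, so this would not work against today's logs); `parse_hlimit_line` and `failed_tasks` are illustrative names, not Grid Engine functions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Parse one line in the assumed form "job <job>.<task> exceeds job hard limit ...".
 * Returns 1 and fills *job and *task on a match, 0 otherwise. */
int parse_hlimit_line(const char *line, unsigned long *job, unsigned long *task)
{
    static const char suffix[] = " exceeds job hard limit";
    const char *p = strstr(line, "job ");
    int consumed = 0;

    if (p == NULL)
        return 0;
    /* %n records how many characters sscanf consumed after the task id. */
    if (sscanf(p, "job %lu.%lu%n", job, task, &consumed) != 2)
        return 0;
    /* Reject other job.task messages, e.g. the wallclock ones. */
    return strncmp(p + consumed, suffix, sizeof suffix - 1) == 0;
}

/* Collect up to out_cap task ids of job want_job that hit a hard limit. */
size_t failed_tasks(const char *const *lines, size_t n_lines,
                    unsigned long want_job, unsigned long *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n_lines && found < out_cap; i++) {
        unsigned long job, task;
        if (parse_hlimit_line(lines[i], &job, &task) && job == want_job)
            out[found++] = task;
    }
    return found;
}
```

The returned task ids could then be passed to a new submission that requests more memory; the wallclock check against the suffix keeps "exceeded hard wallclock time" lines from being counted as memory failures.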

More information about the gridengine-users mailing list