[GE users] 6.2u5: "failed to deliver signal 20 to job"

reuti reuti at staff.uni-marburg.de
Mon Feb 8 22:32:00 GMT 2010


Hi,

On 08.02.2010 at 17:42, ccaamad wrote:

> I was wondering if anyone else had seen this. I've been doing some testing, trying to get a new parallel application running on my 6.2u5 cluster. It seems that my execds have got a bit confused and endlessly keep printing out things like:
>
> 02/08/2010 16:36:35|  main|c1s0b8n0|W|job 6018.1 exceeded hard wallclock time - initiate terminate method
> 02/08/2010 16:36:35|  main|c1s0b8n0|W|failed to deliver signal 20 to job 6018.1 task 1.c1s0b8n0 for KILL (shepherd with pid 420): No such file or directory

Did you redefine the warning signal for a kill to be SIGTSTP?

Is qstat still listing the job? Sometimes files are left behind in the node's spool directory, under .../jobs/00/0000/6018.1, and they must be removed by hand to get rid of these messages.

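Reuti's reading of "signal 20" as SIGTSTP holds for x86 Linux, but signal numbers are platform-specific: on macOS and other BSD-derived systems, 20 is SIGCHLD. A minimal sketch for decoding the number on the execution host itself, using only Python's standard `signal` module (nothing here is Grid Engine code; `signal_name` is a hypothetical helper):

```python
import signal


def signal_name(signum: int) -> str:
    """Return the symbolic name of a signal number on *this* platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Raised when no named signal has this number on the current platform.
        return f"unknown signal {signum}"


if __name__ == "__main__":
    # Signal numbering differs between platforms, so run this on the node itself.
    for n in (9, 15, 20):
        print(n, signal_name(n))
```

On an x86 Linux host this prints SIGKILL, SIGTERM and SIGTSTP for 9, 15 and 20.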
> The job and shepherd have already finished, but the execd seems to have trouble forgetting about them; it keeps printing the message every couple of minutes.
>
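
The spool path Reuti mentions, .../jobs/00/0000/6018.1, suggests that the job number is zero-padded to ten digits and split 2/4/4 (0000006018 becomes 00/0000/6018), with the task id appended after a dot. The sketch below assumes exactly that layout, inferred from this single example rather than from the Grid Engine sources; `leftover_job_path`, `report_leftover` and the spool root are hypothetical. It only builds the path and checks whether the file exists. Deleting anything is left to the administrator once qstat confirms the job is gone.

```python
from pathlib import Path


def leftover_job_path(execd_spool: Path, job_id: int, task_id: int = 1) -> Path:
    """Return <execd_spool>/jobs/NN/NNNN/NNNN.<task_id> (hypothetical helper).

    ASSUMPTION: layout inferred from the single example .../jobs/00/0000/6018.1,
    i.e. the job number zero-padded to ten digits and split 2/4/4.
    """
    if not 0 < job_id <= 9_999_999_999:
        raise ValueError("job id must be a positive number of at most ten digits")
    digits = f"{job_id:010d}"  # e.g. 6018 -> "0000006018"
    return execd_spool / "jobs" / digits[:2] / digits[2:6] / f"{digits[6:]}.{task_id}"


def report_leftover(execd_spool: Path, job_id: int, task_id: int = 1) -> str:
    """Say whether the (assumed) per-task file still exists; never deletes anything."""
    path = leftover_job_path(execd_spool, job_id, task_id)
    state = "still present" if path.exists() else "not found"
    return f"{path}: {state}"


if __name__ == "__main__":
    # Placeholder root: substitute the execution host's real spool directory.
    print(report_leftover(Path("/path/to/execd_spool"), 6018, 1))
```
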
> I seem to have triggered this problem quite a bit: at one point the execd refused to start a new job because it had run out of group IDs to use, until I restarted the daemon.

Which range did you define for the additional group IDs?
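
The additional group IDs come from the `gid_range` parameter of the cluster configuration (see sge_conf(5)), and roughly one ID is taken per job running on a host. If the execd never forgets a finished job, that job's ID is never returned, so a small range eventually runs dry until the daemon is restarted, which matches Mark's description. The toy model below only illustrates that arithmetic; it is plain Python, not Grid Engine code, and the class, its methods and the range bounds are all hypothetical.

```python
class GidPool:
    """Toy model of one host's additional group ID range (hypothetical, not SGE code)."""

    def __init__(self, first: int, last: int) -> None:
        self.free = list(range(first, last + 1))  # IDs available for new jobs
        self.running = {}                         # job name -> gid, jobs really running
        self.stale = {}                           # job name -> gid, finished but never released

    def start_job(self, job: str) -> int:
        if not self.free:
            raise RuntimeError(f"no free group id left for {job}")
        gid = self.free.pop(0)
        self.running[job] = gid
        return gid

    def finish_job(self, job: str, cleaned_up: bool = True) -> None:
        gid = self.running.pop(job)
        if cleaned_up:
            self.free.append(gid)   # normal case: the ID goes back to the pool
        else:
            self.stale[job] = gid   # leak: the daemon still believes it owns the ID

    def restart(self) -> None:
        # In this model a restart drops the stale entries and returns their IDs.
        self.free.extend(self.stale.values())
        self.stale.clear()


if __name__ == "__main__":
    pool = GidPool(20000, 20002)  # hypothetical three-ID range
    for n in range(3):
        pool.start_job(f"job{n}")
        pool.finish_job(f"job{n}", cleaned_up=False)
    try:
        pool.start_job("job3")
    except RuntimeError as err:
        print(err)                # no free group id left for job3
    pool.restart()
    print(pool.start_job("job3"))  # 20000
```
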

-- Reuti


>
> Any ideas?
>
> Mark
> -- 
> -----------------------------------------------------------------
> Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
> HPC/Grid Systems Support         Tel (int): 35429
> Information Systems Services     Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -----------------------------------------------------------------
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=243959
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=243995
