[GE users] wait3 returned -1 / now sending signal CONT to pid -3600

Roland Dittel Roland.Dittel at Sun.COM
Thu Dec 1 08:02:43 GMT 2005



Hi Sebastian,

Do you have any entries in your execd messages file? It seems you are
running into IZ 1665, which is fixed in the upcoming 6.0u7 release (it is
also fixed in the snapshot1 release).

Roland


Sebastian Stark wrote:
> I'm getting lots of these errors. What could be the cause? It only happens with some jobs.
> 
> Thanks for any hints.
> 
> 
> -Sebastian
> 
> -------------------------------------------------------------------------------
> To: stark at tuebingen.mpg.de
> Subject: SGE 6.0u4: Job 649640 failed
> From: <>
> 
> Job 649640 caused action: none
>  User        = schweike
>  Queue       = all.q at node112
>  Host        = node112
>  Start Time  = <unknown>
>  End Time    = <unknown>
> failed before writing exit_status:shepherd exited with exit status 19
> Shepherd trace:
> 11/26/2005 15:10:29 [4399:3599]: shepherd called with uid = 0, euid = 4399
> 11/26/2005 15:10:30 [4399:3599]: setpgid(3599, 3599) returned 0
> 11/26/2005 15:10:30 [4399:3599]: no prolog script to start
> 11/26/2005 15:10:30 [4399:3600]: pid=3600 pgrp=3600 sid=3600 old pgrp=3599 getlogin()=<no login set>
> 11/26/2005 15:10:30 [4399:3599]: forked "job" with pid 3600
> 11/26/2005 15:10:30 [4399:3599]: child: job - pid: 3600
> 11/26/2005 15:10:30 [4399:3600]: setosjobid: uid = 0, euid = 4399
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_FSIZE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_DATA setting: (soft 6291456000 hard 6291456000) resulting: (soft 6291456000 hard 6291456000)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_STACK setting: (soft 8388608 hard 8388608) resulting: (soft 8388608 hard 8388608)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_CORE setting: (soft 0 hard 0) resulting: (soft 0 hard 0)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 6291456000 hard 6291456000) resulting: (soft 6291456000 hard 6291456000)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_RSS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/26/2005 15:10:30 [1840:3600]: closing all filedescriptors
> 11/26/2005 15:10:30 [1840:3600]: further messages are in "error" and "trace"
> 11/26/2005 15:10:30 [1840:3600]: using stdout as stderr
> 11/26/2005 15:10:30 [1840:3600]: execvp(/usr/local/sge/default/spool/node112/job_scripts/649640, "/usr/local/sge/default/spool/node112/job_scripts/649640")
> 11/26/2005 15:11:27 [4399:3599]: wait3 returned -1
> 11/26/2005 15:11:27 [4399:3599]: queued signal CONT
> 11/26/2005 15:11:27 [4399:3599]: kill(-3600, CONT)
> 11/26/2005 15:11:27 [4399:3599]: now sending signal CONT to pid -3600
> 11/26/2005 15:12:27 [4399:3599]: wait3 returned -1
> 11/26/2005 15:12:27 [4399:3599]: queued signal CONT
> 11/26/2005 15:12:27 [4399:3599]: kill(-3600, CONT)
> 11/26/2005 15:12:27 [4399:3599]: now sending signal CONT to pid -3600
> 11/26/2005 15:13:27 [4399:3599]: wait3 returned -1
> 11/26/2005 15:13:27 [4399:3599]: queued signal CONT
> 11/26/2005 15:13:27 [4399:3599]: kill(-3600, CONT)
> 11/26/2005 15:13:27 [4399:3599]: now sending signal CONT to pid -3600
> [...same again every minute for hours and hours...]
> 
> -------------------------------------------------------------------------------
> 


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Roland Dittel               Tel: +49 (0)941 3075-275 (x60275)
Software Engineering        Fax: +49 (0)941 3075-222 (x60222)
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7       mailto:roland.dittel at sun.com
D-93049 Regensburg          http://www.sun.com/gridware
