[GE users] wait3 returned -1 / now sending signal CONT to pid -3600

Reuti reuti at staff.uni-marburg.de
Wed Nov 30 21:14:28 GMT 2005


Hi Sebastian,

can you please check with "ps -e f -o pid,ppid,pgrp,command" what is
running on node112? And/or is there something in the messages file
like what is mentioned here:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1665

This is fixed in u6.
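
If the full ps listing is too long to read by eye, it can be filtered for the process group the shepherd keeps signalling (3600 in the trace below). What follows is only a minimal sketch, assuming Python 3 is available on the node. The helper names parse_ps and in_group are made up for illustration, and the "f" (forest) flag is left out so that the command column stays easy to parse.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -e -o pid,ppid,pgrp,command` output into
    (pid, ppid, pgrp, command) tuples. Hypothetical helper."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.rstrip().split(None, 3)
        if len(parts) < 4:
            continue  # ignore blank or incomplete lines
        pid, ppid, pgrp, command = parts
        rows.append((int(pid), int(ppid), int(pgrp), command))
    return rows


def in_group(rows, pgrp):
    """Return only the rows whose process group ID equals pgrp."""
    return [row for row in rows if row[2] == pgrp]


if __name__ == "__main__":
    # 3600 is the process group taken from the shepherd trace.
    output = subprocess.run(
        ["ps", "-e", "-o", "pid,ppid,pgrp,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, ppid, pgrp, command in in_group(parse_ps(output), 3600):
        print(pid, ppid, pgrp, command)
```

An empty result would suggest that nothing is left in that process group while the shepherd still sends CONT to it.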

Cheers - Reuti


On 30.11.2005 at 16:02, Sebastian Stark wrote:

>
> I'm getting lots of these errors. What could be the cause? It only  
> happens with some jobs.
>
> Thanks for any hints.
>
>
> -Sebastian
>
> -------------------------------------------------------------------------------
> To: stark at tuebingen.mpg.de
> Subject: SGE 6.0u4: Job 649640 failed
> From: <>
>
> Job 649640 caused action: none
>  User        = schweike
>  Queue       = all.q at node112
>  Host        = node112
>  Start Time  = <unknown>
>  End Time    = <unknown>
> failed before writing exit_status:shepherd exited with exit status 19
> Shepherd trace:
> 11/26/2005 15:10:29 [4399:3599]: shepherd called with uid = 0, euid = 4399
> 11/26/2005 15:10:30 [4399:3599]: setpgid(3599, 3599) returned 0
> 11/26/2005 15:10:30 [4399:3599]: no prolog script to start
> 11/26/2005 15:10:30 [4399:3600]: pid=3600 pgrp=3600 sid=3600 old pgrp=3599 getlogin()=<no login set>
> 11/26/2005 15:10:30 [4399:3599]: forked "job" with pid 3600
> 11/26/2005 15:10:30 [4399:3599]: child: job - pid: 3600
> 11/26/2005 15:10:30 [4399:3600]: setosjobid: uid = 0, euid = 4399
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_FSIZE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_DATA setting: (soft 6291456000 hard 6291456000) resulting: (soft 6291456000 hard 6291456000)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_STACK setting: (soft 8388608 hard 8388608) resulting: (soft 8388608 hard 8388608)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_CORE setting: (soft 0 hard 0) resulting: (soft 0 hard 0)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 6291456000 hard 6291456000) resulting: (soft 6291456000 hard 6291456000)
> 11/26/2005 15:10:30 [4399:3600]: RLIMIT_RSS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/26/2005 15:10:30 [1840:3600]: closing all filedescriptors
> 11/26/2005 15:10:30 [1840:3600]: further messages are in "error" and "trace"
> 11/26/2005 15:10:30 [1840:3600]: using stdout as stderr
> 11/26/2005 15:10:30 [1840:3600]: execvp(/usr/local/sge/default/spool/node112/job_scripts/649640, "/usr/local/sge/default/spool/node112/job_scripts/649640")
> 11/26/2005 15:11:27 [4399:3599]: wait3 returned -1
> 11/26/2005 15:11:27 [4399:3599]: queued signal CONT
> 11/26/2005 15:11:27 [4399:3599]: kill(-3600, CONT)
> 11/26/2005 15:11:27 [4399:3599]: now sending signal CONT to pid -3600
> 11/26/2005 15:12:27 [4399:3599]: wait3 returned -1
> 11/26/2005 15:12:27 [4399:3599]: queued signal CONT
> 11/26/2005 15:12:27 [4399:3599]: kill(-3600, CONT)
> 11/26/2005 15:12:27 [4399:3599]: now sending signal CONT to pid -3600
> 11/26/2005 15:13:27 [4399:3599]: wait3 returned -1
> 11/26/2005 15:13:27 [4399:3599]: queued signal CONT
> 11/26/2005 15:13:27 [4399:3599]: kill(-3600, CONT)
> 11/26/2005 15:13:27 [4399:3599]: now sending signal CONT to pid -3600
> [...same again every minute for hours and hours...]
>
> -------------------------------------------------------------------------------
>
> -- 
> Sebastian Stark -- http://www.kyb.tuebingen.mpg.de/~stark
> Max Planck Institute for Biological Cybernetics
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

