[GE users] Shepherd errors

Margaret Doll Margaret_Doll at brown.edu
Fri Jun 29 19:05:09 BST 2007


How do you make the switch to using local /var/spool/sge files?

On Jun 29, 2007, at 4:24 AM, Schenker, Martin wrote:

> Are you using a central spool dir or a local one (on each node)? We
> had similar random occurrences on a Lustre/SFS system, but
> switching the spooling to a local dir (/var/spool/sge) seems to
> have cured this very odd behaviour...
>
> Best, Martin
>
>
>
> -----Original Message-----
> From: Heywood, Todd [mailto:heywood at cshl.edu]
> Sent: 28 June 2007 19:21
> To: users at gridengine.sunsource.net
> Subject: [GE users] Shepherd errors
>
>
> I have a recurrent error which is driving me nuts. It occurs for a
> pipeline application which submits thousands of jobs for over a 6 hour
> period. Sometimes the pipeline finishes fine, and other times it stops
> with this error:
>
> In stderr:
>
> error: cannot get connection to "shepherd" at host "blade15"
>
> In email sent to the SGE admin user:
>
> failed before job:06/27/2007 02:57:34 [0:23701]: can't open file
> /tmp/1738723.1.solexa.q/pid.339.blade15: No such file
>
> In /var/spool/sge/blade15/messages:
>
> 06/27/2007 02:57:34|execd|blade15|E|slave shepherd of job 1738723.1 exited with exit status = 11
>
> This doesn't just happen with blade15, but with some random node.
> Further, the node has been happily processing pipeline jobs for hours
> up until this failure.
>
> Any ideas on how to diagnose this further?
>
> Thanks,
>
> Todd
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
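
On the question of diagnosing this further: one rough way to see whether the failures cluster on particular nodes or times is to tally the shepherd-exit errors from each node's messages file. The sketch below is only an illustration, not an SGE tool. It assumes every line follows the five-field, pipe-delimited layout of the single line quoted above (timestamp|daemon|host|level|message); the function names parse_messages and failures_by_host are made up for this example.

```python
import re
from collections import Counter

# Message text as it appears in the one line quoted in the thread.
# Assumption: other shepherd-exit errors use the same wording.
SHEPHERD_EXIT = re.compile(
    r"shepherd of job (?P<job>\d+\.\d+)\s+exited\s+with exit status = (?P<status>\d+)"
)


def parse_messages(lines):
    """Yield (timestamp, host, job, status) for each execd shepherd-exit error."""
    for line in lines:
        # Split into at most five fields so a '|' inside the message survives.
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # not in the assumed five-field layout
        timestamp, daemon, host, level, message = parts
        if daemon != "execd" or level != "E":
            continue  # only error-level lines written by execd
        match = SHEPHERD_EXIT.search(message)
        if match:
            yield timestamp, host, match.group("job"), int(match.group("status"))


def failures_by_host(lines):
    """Count shepherd-exit errors per host."""
    return Counter(host for _, host, _, _ in parse_messages(lines))


if __name__ == "__main__":
    # The line quoted in the thread, rejoined onto one line.
    sample = [
        "06/27/2007 02:57:34|execd|blade15|E|slave shepherd of job "
        "1738723.1 exited with exit status = 11",
    ]
    for record in parse_messages(sample):
        print(record)
```

To run it against a real node, open that node's messages file (for example /var/spool/sge/blade15/messages, the path mentioned above) and pass the file object to failures_by_host.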