[GE users] Can't connect to shepherd error

Rayson Ho rayrayson at gmail.com
Fri May 25 15:48:18 BST 2007



On 5/25/07, Heywood, Todd <heywood at cshl.edu> wrote:
> Does the execd look for an unused add_grp_id locally, or does it need to
> contact the master host?

The group ID is chosen locally by the execd; it only needs to be unique within the host.
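
To show what "unique within the host" means in practice, here is a toy Python model. It is not Grid Engine's actual code, and the class and function names are hypothetical. It assumes that each running job on a node holds one ID from that node's gid_range until the job finishes. It also assumes that gid_range is a comma-separated list of single IDs or inclusive m-n ranges. Once every ID is held, the next job start fails with the same message seen in the execd log.

```python
def parse_gid_range(spec: str) -> list:
    """Expand a gid_range-style spec such as "20000-20100" or "20000-20010,30000".
    Assumption: comma-separated single IDs or inclusive m-n ranges."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            ids.extend(range(lo, hi + 1))
        else:
            ids.append(int(part))
    return ids


class AddGrpIdPool:
    """Hypothetical per-host pool: each running job holds one additional group ID
    from this host's gid_range until it finishes. No other host is consulted."""

    def __init__(self, spec: str):
        self.free = parse_gid_range(spec)
        self.in_use = {}  # job id -> group id currently held

    def start_job(self, job_id: str) -> int:
        # Every ID in the range is held by a running job: the job cannot start.
        if not self.free:
            raise RuntimeError(
                f"can't start job \"{job_id}\": can not find an unused add_grp_id"
            )
        gid = self.free.pop(0)
        self.in_use[job_id] = gid
        return gid

    def finish_job(self, job_id: str) -> None:
        # Return the job's group ID to the pool so a later job can reuse it.
        self.free.append(self.in_use.pop(job_id))


if __name__ == "__main__":
    pool = AddGrpIdPool("20000-20001")  # only two IDs on this host
    print(pool.start_job("1"), pool.start_job("2"))
    try:
        pool.start_job("3")
    except RuntimeError as err:
        print(err)
```

With a two-ID range, a third concurrent job on the same host fails until one of the first two finishes. This is consistent with the error showing up only when a node is busy.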

Rayson


>  My guess is that this error is a function of
> certain loads on the cluster.
>
> Thanks,
>
> Todd
>
>
> On 5/25/07 9:19 AM, "Fred Youhanaie" <fly at anydata.co.uk> wrote:
>
> > Hi Todd,
> >
> > It appears that your group ID range is too small for the number of jobs
> > you are running on the individual nodes; see the sge_conf man page,
> > parameter gid_range.
> >
> > You need to increase gid_range to a larger value; it should be greater
> > than the number of concurrent jobs on a single node.
> >
> > HTH
> >
> > Cheers
> > f.
> >
> > Heywood, Todd wrote:
> >> Hi,
> >>
> >> A user is getting this sporadic error:
> >>
> >>    error: cannot  get connection to "shepherd" at host "blade97"
> >>
> >> When I look in /var/spool/sge/blade97/messages, I see this:
> >>
> > >> 05/24/2007 12:20:04|execd|blade97|W|reaping job "1407319" ptf complains: Job does not exist
> > >> 05/24/2007 12:20:05|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
> > >> 05/24/2007 12:20:05|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
> > >> 05/24/2007 12:20:06|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
> > >> 05/24/2007 12:20:13|execd|blade97|W|reaping job "1407319" ptf complains: Job does not exist
> >>
> >>
> >> Can anyone explain what this means and how it might be avoided? Thanks.
> >>
> >> (On a related note, the message "reaping job... ptf complains: Job does not
> >> exist" is very common in the message files... why is this?)
> >>
> >> Thanks,
> >>
> >> Todd
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>
> >>
> >
> >
>
>
>
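
Following Fred's rule of thumb, a quick sanity check is to count the IDs in gid_range and compare that count with the largest number of jobs a single node can run at once (for example, its slot count). The Python helpers below are hypothetical. They assume the same comma-separated m-n syntax and treat ranges as inclusive. The value itself is changed in the cluster configuration, e.g. with `qconf -mconf`.

```python
def gid_range_size(spec: str) -> int:
    """Count the IDs in a gid_range-style spec (assumed inclusive m-n ranges,
    comma-separated, single IDs allowed)."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            total += hi - lo + 1
        else:
            total += 1
    return total


def gid_range_is_sufficient(spec: str, max_concurrent_jobs: int) -> bool:
    """Fred's rule: the range must be greater than the number of jobs
    that can run concurrently on a single node."""
    return gid_range_size(spec) > max_concurrent_jobs


if __name__ == "__main__":
    for jobs in (64, 128):
        print(jobs, gid_range_is_sufficient("20000-20100", jobs))
```

For example, a range of 20000-20100 holds 101 IDs. That is enough for a node running 64 jobs at once, but too small for one running 128.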





More information about the gridengine-users mailing list