[GE users] Can't connect to shepherd error

Heywood, Todd heywood at cshl.edu
Fri May 25 14:43:08 BST 2007


Hi,

We have a gid_range of 20000-20100, and allow only 4 jobs per node.
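As a quick sanity check on those numbers, here is a minimal sketch that counts the group IDs a gid_range string provides and compares that count with the per-node job limit. The helper name `gid_range_size` is hypothetical, and the sketch assumes gid_range is a comma-separated list of `n-m` ranges or single values. It is not part of Grid Engine.

```python
def gid_range_size(spec: str) -> int:
    """Count the group IDs covered by a gid_range string such as "20000-20100".

    Assumption: the string is a comma-separated list of "n-m" ranges,
    where a bare "n" stands for the single ID n.
    """
    total = 0
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        high = high or low  # a bare value is a one-element range
        total += int(high) - int(low) + 1  # both ends are inclusive
    return total


ids = gid_range_size("20000-20100")
jobs_per_node = 4
print(ids, ids > jobs_per_node)
```

By this count the range holds 101 IDs against a limit of 4 jobs per node, so on paper it should not be too small.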

Does the execd look for an unused add_grp_id locally, or does it need to
contact the master host? My guess is that this error is a function of
certain loads on the cluster.
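One way to test that guess would be to tally the add_grp_id errors by hour and compare the counts with the cluster load at those times. The sketch below assumes each entry in the execd messages file is a single line with five pipe-separated fields (timestamp, daemon, host, level, message), as in the excerpt quoted further down. The function name `count_gid_errors` is hypothetical.

```python
from collections import Counter


def count_gid_errors(lines):
    """Tally "unused add_grp_id" errors per date and hour.

    Assumption: each line looks like
    MM/DD/YYYY HH:MM:SS|daemon|host|level|message
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # skip anything that does not match the layout
        timestamp, _daemon, _host, level, message = fields
        if level == "E" and "unused add_grp_id" in message:
            counts[timestamp[:13]] += 1  # "MM/DD/YYYY HH"
    return counts


sample = """\
05/24/2007 12:20:04|execd|blade97|W|reaping job "1407319" ptf complains: Job does not exist
05/24/2007 12:20:05|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
05/24/2007 12:20:05|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
05/24/2007 12:20:06|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
""".splitlines()

print(count_gid_errors(sample))
```

On a real node the input would be the lines of the messages file, e.g. `count_gid_errors(open("/var/spool/sge/blade97/messages"))`.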

Thanks,

Todd


On 5/25/07 9:19 AM, "Fred Youhanaie" <fly at anydata.co.uk> wrote:

> Hi Todd,
> 
> It appears that your group id range is too short for the number of jobs
> you are running on the individual nodes; see the sge_conf man page,
> parameter gid_range.
> 
> You need to increase gid_range to a larger value; it should be greater
> than the number of concurrent jobs on a single node.
> 
> HTH
> 
> Cheers
> f.
> 
> Heywood, Todd wrote:
>> Hi,
>> 
>> A user is getting this sporadic error:
>> 
>>    error: cannot  get connection to "shepherd" at host "blade97"
>> 
>> When I look in /var/spool/sge/blade97/messages, I see this:
>> 
>> 05/24/2007 12:20:04|execd|blade97|W|reaping job "1407319" ptf complains: Job does not exist
>> 05/24/2007 12:20:05|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
>> 05/24/2007 12:20:05|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
>> 05/24/2007 12:20:06|execd|blade97|E|can't start job "1407319": can not find an unused add_grp_id
>> 05/24/2007 12:20:13|execd|blade97|W|reaping job "1407319" ptf complains: Job does not exist
>> 
>> 
>> Can anyone explain what this means and how it might be avoided? Thanks.
>> 
>> (On a related note, the message "reaping job... ptf complains: Job does not
>> exist" is very common in the message files... why is this?)
>> 
>> Thanks,
>> 
>> Todd
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>> 
>> 
> 
> 




