[GE users] Can't connect to shepherd error

Heywood, Todd heywood at cshl.edu
Fri May 25 17:07:29 BST 2007


I'll give it a shot and set gid_range to 20000-20200. There are no
suspended jobs on the node.

This particular application uses qmake with 100-200 tasks. It had been
running for a few months without hitting this error, until this week.
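
As a rough back-of-the-envelope check, here is a minimal sketch of the
arithmetic. It assumes each running task holds one id from gid_range and
that the range is shared across the cluster, which is only Fred's
unconfirmed reading of the source. The helper names gid_range_size and
fits are made up for illustration and are not part of Grid Engine.

```python
def gid_range_size(gid_range: str) -> int:
    """Count the ids in an inclusive 'low-high' range string."""
    low, high = (int(part) for part in gid_range.split("-"))
    return high - low + 1


def fits(gid_range: str, concurrent_tasks: int) -> bool:
    """True if every task could hold its own id at the same time.

    Assumption (not confirmed in this thread): one id per running task,
    all drawn from a single shared pool.
    """
    return concurrent_tasks <= gid_range_size(gid_range)


# Old and new ranges from this thread, against the 100-200 qmake tasks.
for rng in ("20000-20100", "20000-20200"):
    for tasks in (100, 200):
        print(rng, gid_range_size(rng), tasks, fits(rng, tasks))
```

If the per-cluster reading is right, the old range could run out at the
upper end of the task count, while the extended one would cover it. If
the range is per node, 4 jobs per node should never get near either limit.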

Todd 


On 5/25/07 10:59 AM, "Fred Youhanaie" <fly at anydata.co.uk> wrote:

> Heywood, Todd wrote:
>> Hi,
>> 
>> We have a gid_range of 20000-20100, and allow only 4 jobs per node.
>> 
>> Does the execd look for an unused add_grp_id locally, or does it need to
>> contact the master host? My guess is that this error is a function of
>> certain loads on the cluster.
> 
> The range is meant to be per node/execd, however, having had a quick
> look at the source code, it may actually be per cluster. Perhaps someone
> from dev team can confirm :)
> 
> If you are able to extend the range, then it may well be worth a shot.
> It should take effect immediately.
> 
> BTW, you don't have loads of suspended jobs on the node, do you?
> 
> Cheers
> f.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
