[GE users] Unused add_grp_id error? Cannot get connection error?

Heywood, Todd heywood at cshl.edu
Fri Dec 14 15:41:31 GMT 2007




On 12/14/07 10:24 AM, "Rayson Ho" <rayrayson at gmail.com> wrote:

> On Dec 14, 2007 9:57 AM, Heywood, Todd <heywood at cshl.edu> wrote:
>> error: cannot get connection to "shepherd" at host "blade14"
>> 
>> When I go to look at the /var/spool/sge/blade14/messages file (for example),
>> there are 4 of these messages:
>> 
>> can't start job "4353897": can not find an unused add_grp_id
> 
> What is the "gid_range" of the host??
> 
> Also see the sge_conf(5) manpage.
> 
> Rayson

"qconf -sconf global" shows:

gid_range                    20000-20100

"qconf -sconf blade14" (for the exec host) shows no gid_range setting. It is
my understanding it comes from the global setting of 20000-20100.
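
As a quick sanity check on the arithmetic, here is a minimal sketch of how many IDs that range provides. The helper name parse_gid_range is hypothetical and not part of SGE. I am assuming the range is inclusive, that the value may be a comma-separated list of ranges or single numbers, and that each running job or task on a host takes roughly one additional group ID.

```python
def parse_gid_range(spec):
    """Expand a gid_range string such as "20000-20100" into a sorted list of GIDs.

    Hypothetical helper for illustration only; it is not part of SGE.
    Assumes ranges are inclusive and entries are separated by commas.
    """
    gids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            gids.update(range(low, high + 1))  # inclusive upper bound (assumption)
        else:
            gids.add(int(part))
    return sorted(gids)


gids = parse_gid_range("20000-20100")
slots_per_node = 4  # jobs allowed per blade, as described in my earlier mail
print(len(gids), "group IDs available for", slots_per_node, "slots")
```

If those assumptions hold, the range is far larger than the 4 jobs allowed per node, so it should not be exhausted by normally running jobs alone.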



>> 
>> OK. But my sge_conf has gid_range set to 20000-20100, and only 4 jobs are
>> allowed to run per blade/node (hence 4 messages in the /var/spool/...
>> messages file).
>> 
>> So I look in the qmaster .../spool/qmaster/messages file, and see this:
>> 
>> 12/13/2007 23:02:15|qmaster|bhmnode2|E|tightly integrated parallel task 4353897.1 task 2623.blade7 failed - killing job
>> 
>> This time stamp is a couple of minutes after the other errors (same job ID).
>> 
>> So I go to blade7, and the messages file there says:
>> 
>> 12/13/2007 23:00:36|execd|blade7|E|slave shepherd of job 4353897.1 exited with exit status = 11
>> 12/13/2007 23:00:36|execd|blade7|E|slave shepherd of job 4353897.1 exited with exit status = 11
>> 
>> SO, I'm totally confused. Any idea what is going on?
>> 
>> Thanks,
>> 
>> Todd Heywood
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>> 
>> 



