[GE users] 6.2 sgeexecd fails to keep running: can't connect to service

reuti reuti at staff.uni-marburg.de
Wed Nov 19 00:18:46 GMT 2008


Hi,

On 18.11.2008 at 21:06, Harry Mangalam wrote:

> I have 2 subclusters (different archs) running under 6.2.  When I try
> to start sgeexecd on subcluster bduc-i32, sgeexecd starts and then
> fails after a minute or so.  The only message I can see is
> in /tmp/execd_messages.nnnnn:
>
> 11/18/2008 11:51:32|  main|bduc-i32-16|E|can't connect to service
> 11/18/2008 11:51:32|  main|bduc-i32-16|E|can't get configuration from qmaster -- backgrounding
>
> the bduc-amd64 subcluster (oddly, the one 'further away' on a public
> IP net) works fine and the output of qhost shows:
>
> HOSTNAME      ARCH       NCPU  LOAD MEMTOT  MEMUSE SWAPTO SWAPUS
> ----------------------------------------------------------------
> global        -             -     -      -       -      -      -
> bduc-amd64-1  lx24-amd64    2  0.00   3.9G  152.0M   1.0G    0.0
> bduc-amd64-10 lx24-amd64    2  0.00   2.0G  148.1M   1.0G    0.0
> bduc-amd64-11 lx24-amd64    2  0.00   3.9G  149.3M   1.0G    0.0
> bduc-amd64-12 lx24-amd64    2  0.00   3.9G  148.6M   1.0G    0.0
> bduc-amd64-13 lx24-amd64    2  0.00   3.9G  148.4M   1.0G    0.0
>  ...
> bduc-i32-10   lx24-x86      2     -   2.0G       -   3.9G      -
> bduc-i32-11   lx24-x86      2     -   4.0G       -   3.9G      -
> bduc-i32-12   lx24-x86      2     -   4.0G       -   3.9G      -
> bduc-i32-13   lx24-x86      2     -   4.0G       -   3.9G      -
> bduc-i32-14   lx24-x86      2     -   4.0G       -   3.9G      -
>
> indicating the failure of sgeexecd to run on the i32 nodes.

Also, the internal nodes have to contact the qmaster under its
external name. Maybe for now they can't find the qmaster; in that case
you will have to set up a route from the internal nodes to the qmaster.

I.e. a "ping <external_name_of_the_qmaster>" should work on the
internal nodes.
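
If ping is filtered or you want to test a bit more than plain reachability, the same idea can be sketched in Python: first resolve the qmaster's external name, then try a TCP connection to the qmaster port. This is only a rough sketch under assumptions: the function name check_qmaster is made up for illustration, and 6444 is assumed to be the qmaster port (the usual default for sge_qmaster); your site may use a different port via SGE_QMASTER_PORT or /etc/services.

```python
import socket

# Assumption: 6444 is the commonly used default sge_qmaster port.
# Check SGE_QMASTER_PORT or /etc/services for the value at your site.
QMASTER_PORT = 6444


def check_qmaster(host, port=QMASTER_PORT, timeout=3.0,
                  resolve=socket.gethostbyname,
                  connect=socket.create_connection):
    """Hypothetical helper: return (ok, message) for a qmaster host name.

    Step 1 checks name resolution, step 2 checks that a TCP connection
    to the qmaster port can be opened from this node.
    """
    try:
        addr = resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"name resolution failed for {host}: {exc}"

    try:
        conn = connect((addr, port), timeout)
    except OSError as exc:  # refused, timed out, no route to host, ...
        return False, (f"{host} resolves to {addr}, "
                       f"but port {port} is unreachable: {exc}")

    conn.close()
    return True, f"{host} resolves to {addr} and accepts connections on port {port}"
```

Run on one of the internal nodes, something like print(check_qmaster("<external_name_of_the_qmaster>")) would tell you whether the failure is in name resolution or in routing to the port.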

-- Reuti


> Does this sound like a name resolution problem?  Or something else?  No
> firewalls are involved AFAIK.
>
> -- 
> Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway,
> UC Irvine 92697  949 824-0084(o), 949 285-4487(c)
> ---
> Good judgment comes from experience;
> Experience comes from bad judgment. [F. Brooks.]
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88996
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89022

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


