[GE users] qlogin fails from ROCKS compute node to external SGE exec host

reuti reuti at staff.uni-marburg.de
Thu Nov 26 07:56:01 GMT 2009


Am 26.11.2009 um 02:49 schrieb bergman:

> In the message dated: Thu, 26 Nov 2009 00:44:35 +0100,
> The pithy ruminations from reuti on
> <Re: [GE users] qlogin fails from ROCKS compute node to external SGE exec host>
>  were:
> => Am 25.11.2009 um 23:50 schrieb bergman:
> =>
>
> 	[SNIP!]
>
> =>
> => So, you log in to a compute node and then issue qlogin from there to
> => an outside server - unusual setup. Anyway:
>
> Yeah, it's a bit unusual...but it has some advantages for us...people
> log in to the headnode, and then run qlogin...which directs them to a
> machine in a queue that's reserved for interactive logins...so that
> interactive work doesn't fight with compute jobs for resources.
>
> The additional qlogin to the machine with the GPU is an exception...meant
> to give people doing development of GPU code interactive access to that
> server.
>
> =>
> => Does server1 know the hosts inside the cluster, i.e. compute-0-0
>
> Hmmmm....I don't know how that would work, since the compute nodes are
> inaccessible from outside the ROCKS cluster--they're on an RFC1918
> network.

Just add dummy entries for the compute nodes (compute-0-0 and so on) to
/etc/hosts on server1, so that the names at least resolve there.
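
As a rough illustration of what such dummy entries could look like, here is a small Python sketch that prints /etc/hosts lines for a list of node names. The starting address 10.1.1.10, the "local" domain and the helper name dummy_hosts_entries are placeholders of mine, not values from your cluster; use whatever addresses the nodes really have on the private network.

```python
import ipaddress


def dummy_hosts_entries(node_names, first_address="10.1.1.10", domain="local"):
    """Return /etc/hosts lines mapping each node name to a placeholder address.

    first_address and domain are illustrative assumptions, not real cluster data.
    """
    start = ipaddress.IPv4Address(first_address)
    lines = []
    for offset, name in enumerate(node_names):
        address = start + offset  # consecutive placeholder addresses
        lines.append(f"{address}\t{name}.{domain}\t{name}")
    return lines


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being appended to /etc/hosts.
    for line in dummy_hosts_entries(["compute-0-0", "compute-0-1"]):
        print(line)
```

The output is only meant to be reviewed and then appended to /etc/hosts on server1 by hand.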


> => resolves to something? AFAIK SGE will check the address from the
> => incoming rsh or builtin method being originated from the issuing
> => machine (which would fail due to NAT). But as you are using SSH it
>
> I'm not sure what this means. Where is this address check happening:
>
> 	on the compute node when the job is submitted
>
> 	on the qmaster (which is also the ROCKS head node, so resolution
> 	will succeed)
>
> 	on the target of the command (server1)

It will happen on server1. As you are using SSH, the check itself won't
be performed, but I think the starting shepherd will look up the
addresses anyway. If it's still not working, you could install a second
network card in server1, so that it is also on the private network in
addition to its external connection.
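
To see whether server1 can resolve the node names at all, something like the following Python sketch could be run on server1. It only tests name resolution through the system resolver (which includes /etc/hosts), not reachability, and it says nothing certain about what the shepherd itself does; resolution_report is a name I made up for this example.

```python
import socket


def resolution_report(hostnames):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    report = {}
    for name in hostnames:
        try:
            report[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError.
            report[name] = None
    return report


if __name__ == "__main__":
    # compute-0-0 is the node name from this thread; add further nodes as needed.
    for name, address in resolution_report(["compute-0-0"]).items():
        print(f"{name}: {address if address else 'does not resolve'}")
```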

-- Reuti


> Will the $SGE_ROOT/default/common/host_aliases file help?
>
> => should work. As the shepherd startup will try to resolve compute-0-0
> => although it's not needed, it might hang at that point.
> =>
> => -- Reuti
> =>
> => PS: qsub is different, as it doesn't need a direct connection between
> => the issuing and executing machine at any point.
>
> Ah. Ok.
>
>
> Thanks,
>
> Mark
>
> =>
> =>
> => > Thanks,
> => >
> => > Mark
> => >
> => >
> => > ----
> => > Mark Bergman                              voice: 215-662-7310
> => > mark.bergman at uphs.upenn.edu                 fax: 215-614-0266
> => > System Administrator     Section of Biomedical Image Analysis
> => > Department of Radiology            University of Pennsylvania
> => >       PGP Key: https://www.rad.upenn.edu/sbia/bergman
> => >
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=229480

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


