[GE users] qlogin fails from ROCKS compute node to external SGE exec host

bergman mark.bergman at uphs.upenn.edu
Thu Nov 26 18:20:45 GMT 2009


In the message dated: Thu, 26 Nov 2009 08:56:01 +0100,
The pithy ruminations from reuti on 
<Re: [GE users] qlogin fails from ROCKS compute node to external SGE exec host>
 were:
=> Am 26.11.2009 um 02:49 schrieb bergman:
=> 
=> > In the message dated: Thu, 26 Nov 2009 00:44:35 +0100,
=> > The pithy ruminations from reuti on
=> > <Re: [GE users] qlogin fails from ROCKS compute node to external SGE exec host>
=> >  were:
=> > => Am 25.11.2009 um 23:50 schrieb bergman:
=> > =>
=> >
=> > 	[SNIP!]
=> >
=> > =>
=> > => So, you log in to a compute node and then issue qlogin from there to
=> > => an outside server - unusual setup. Anyway:
=> >
=> > Yeah, it's a bit unusual...but it has some advantages for us...people
=> > log in to the headnode, and then run qlogin...which directs them to a
=> > machine in a queue that's reserved for interactive logins...so that
=> > interactive work doesn't fight with compute jobs for resources.
=> >
=> > The additional qlogin to the machine with the GPU is an exception...meant
=> > to give people doing development of GPU code interactive access to that
=> > server.
=> >
=> > =>
=> > => Does server1 know the hosts inside the cluster, i.e. compute-0-0
=> >
=> > Hmmmm....I don't know how that would work, since the compute nodes are
=> > inaccessible from outside the ROCKS cluster--they're on an RFC1918 network.
=> 
=> Just fill in dummy entries in /etc/hosts on server1.
=> 

OK. I'll try that.
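
A rough sketch of how those dummy entries could be generated rather than typed
by hand. Everything specific here is an assumption for illustration: the
placeholder starting address 10.255.255.1, the ".local" domain suffix, and the
helper name dummy_host_lines are all hypothetical, and the addresses only need
to make the names resolve on server1, not to be reachable.

```python
"""Print placeholder /etc/hosts lines for ROCKS compute nodes (sketch only)."""
import ipaddress


def dummy_host_lines(node_names, first_address="10.255.255.1", domain="local"):
    """Return one /etc/hosts line per node, using consecutive placeholder IPs.

    first_address and domain are assumptions, not values from the real cluster.
    """
    start = ipaddress.ip_address(first_address)
    lines = []
    for offset, name in enumerate(node_names):
        address = start + offset  # IPv4Address supports integer addition
        lines.append(f"{address}\t{name}.{domain} {name}")
    return lines


if __name__ == "__main__":
    # compute-0-0 .. compute-0-3 follow the naming used in this thread.
    nodes = [f"compute-0-{i}" for i in range(4)]
    print("\n".join(dummy_host_lines(nodes)))
```

The output would be reviewed and appended to /etc/hosts on server1 by hand;
the script itself does not touch any system file.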

=> 
=> > => resolves to something? AFAIK SGE will check that the incoming rsh or
=> > => builtin connection originates from the address of the issuing
=> > => machine (which would fail due to NAT). But as you are using SSH it
=> >
=> > I'm not sure what this means. Where is this address check happening:
=> >
=> > 	on the compute node when the job is submitted
=> >
=> > 	on the qmaster (which is also the ROCKS head node, so resolution
=> > 	will succeed)
=> >
=> > 	on the target of the command (server1)
=> 
=> It will happen on server1. As you are using SSH, the check won't be
=> performed, but I think the starting shepherd will look up the addresses
=> anyway. If it's still not working, you could install a second

Hmmm...

=> network card into server1 so that it is also on the private network
=> in addition to its external connection.

I understand that, but it's not scalable. In my example, "server1" is a single
machine, but it's really a proof of concept. In the future there may be
dozens to hundreds of servers that are outside the ROCKS cluster but are
accessed exclusively via SGE (qsub and qlogin).
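
With that many external servers, I'd want a quick way to confirm on each one
that the compute node names actually resolve, since the shepherd apparently
looks them up. A minimal sketch, assuming the nodes are named compute-0-N as
above; the helper name unresolved_hosts and the node count are hypothetical:

```python
import socket


def unresolved_hosts(names, resolve=socket.gethostbyname):
    """Return the names that `resolve` cannot map to an address."""
    missing = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical node list; adjust the range to the real cluster size.
    nodes = [f"compute-0-{i}" for i in range(4)]
    for name in unresolved_hosts(nodes):
        print(f"{name}: does not resolve on this host")
```

This only tests name resolution on the machine where it runs. It says nothing
about whether SGE itself is satisfied with the result.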

Thank you very much for the quick answers.

Mark

=> 
=> -- Reuti
=> 
=> 
=> > Will the $SGE_ROOT/default/common/host_aliases file help?
=> >
=> > => should work. As the shepherd startup will try to resolve compute-0-0
=> > => although it's not needed, it might hang at that point.
=> > =>
=> > => -- Reuti
=> > =>
=> > => PS: qsub is different, as it doesn't need a direct connection between
=> > => the issuing and executing machine at any point.
=> >
=> > Ah. Ok.
=> >
=> >
=> > Thanks,
=> >
=> > Mark
=> >
=> > =>
=> > =>
=> > => > Thanks,
=> > => >
=> > => > Mark
=> > => >
=> > => >
=> > => > ----
=> > => > Mark Bergman                              voice: 215-662-7310
=> > => > mark.bergman at uphs.upenn.edu                 fax: 215-614-0266
=> > => > System Administrator     Section of Biomedical Image Analysis
=> > => > Department of Radiology            University of Pennsylvania
=> > => >       PGP Key: https://www.rad.upenn.edu/sbia/bergman
=> > => >
=> >
=> > ------------------------------------------------------
=> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=229432
=> >
=> > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
=> 
=> ------------------------------------------------------
=> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=229480
=>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=229601
