[GE users] submit jobs between clusters

reuti reuti at staff.uni-marburg.de
Thu Aug 19 21:01:13 BST 2010


Hi,

On 19.08.2010 at 21:44, russray wrote:

> reuti <reuti at staff.uni-marburg.de> wrote on 08/18/2010 01:51:25 PM:
> 
>> Hi,
>> 
>> On 18.08.2010 at 16:19, russray wrote:
>> 
>>> I have set up two clusters that are independent of each other at
>>> different sites.
>>> 
>>> Site 1:  running GE 6.1u3 
>>> Site 2:  running GE 6.2u2_1 
>>> 
>>> I would like users at Site 2 to be able to submit a job to
>>> a queue on Site 1.  Can someone point me to a how-to or manual or
>>> something that would help me figure out how to have the clusters
>>> "see" each other so jobs could be submitted to the other site?
>> 
>> Which features do you need in detail? Do you also have to distribute
>> large input/output data? Are the users known at both locations with
>> the same UID? Does an SSH connection exist?
>> 
>> -- Reuti
> 
> 1.  centralized authentication so users known by all machines at both locations 

For such a setup, transfer queues might already be a solution:

http://gridengine.sunsource.net/howto/TransferQueues/transferqueues.html

(it was written for SGE 5.3, so it will need some rework to be usable in 6.x)

The idea is that the transfer queue always executes jobs immediately. Its starter method then changes the ports if necessary, uses a different "common" directory, and accesses the other cluster to do the real submission. In your case the 6.1u3 binaries also need to be available somewhere at the second site, as these will then be used for `qsub`. The starter method can also transfer the files if necessary.
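
To make the idea more concrete, here is a minimal sketch of such a starter method in Python. It is not the script from the howto. All paths, the architecture string, the cell name, the queue name and the port numbers are placeholders I made up for illustration and must be adjusted to your installation. It assumes the 6.1u3 tree of Site 1 is reachable from the Site 2 execution hosts and that Grid Engine calls the starter method with the job script and its arguments.

```python
#!/usr/bin/env python
"""Sketch of a transfer-queue starter method (all site values are assumptions)."""
import os
import subprocess
import sys

# Hypothetical settings describing the remote (Site 1) cluster.
REMOTE_SGE_ROOT = "/opt/sge-6.1u3"   # where the 6.1u3 binaries are visible at Site 2
REMOTE_SGE_CELL = "default"          # selects the remote "common" directory
REMOTE_QMASTER_PORT = "6444"         # placeholder, use Site 1's qmaster port
REMOTE_EXECD_PORT = "6445"           # placeholder, use Site 1's execd port
REMOTE_ARCH = "lx24-amd64"           # placeholder architecture directory
REMOTE_QUEUE = "site1.q"             # placeholder name of the target queue


def remote_environment(base_env):
    """Return a copy of base_env that points the client tools at the remote cluster."""
    env = dict(base_env)
    env["SGE_ROOT"] = REMOTE_SGE_ROOT
    env["SGE_CELL"] = REMOTE_SGE_CELL
    env["SGE_QMASTER_PORT"] = REMOTE_QMASTER_PORT
    env["SGE_EXECD_PORT"] = REMOTE_EXECD_PORT
    bin_dir = os.path.join(REMOTE_SGE_ROOT, "bin", REMOTE_ARCH)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env


def build_qsub_command(job_script, job_args):
    """Build the qsub call for the real submission on the remote cluster."""
    qsub = os.path.join(REMOTE_SGE_ROOT, "bin", REMOTE_ARCH, "qsub")
    # -sync y makes qsub wait, so the local transfer job ends with the remote one.
    return [qsub, "-sync", "y", "-q", REMOTE_QUEUE, job_script] + list(job_args)


def main(argv):
    """argv[1] is the job script, argv[2:] are its arguments."""
    cmd = build_qsub_command(argv[1], argv[2:])
    # Copying input files to the other site would go here, before the submission.
    return subprocess.call(cmd, env=remote_environment(os.environ))


# Only act when a job script was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The script would be entered as `starter_method` in the configuration of the transfer queue at Site 2. Whether waiting with `-sync y` is what you want depends on how you plan to collect the result files.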

We can discuss the details off-list.


> 2.  auto mounted home directories by all machines for all users 

But it's not the same /home at both sites, so the files must be transferred?

-- Reuti


> 3.  ssh used to login to a machine 
> 4.  jobs are single jobs, low data to start the job, then a compressed tar file to return to user 
> 5.  data can use cp and no ftp or scp necessary 
> 
> And if it matters, the clusters are small. 
> Site 1:  5 machines, 15 cpus 
> Site 2:  3 machines, 18 cpus 
> 
> Currently only one queue on Site 1 needs to be used from Site 2 and no jobs from Site 1 need to run on Site 2.
