[GE users] Data transfer between submit and compute host

reuti reuti at staff.uni-marburg.de
Sat Oct 2 21:39:31 BST 2010


On 01.10.2010 at 15:44, mdondrup wrote:

> We are adopting GE in a data-intensive web-services environment.
> I would appreciate your opinions and experience on how best to transfer data between submit and exec hosts. Data volume will be in the range of
> 100 MB to 1 GB, and the data will be rather variable, so not well suited to caching. The number of compute hosts will be < 10. Jobs will be
> submitted via DRMAA from different programming languages.
> The following solutions came up:
> - NFS shared directory (seems feasible as long as not too many compute hosts access it at the same time)
> - Storing files in BLOB fields in a relational database (MySQL/PostgreSQL) and having the job script access them.
> I guess this is rather inefficient, but I need some arguments why.
> - I think I heard somewhere that GE has some built-in functionality for data transfer, but I cannot recall where that was documented.

It is not built in as a feature, only as support for your own scripts. One of the two options can only be used via DRMAA, and in both cases you have to transfer the actual data on your own:


or, more generally:


As the prolog runs on the exechost, it must be able to access the submit host.
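
To illustrate the direction of the copy, here is a minimal sketch of such a prolog in Python. It is not taken from the Grid Engine documentation. It assumes that the prolog sees the job's environment (SGE_O_HOST for the submit host, TMPDIR for the per-job scratch directory), that passwordless scp from the exec host to the submit host is set up, and that the job was submitted with a hypothetical variable STAGE_IN_PATH (for example via "qsub -v STAGE_IN_PATH=...") naming the input file on the submit host.

```python
import os
import subprocess
import sys


def build_stage_in_command(env, remote_path):
    """Return the scp argv that pulls remote_path from the submit host."""
    submit_host = env["SGE_O_HOST"]  # host the job was submitted from
    scratch_dir = env["TMPDIR"]      # per-job scratch directory on the exec host
    return ["scp", "-q", f"{submit_host}:{remote_path}", scratch_dir + "/"]


def main():
    # STAGE_IN_PATH is a hypothetical, site-chosen variable, not a Grid Engine one.
    remote_path = os.environ.get("STAGE_IN_PATH")
    if not remote_path:
        return 0  # nothing to stage for this job
    # The prolog runs on the exec host, so the copy is a pull from the submit host.
    return subprocess.call(build_stage_in_command(os.environ, remote_path))


if __name__ == "__main__":
    status = main()
    if status != 0:
        sys.exit(status)  # report a failed copy to Grid Engine
```

A matching epilog could push results back the same way with source and destination swapped.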

-- Reuti

> - CacheFS? Is it even worth bothering with for variable data?
> Any input on pros and cons is welcome.
> Cheers
> Michael
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=284905
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


