[GE users] Working of SGE_Hadoop Integration

adarsh adarsh.sharma at orkash.com
Mon Dec 6 08:44:49 GMT 2010

Dear all,

Thanks to your replies, I was able to configure Hadoop with SGE integration on a 10-node cluster.

I overcame all the difficulties I faced during configuration.

However, I still have some doubts.

1. I loaded files of different sizes into Hadoop (24 MB, 2 GB, and 20 GB). When I issue the command ./qhost -F | grep hdfs, it shows the data paths. But when I run any SGE job on these data files, it executes on only one execution daemon.

That is fine for small files, but the 20 GB file is distributed across the 10 nodes, so I expected the wordcount job to run on all the tasktrackers. Instead, it shows only one execution daemon.
I checked through the Web UI and the logs: only one execution daemon is running.

This causes the data to be transferred to one node, which takes too much time.

What is the benefit then, given that Hadoop is made for distributed processing?

Is it a problem with our configuration? (I configured all.q on all execution daemons.)
Is it possible to run a job on several hosts concurrently (which is what Hadoop is used for), through a single queue or different queues?

Thanks & Regards
Adarsh Sharma

