Discussion:
hadoop distcp from brisk cluster to hadoop cluster
rk vishu
2012-02-10 19:56:20 UTC
Permalink
Could anyone tell me how we can copy data from a Cassandra/Brisk cluster to
a Hadoop HDFS cluster?

1) Is there a way to do hadoop distcp between clusters?
2) If a Hive table is created on the Brisk cluster, will it be stored in a
format similar to an HDFS file? Can we run MapReduce on the other cluster to
transform the Hive data (on Brisk)?

Thanks and Regards
RK
Edward Capriolo
2012-02-11 13:59:49 UTC
Permalink
It mostly works as normal with one caveat.
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/possibly_the_worlds_first_briskcp

In the other direction, Hadoop may not know how to "talk" to cfs:/// without
installing extra components. This is where hftp:// comes in...

Copying between versions of HDFS

For copying between two different versions of Hadoop, one will usually use
HftpFileSystem. This is a read-only FileSystem, so DistCp must be run on
the destination cluster (more specifically, on TaskTrackers that can write
to the destination cluster). Each source is specified as
hftp://<dfs.http.address>/<path> (the default dfs.http.address is
<namenode>:50070).
Also distcp can push or pull data so usually you have a few options.
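Putting the hftp approach together, a pull from the destination side might look like the sketch below. The host names, port, and paths are placeholders, and this assumes the Brisk side exposes the standard HDFS HTTP interface on dfs.http.address:

```shell
# Run on the DESTINATION Hadoop cluster (hftp sources are read-only,
# so DistCp must execute where the writes happen).
# brisk-namenode and dest-namenode are hypothetical host names;
# 50070 is the default dfs.http.address port.
hadoop distcp \
  hftp://brisk-namenode:50070/user/hive/warehouse/mytable \
  hdfs://dest-namenode:8020/user/hive/warehouse/mytable
```

Because HftpFileSystem is read-only, this only works as a pull; the reverse direction needs a different scheme on the source side.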
Talk2hadoop
2012-02-11 20:45:56 UTC
Permalink
Edward,

Is distcp from the cfs cluster to the Hadoop cluster (a push) going to work similarly to the example given?

Rk
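For the push case the thread suggests running DistCp from the Brisk side, since that Hadoop layer already understands cfs://. A sketch, with placeholder hosts and paths, might be:

```shell
# Run on the BRISK cluster, whose Hadoop install can read cfs:///;
# the remote side is addressed with a plain hdfs:// URI.
# dest-namenode:8020 is a hypothetical NameNode address.
hadoop distcp \
  cfs:///user/hive/warehouse/mytable \
  hdfs://dest-namenode:8020/user/hive/warehouse/mytable
```

This assumes the two clusters run compatible Hadoop RPC versions; if they do not, the hftp pull described earlier is the safer route.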