What is the equivalent of the Hadoop Distributed File System's distributed cache in the Google file system?

Posted by 1cosmwyk on 2021-05-30 in Hadoop


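I have deployed a 6-node Hadoop cluster on Google Compute Engine.
I am using the Google file system (GFS) instead of the Hadoop Distributed File System (HDFS).
So I would like to access files in GFS the same way the distributed cache mechanism does with HDFS.
Please tell me how to access files this way.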

hadoop hdfs distributed-cache gfs google-compute-engine

Source: https://stackoverflow.com/questions/27136629/what-is-the-similar-function-to-distributed-cache-of-hadoop-distribution-file-sy

1 Answer


gdrx4gfi1#

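When you run Hadoop on Google Compute Engine with the Google Cloud Storage connector for Hadoop as the "default filesystem", the GCS connector can be treated exactly like HDFS, including for use with DistributedCache. So to access files in Google Cloud Storage, you use it exactly as you would use HDFS, with no changes needed. For example, if you deployed your cluster with the GCS connector and CONFIGBUCKET set to foo-bucket, and you have local files you want to place in the DistributedCache, you can do the following: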


# Copies mylib.jar into gs://foo-bucket/myapp/mylib.jar
$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar

Then, in your Hadoop job:

JobConf job = new JobConf();

// Retrieves gs://foo-bucket/myapp/mylib.jar as a cached file.
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);

If you want to access files in a bucket other than your CONFIGBUCKET, just specify a full path, using gs:// instead of hdfs://:


# Copies mylib.jar into gs://other-bucket/myapp/mylib.jar
$ bin/hadoop fs -copyFromLocal mylib.jar gs://other-bucket/myapp/mylib.jar

And then in Java:

JobConf job = new JobConf();

// Retrieves gs://other-bucket/myapp/mylib.jar as a cached file.
DistributedCache.addFileToClassPath(new Path("gs://other-bucket/myapp/mylib.jar"), job);
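
To illustrate why both forms above end up pointing at Google Cloud Storage, here is a minimal sketch of how a path is qualified against a default filesystem. It uses only java.net.URI from the JDK rather than Hadoop's own Path and FileSystem classes, so it mimics the idea and is not the connector's actual code. The qualify helper and the DefaultFsResolution class are hypothetical names, and the default filesystem URI gs://foo-bucket/ is an assumption derived from CONFIGBUCKET being set to foo-bucket.

```java
import java.net.URI;

// Illustrative only: mimics how a scheme-less path picks up the default
// filesystem, while a fully qualified gs:// path is left untouched.
class DefaultFsResolution {

    // Hypothetical helper (not a Hadoop API): qualifies a path string
    // against the given default filesystem URI.
    static URI qualify(URI defaultFs, String path) {
        URI candidate = URI.create(path);
        if (candidate.getScheme() != null) {
            // Already fully qualified, e.g. gs://other-bucket/...
            return candidate;
        }
        // No scheme: inherit scheme and bucket from the default filesystem.
        return defaultFs.resolve(candidate);
    }

    public static void main(String[] args) {
        // Assumption: the cluster's default filesystem is the CONFIGBUCKET.
        URI defaultFs = URI.create("gs://foo-bucket/");

        System.out.println(qualify(defaultFs, "/myapp/mylib.jar"));
        System.out.println(qualify(defaultFs, "gs://other-bucket/myapp/mylib.jar"));
    }
}
```

The first call resolves to the CONFIGBUCKET, and the second keeps the explicitly named bucket, which matches the comments in the snippets above.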

