hadoop: Accessing a file from the distributed cache in a Pig UDF Java class on Amazon EMR

Posted by vtwuwzda on 2021-05-29 in Hadoop

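I am trying to access a file (sample.txt) from within a UDF. I want to place the file in the distributed cache and read it from there. I run the Pig job on Amazon EMR. When the cluster is created, I use an EMR bootstrap action to copy the file (sample.txt) to HDFS.
bootstrap.sh (copies the file from S3 to HDFS)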

hadoop fs -copyToLocal s3n://s3_path/sample.txt /mnt/sample.txt

UsingSample.java (the UDF that uses sample.txt)

public class UsingSample extends EvalFunc<String>{

public String useSampleText(String str) throws Exception{
    File sampleFile = new File("./sample");

    //do something with sampleFile

}

@Override
public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0)
        return null;

    String str = (String) input.get(0);
    String result = "";
    try {
        result = useSampleText(str);
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return result;
}

public List<String> getCacheFiles() { 
   List<String> list = new ArrayList<String>(1); 
   list.add("/mnt/sample.txt#sample"); // not sure if the path I am passing is correct
   return list; 
}

}
create_cluster.sh (the script that creates the cluster and runs the Pig script)

aws emr create-cluster \
  --auto-terminate \
  --name "sample cluster" \
  --ami-version 3.8.0 \
  --enable-debugging \
  --applications Name=Pig \
  --use-default-roles \
  --instance-type m1.large \
  --instance-count 3 \
  --steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,Args=[-f,$S3_PIG_SCRIPT_URL,-p,INPUT=$INPUT,-p,OUTPUT=$OUTPUT] \
  --bootstrap-action Path=s3://s3_bootstrapscript_path/bootstrap.sh

The error I get is a FileNotFoundException when sample.txt is accessed through getCacheFiles().
I am using Hadoop 2.4 and Pig 0.12. Please help.

hadoop udf amazon-emr apache-pig distributed-cache

Source: https://stackoverflow.com/questions/31506053/accessing-a-file-from-distributed-cache-in-pig-udf-java-class-amazon-emr

1 answer

#1 tvmytwxo

Try copying the file to HDFS with the following command:

hadoop distcp s3n://bucket/file /home/filelocation

Then check that the file exists on HDFS with:

hdfs dfs -ls /home/filelocation
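
Once the file is confirmed to be on HDFS, the entry returned by getCacheFiles() presumably needs to name that HDFS location rather than /mnt/sample.txt, which is a path on a node's local disk. The following is a minimal sketch of the reading side only, written against the Java standard library so it can be tried without a cluster. The class name SampleFileReader, the constant CACHE_ENTRY, and the assumption that /home/filelocation is the file's HDFS path are all illustrative and not taken from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: reads a distributed-cache file once and reuses it. */
public class SampleFileReader {

    // Assumed cache entry: HDFS path, then '#', then the symlink name that
    // appears in the task's working directory. The path is an assumption
    // based on the distcp destination above.
    public static final String CACHE_ENTRY = "/home/filelocation#sample";

    private final Path localLink;      // e.g. Paths.get("./sample") inside a task
    private List<String> cachedLines;  // filled on first use

    public SampleFileReader(Path localLink) {
        this.localLink = localLink;
    }

    /** Returns the symlink name, i.e. the text after the last '#'. */
    public static String symlinkName(String cacheEntry) {
        int hash = cacheEntry.lastIndexOf('#');
        if (hash < 0 || hash == cacheEntry.length() - 1) {
            throw new IllegalArgumentException("no symlink name in: " + cacheEntry);
        }
        return cacheEntry.substring(hash + 1);
    }

    /** Reads the file on the first call and serves later calls from memory. */
    public synchronized List<String> lines() throws IOException {
        if (cachedLines == null) {
            cachedLines = Collections.unmodifiableList(
                    Files.readAllLines(localLink, StandardCharsets.UTF_8));
        }
        return cachedLines;
    }
}
```

In the UDF from the question, getCacheFiles() would add CACHE_ENTRY to its list, and exec() would call lines() on a reader built with Paths.get("./" + SampleFileReader.symlinkName(SampleFileReader.CACHE_ENTRY)). Loading the file lazily and keeping it in memory avoids reopening it for every input tuple.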

Answered on 2021-05-30
