How many files can be submitted to a single Pig job at once?

nwlqm0z1  posted on 2021-06-25  in  Pig

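When I specify the root of a large directory tree as the LOAD input in my script, Pig fails mysteriously. The backend error exception it throws gives no insight into what went wrong. The same script works perfectly when there are fewer files.
It is a very simple script, shown below: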

SET pig.noSplitCombination true;
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',');
filtered = FILTER raw_record by $1 == 251068;
filtered_data = FOREACH filtered GENERATE (chararray)$0, (chararray)$1, (chararray)$2;
STORE filtered_data INTO '/data/output/directory/' USING PigStorage();

Here is the error message I see:

ERROR 2244: Job scope-594 failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job scope-594 failed, hadoop does not return any error message
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:232)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:608)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

How many files can Pig process at once?

balp4ylt  #1

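Pig can process any number of files; Pig itself imposes no limit on this. In your case, try declaring the data type of each field when loading, and try quoting the value in the FILTER statement.
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',') AS (col1:chararray, col2:chararray);
filtered = FILTER raw_record BY $1 == '251068';
If you still get an error, please provide some sample data.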
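
Putting the two suggestions together with the script from the question might look like the sketch below. It is only an illustration under some assumptions: the field names col1, col2 and col3 are placeholders (the real column meanings are not given in the thread), and a third field is declared only because the original FOREACH reads $2. Whether this resolves ERROR 2244 is not confirmed by the thread.

```pig
-- Sketch only: the question's script with a declared schema and a quoted literal.
-- col1..col3 are placeholder names; the actual column meanings are unknown.
SET pig.noSplitCombination true;

-- Declare a type for every field at load time instead of casting later.
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',')
             AS (col1:chararray, col2:chararray, col3:chararray);

-- col2 is a chararray, so it is compared with a quoted string literal.
filtered = FILTER raw_record BY col2 == '251068';

-- The (chararray) casts are dropped because the schema already types the fields.
filtered_data = FOREACH filtered GENERATE col1, col2, col3;

STORE filtered_data INTO '/data/output/directory/' USING PigStorage();
```

Referring to fields by name rather than by position ($0, $1, $2) is a readability choice here; the positional form from the question works with a declared schema as well.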
