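When I point my script's LOAD at the root of a large directory tree, Pig fails mysteriously. The backend error it throws gives no insight into what went wrong. The same script works perfectly when there are fewer files.
The script is very simple: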
SET pig.noSplitCombination true;
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',');
filtered = FILTER raw_record by $1 == 251068;
filtered_data = FOREACH filtered GENERATE (chararray)$0, (chararray)$1, (chararray)$2;
STORE filtered_data INTO '/data/output/directory/' USING PigStorage();
Here is the error message I see:
ERROR 2244: Job scope-594 failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job scope-594 failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:178)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:232)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:608)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
How many files can Pig handle at once?
1 answer
balp4ylt1#
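Pig can handle any number of files; there is no limit on how many files Pig processes. In your case, try declaring the data type of each field in the LOAD statement, and try putting quotes around the value in the FILTER statement.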
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',') AS (col1:chararray, col2:chararray, col3:chararray);
filtered = FILTER raw_record BY $1 == '251068';
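For reference, here is a sketch of the asker's full script with both suggestions applied. The field names col1, col2 and col3 are placeholders, and the script assumes the first three comma-separated fields are the ones needed. It has not been run against the asker's data.

```pig
-- Sketch: the original script with a declared schema and a quoted filter value.
-- col1, col2, col3 are placeholder field names (assumption).
SET pig.noSplitCombination true;

-- Declaring a schema loads each field as a chararray instead of a bytearray.
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',')
    AS (col1:chararray, col2:chararray, col3:chararray);

-- col2 is a chararray, so compare it against a quoted string literal.
filtered = FILTER raw_record BY col2 == '251068';

-- The fields are already chararray, so the explicit casts are no longer needed.
filtered_data = FOREACH filtered GENERATE col1, col2, col3;

STORE filtered_data INTO '/data/output/directory/' USING PigStorage();
```

Using named fields instead of `$1` keeps the FILTER readable, but `$1 == '251068'` is equivalent once the schema is declared.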
If you still get errors, please post some sample data.