Hadoop: Querying Parquet with Pig for the TPC-H benchmark

voj3qocg  posted on 2021-05-29 in Hadoop

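I'm trying to run the TPC-H benchmark (www.tpc.org) on Hadoop using the Hortonworks Data Platform (2.3.2). To do so, I want to use Pig (version 0.15) to query data stored in the Parquet file format and benchmark it. I've created 2 GB of data in Parquet format with tpch-gen, and I've downloaded a Pig Parquet bundle to read the Parquet files with the ParquetLoader. I use the following Pig script to query it: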

REGISTER /opt/parquet-pig-bundle-1.8.1.jar;

lineitem = LOAD '$input/lineitem' 
using org.apache.parquet.pig.ParquetLoader AS (orderkey:long, partkey:long, suppkey:long,
linenumber:long, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:chararray, linestatus:chararray,
shipdate:chararray, commitdate:chararray, receiptdate:chararray, shipinstruct:chararray, shipmode:chararray, comment:chararray);

SubLineItems = FILTER lineitem BY shipdate <= '1998-09-16';

SubLine = FOREACH SubLineItems GENERATE returnflag, linestatus, quantity, extendedprice, extendedprice*(1-discount) AS disc_price, extendedprice*(1-discount)*(1+tax) AS charge, discount;

STORE SubLine INTO '$output/Q1_out' USING org.apache.parquet.pig.ParquetStorer();

When I execute this query, I get the following error:

2016-01-28 15:57:23,974 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2016-01-28 15:57:24,035 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-01-28 15:57:24,098 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-01-28 15:57:24,155 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for lineitem: $0, $1, $2, $3, $4, $5, $6, $7, $9, $11, $12, $13, $14, $15
2016-01-28 15:57:24,160 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
Details at logfile: /root/d2f bench/bin/pig_.log
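
When I use -t ColumnMapKeyPrune as the message suggests, the query runs without errors, but it takes about 1 hour, which is far too long.

I noticed that the error occurs when I use FOREACH in the Pig query; when I remove the line containing FOREACH, the error does not appear. In addition, I've tried the same Pig script against the Avro file format (changing only the USING ... lines), and that works fine.

What could be the problem? Thanks in advance.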
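
For reference, the invocation with the rule disabled looks roughly like the sketch below. The directory paths and the script name q1.pig are placeholders for illustration, not the actual ones from this setup; -t turns off the named optimizer rule, and -param supplies the $input and $output values used in the script.

```shell
# Placeholder locations (assumptions): replace with the real HDFS paths and script file.
INPUT_DIR="/tmp/tpch/parquet"
OUTPUT_DIR="/tmp/tpch/out"
SCRIPT="q1.pig"

if command -v pig >/dev/null 2>&1; then
  # -t ColumnMapKeyPrune disables the optimizer rule named in the error message;
  # -param fills in $input and $output inside the Pig script.
  pig -t ColumnMapKeyPrune \
      -param input="$INPUT_DIR" \
      -param output="$OUTPUT_DIR" \
      "$SCRIPT"
else
  # No pig launcher on this machine: just show the command that would be run.
  echo "pig not found on PATH; would run: pig -t ColumnMapKeyPrune -param input=$INPUT_DIR -param output=$OUTPUT_DIR $SCRIPT"
fi
```

With the rule disabled, Pig no longer pushes the column projection down to the loader, which presumably explains why the NullPointerException disappears while the run time goes up.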

P.S. The Pig stack trace gives the following information:

ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:125)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.optimize(LogicalPlan.java:277)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1373)
    at org.apache.pig.PigServer.execute(PigServer.java:1364)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:502)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
    at org.apache.parquet.pig.ParquetLoader.getSchemaFromRequiredFieldList(ParquetLoader.java:364)
    at org.apache.parquet.pig.ParquetLoader.pushProjection(ParquetLoader.java:346)
    at org.apache.pig.newplan.logical.rules.ColumnPruneVisitor.visit(ColumnPruneVisitor.java:155)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:230)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.rules.ColumnMapKeyPrune$ColumnMapKeyPruneTransformer.transform(ColumnMapKeyPrune.java:141)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:110)
    ... 17 more
