Unable to store alias C when trying to use a Python UDF in Pig

soat7uwm, posted 2021-05-29 in Hadoop

My Python UDF code:


# commaFormat - format a number with commas, 12345 -> 12,345
@outputSchema("numformat:chararray")
def commaFormat(num):
    return '{:,}'.format(num)

My Pig script:

DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage;
A = LOAD '/result.csv' using CSVExcelStorage() As (id:int,lastvisitedtime:chararray,title:chararray,typedcount:int,URL:chararray,visitcount:int,bytes:int);
B = limit A 15;
REGISTER '/data/pyudf/test.py' USING streaming_python AS myudfs;
C = FOREACH B generate myudfs.commaFormat($1);

Pig stack trace:

pokxtpni #1

First of all, the DEFINE statement is missing the ().

REGISTER /path/piggybank.jar;
DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage();

You are probably using Mortar's CPython distribution (streaming_python), which requires at least Pig 0.12. Try the Jython script engine instead.

REGISTER '/data/pyudf/test.py' USING jython AS myudfs;
C = FOREACH B generate myudfs.commaFormat($1);
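
For reference, a minimal sketch of what test.py itself could look like as a Jython UDF (the null check is an addition, and the '{:,}' format spec needs a 2.7-level Python/Jython):

# test.py -- registered with: REGISTER '/data/pyudf/test.py' USING jython AS myudfs;
@outputSchema("numformat:chararray")
def commaFormat(num):
    # guard against null fields coming from the CSV load (added for safety)
    if num is None:
        return None
    # '{:,}' expects a numeric argument and a 2.7-level Python/Jython
    return '{:,}'.format(num)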

Alternatively, instead of writing a UDF, you can simply strip the commas with the REPLACE function.

REGISTER /path/piggybank.jar;
DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage();
A = LOAD '/result.csv' using CSVExcelStorage() AS (id:int,lastvisitedtime:chararray,title:chararray,typedcount:int,URL:chararray,visitcount:int,bytes:int);
B = FOREACH A GENERATE id,REPLACE(lastvisitedtime,',',''),title,typedcount,URL,visitcount,bytes;
C = LIMIT B 15;
DUMP C;
tpgth1q7 #2

ERROR 1002: Unable to store alias C

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C
    at org.apache.pig.PigServer.openIterator(PigServer.java:1019)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:747)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:630)
    at org.apache.pig.Main.main(Main.java:176)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias C
    at org.apache.pig.PigServer.storeEx(PigServer.java:1122)
    at org.apache.pig.PigServer.store(PigServer.java:1081)
    at org.apache.pig.PigServer.openIterator(PigServer.java:994)
    ... 13 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: C: Store(hdfs://localhost:54310/tmp/temp1063554930/tmp-651585063:org.apache.pig.impl.io.InterStorage) - scope-16 Operator Key: scope-16): org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'concatMult4'
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:157)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:306)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1474)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1459)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1118)
    ... 15 more
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'concatMult4'
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:503)

cotxawn7 #3

Pig cannot handle a Python UDF that brings its own dependent modules with it. You therefore need to package those modules in a jar and register that file as part of your Pig script.

REGISTER '/data/pyudf/test.py' USING jython AS myudfs;
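
A minimal sketch of what that could look like, assuming the dependencies have already been bundled into a jar (the jar name and path are hypothetical):

REGISTER '/path/to/python-deps.jar';    -- hypothetical jar holding the modules the UDF imports
REGISTER '/data/pyudf/test.py' USING jython AS myudfs;
C = FOREACH B GENERATE myudfs.commaFormat($1);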

Python UDFs explained
