Error after joining objects in Apache Pig

Posted by bwntbbo3 on 2021-06-24 in Pig

I have two data objects in Pig.
Dataset 1 (data_1):

col_a: chararray,
col_b: int,
col_c: int,
col_d: chararray

Dataset 2 (data_2):

col_a: chararray,
col_b: chararray,
col_c: int,
col_d: int,
col_e: int

I want to join the two. I tried both of the following:

all_data = JOIN data_1 BY (col_a) LEFT, data_2 by (col_b);
all_data = JOIN data_1 BY (col_a), data_2 by (col_b);

When I try to DUMP the result (after limiting it to 10 records), both variants fail with the same error:

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: all_data_limit: Limit - scope-6383 Operator Key: scope-6383): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: all_data: New For Each(true,true)[tuple] - scope-6382 Operator Key: scope-6382): org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.ClassCastException: org.apache.pig.impl.io.NullableText cannot be cast to org.apache.pig.impl.io.NullableBytesWritable

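DESCRIBE on both objects (data_1, data_2) returns the expected schemas (the ones listed at the top).
DESCRIBE on the joined object, all_data, also returns the expected schema.
I printed 10 records from each of the two objects, and the data looks fine.
I am using the Amazon cluster "emr-5.2.0" with Pig version 0.16.0.
I am a bit frustrated: I have been looking for a solution for 3 days without success. Any help would be appreciated. Thanks!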


Answer #1 (r6hnlfcb)

Use the following statements instead:

all_data = JOIN data_1 BY TRIM(col_a) LEFT, data_2 by TRIM(col_b);
all_data = JOIN data_1 BY TRIM(col_a), data_2 by TRIM(col_b);

Let me know whether it works.
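
To check the suggestion end to end, the join can be followed by the same LIMIT and DUMP steps the question describes. This is a minimal sketch, assuming data_1 and data_2 are already loaded with the schemas listed in the question; all_data_limit is the alias name that appears in the stack trace. One possible reason the change helps (an assumption, not confirmed in this thread) is that TRIM is a Pig built-in that takes and returns a chararray, so both join keys are produced as chararray expressions instead of being read directly from the loaded columns.

```pig
-- Assumes data_1 and data_2 are already defined with the schemas from the question.

-- Left outer join on the trimmed keys; TRIM returns a chararray on both sides.
all_data = JOIN data_1 BY TRIM(col_a) LEFT, data_2 BY TRIM(col_b);

-- Reproduce the step that failed originally: limit to 10 records, then dump.
all_data_limit = LIMIT all_data 10;
DUMP all_data_limit;
```

Note that TRIM also strips leading and trailing whitespace, so keys that differ only by surrounding spaces will now match each other.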
