Hive TRANSFORM receives null for concatenated array values

Posted by 1l5u6lss on 2021-05-27 in Spark


I have a Hive table in the following format:

    col1        col2     col3
    a1          b1       c1
    a1          b1       c2
    a1          b2       c2
    a1          b2       c3
    a2          b3       c1
    a2          b4       c1
    a2          b4       c2
    a2          b4       c3
    .
    .

Each value in col1 can have multiple values in col2, and each (col1, col2) pair can have multiple col3 values.
I am running the query [q]:

select col1, col2, collect_list(col3) from {table} group by col1, col2;

and get:

a1   b1   [c1, c2]
a1   b2   [c2, c3]
a2   b3   [c1]
a2   b4   [c1, c2, c3]

I want to do some transformations using a Python UDF, so I pass all of these columns to the UDF with the TRANSFORM clause, as follows:

select TRANSFORM ( * ) using 'python udf.py' FROM 
(
select col1, col2, concat_ws('\t', collect_list(col3)) from {table} group by col1, col2;
)

I am using concat_ws to convert the array output of collect_list into a string joined by a delimiter. I get col1 and col2 in the result, but no col3 output.

+---------+---------+
|      key|    value|
+---------+---------+
|a1       | b1      |
|         |     null|
|a1       | b2      |
|         |     null|
|a2       | b3      |
|         |     null|
|a2       | b4      |
|         |     null|
+---------+---------+

In my UDF I only have a print statement, which prints each line received from stdin.

import sys
for line in sys.stdin:
    try:
        print line
    except Exception as e:
        continue

Can anyone help me figure out why col3 does not show up in my UDF?

Hive apache-spark user-defined-functions hiveql hive-udf

Source: https://stackoverflow.com/questions/63512409/hive-transform-receives-null-for-concatenated-array-values

1 Answer


zengzsys1#

First, you need to parse the line in your Python UDF, for example:

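import sys

for line in sys.stdin:
    try:
        # Drop the trailing newline so print does not emit an extra blank line
        line = line.strip('\n')
        # TRANSFORM passes the columns as tab-separated fields
        col1, col2, col3 = line.split('\t')
        # Parenthesized single-argument print works in Python 2 and 3
        print('\t'.join([col1, col2, col3]))
    except Exception:
        # Skip rows that fail to parse into exactly three fields
        continue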
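To see why the `strip('\n')` step matters, here is a minimal Python 3 sketch that mimics a pass-through UDF on an in-memory stream. The helper `echo_rows` and the sample rows are hypothetical, made up for illustration. Each line read from stdin still ends with a newline, and print adds another one, so an unstripped echo writes an empty line after every row. That may be what produces the empty-key, null-value rows in the question's output, though this is an assumption and not something confirmed in the question.

```python
import io


def echo_rows(stream, strip_newline):
    """Mimic a pass-through UDF and return the exact text it would write."""
    out = io.StringIO()
    for line in stream:
        if strip_newline:
            line = line.rstrip('\n')
        # print() appends its own newline, like the Python 2 print statement
        print(line, file=out)
    return out.getvalue()


# Hypothetical input: two rows of tab-separated columns
rows = "a1\tb1\tc1,c2\na1\tb2\tc2,c3\n"

raw = echo_rows(io.StringIO(rows), strip_newline=False)
clean = echo_rows(io.StringIO(rows), strip_newline=True)

print(repr(raw))    # every row is followed by an empty line
print(repr(clean))  # identical to the input
```

With stripping enabled the script writes back exactly one line per input row.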

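Second, it is better to use something other than \t as the separator in concat_ws, because TRANSFORM passes the columns to the script as tab-separated fields and a tab inside col3 would be read as an extra column boundary: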

select TRANSFORM ( * ) using 'python udf.py' as (col1, col2, col3)
FROM
(
select col1, col2, concat_ws(',', collect_list(col3)) from {table} group by col1, col2
) t;
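The delimiter clash can be checked outside Hive with a short Python sketch. It assumes the script receives one row per line with tab-separated columns. `parse_row` and the sample lines are hypothetical, written only for illustration. With a comma-joined col3 the line has exactly three fields and the original values can be recovered with a second split. With a tab-joined col3 the same row has five fields, so the three-way unpacking raises, and a UDF that swallows exceptions would silently drop the row.

```python
def parse_row(line):
    """Split one input line into (col1, col2, list of col3 values)."""
    col1, col2, col3 = line.rstrip('\n').split('\t')
    return col1, col2, col3.split(',')


# col3 joined with ',': exactly three tab-separated fields
print(parse_row("a2\tb4\tc1,c2,c3\n"))

# col3 joined with '\t': five fields, so the three-way unpacking fails
try:
    parse_row("a2\tb4\tc1\tc2\tc3\n")
except ValueError as err:
    print("cannot parse:", err)
```

Any separator that cannot occur in the col3 values would work equally well; the comma is simply the one used in the query above.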

