如何在spark SQL中从JSON列中提取动态数量的键值对

mbskvtky  于 4个月前  发布在  Spark
关注(0)|答案(1)|浏览(85)

我试图从我的JSON列中解析出test-N-value。我已经尝试过使用JavaScript,但创建的列在这里返回NULL。

SELECT get_json_object('{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}', '$.test-*-value');

字符串
有没有人有任何提示,如何从中提取出这些动态对象?预期的结果将是..但请记住,输出必须允许动态数量的键值对。

extracted_out
-------------
{"test-0-value":"1", "test-1-value":"2"}

jjjwad0x

jjjwad0x1#

您可以使用from_json将json转换为map,分解map以获取键,值,仅过滤您需要的键,
收集key:value数组,将数组与逗号连接,使用str_to_map获取包含所需键的map,然后您可以选择使用to_json将其转换为JSON字符串
演示:

with src as (select '{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}' as original_json)

select s.original_json, to_json( str_to_map(concat_ws(',',collect_list(concat(key,':',value))))) result
from src s
     lateral view  explode(from_json(s.original_json, 'map<STRING, STRING>'))  m as key, value
where key rlike 'test-\\d+-value'  
group by s.original_json
""").show(100, false)

字符串
测试结果:

+---------------------------------------------------------------+---------------------------------------+
|original_json                                                  |result                                 |
+---------------------------------------------------------------+---------------------------------------+
|{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}|{"test-0-value":"1","test-1-value":"2"}|
+---------------------------------------------------------------+---------------------------------------+


更新:
使用map_filter找到更好的解决方案

spark.sql("""
with src as (select '{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}' as original_json)

select s.original_json, to_json( map_filter(from_json(s.original_json, 'map<STRING, STRING>'), (k, v) -> k rlike 'test-\\d+-value')) result
from src s
""").show(100, false)

相关问题