How to handle the delimiter in a column value?

Posted by ewm0tg9j on 2021-07-15 in Hadoop


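I am trying to load CSV file data into a Hive table, but the values of one column contain the delimiter (a comma), so Hive treats it as a field separator and loads the rest of the value into the next column. I tried using an escape sequence, but it does not work: the data always ends up in a new column.
My CSV file: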

id,name,desc,per1,roll,age
226,a1,"\"double bars","item1 and item2\"",0.0,10,25
227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35
228,a3,"\"double","item3 & item4 item5\"",0.2,30,45
229,a4,"\"double","item5 & item6 item7\"",0.3,40,55
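To see why the values shift, here is a minimal Python sketch (not Hive code) that splits the sample rows on every comma, which is roughly what a plain comma-delimited row format does. It assumes the original table used such a format; the question does not show it. The header has 6 names, but each data row yields 7 pieces because the desc value itself contains a comma. HEADER, ROWS and naive_split are illustrative names only.

```python
from typing import List

# Header and two data rows from the question's file (raw strings keep the backslashes).
HEADER = "id,name,desc,per1,roll,age"
ROWS = [
    r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
    r'227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35',
]


def naive_split(line: str) -> List[str]:
    """Split on every comma and ignore quotes, like a plain comma-delimited row format."""
    return line.split(",")


print(len(naive_split(HEADER)), "columns in the header")
for row in ROWS:
    # The comma inside desc creates an extra field, so per1/roll/age each shift one column right.
    print(len(naive_split(row)), "values in row", naive_split(row)[0])
```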

I have updated my table:

create table testing(id int, name string, desc string, uqc double, roll int, age int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = '"',
    "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

But part of the value still ends up in another column.
I load the file with the LOAD DATA INPATH command.
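The escape setting alone cannot fix this because of where the quotes sit in the file: the quoted part of desc closes right after \"double bars", so the comma that follows is outside any quotes. The sketch below uses Python's csv module with the same separator, quote and escape characters as the SERDEPROPERTIES above, as a stand-in for the opencsv parser behind OpenCSVSerde. It assumes both parsers treat this line the same way, which has not been verified against Hive itself. Each row still comes back with 7 fields.

```python
import csv
import io

# Raw lines from the question's file; r'' keeps the backslashes exactly as they are on disk.
DATA = "\n".join([
    r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
    r'227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35',
])


def parse_quoted_csv(text):
    """Parse with separator ',', quote '"' and escape '\\', mirroring the SERDEPROPERTIES."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,  # a literal quote is written as \" rather than ""
    )
    return list(reader)


for row in parse_quoted_csv(DATA):
    # The quoted section ends after "double bars, so the next comma still starts a new field.
    print(len(row), repr(row[2]), repr(row[3]))
```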

Tags: hadoop Hive create-table hiveql hiveddl

Source: https://stackoverflow.com/questions/65574656/how-to-handle-delimiter-in-column-value

1 answer


rjzwgtxy (answer #1)

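Here is how to create the table with RegexSerDe.
Each column needs a corresponding capture group () in the regular expression. You can easily debug the regex with regexp_replace, without creating a table first: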

select regexp_replace('226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
                      '^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*', -- 6 capture groups
                      '$1 $2 $3 $4 $5 $6'); -- fields separated by spaces

Result:

226 a1 "double bars","item1 and item2" 0.0 10 25
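The same check can be reproduced outside Hive. Python's re syntax agrees with Java's for this particular pattern, so the sketch below applies the answer's regex with re.sub and the same six-group replacement (written \1 to \6 in Python instead of $1 to $6). One detail worth noting: Hive unescapes \" inside the SQL string literal, so regexp_replace actually receives plain quotes, while the line stored in the file keeps its backslashes, and those stay in the desc group. debug_regex is a hypothetical helper name.

```python
import re

# The answer's pattern. In the Hive literal '\\d' is read as \d, so a raw string with \d is equivalent.
PATTERN = r'^(\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*'

# What regexp_replace receives after Hive unescapes \" in the SQL literal (plain quotes).
HIVE_LITERAL_VALUE = '226,a1,""double bars","item1 and item2"",0.0,10,25'

# The same row as stored in the file, backslashes included.
FILE_LINE = r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25'


def debug_regex(line: str) -> str:
    """Rough Python equivalent of regexp_replace(line, PATTERN, '$1 $2 $3 $4 $5 $6')."""
    return re.sub(PATTERN, r"\1 \2 \3 \4 \5 \6", line)


print(debug_regex(HIVE_LITERAL_VALUE))  # 226 a1 "double bars","item1 and item2" 0.0 10 25
print(debug_regex(FILE_LINE))           # 226 a1 \"double bars","item1 and item2\" 0.0 10 25
```

The greedy (.*) in the third group is what keeps the embedded "," inside desc: it extends to the last ", that is still followed by the three numeric fields.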

If the result looks right, create the table:

create external table testing(id   int,
                              name string,
                              desc string,
                              uqc  double,
                              roll int,
                              age  int
                             )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex'='^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')
location ....
TBLPROPERTIES("skip.header.line.count"="1")
;
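As a rough model of what this table does with each line of the file, the sketch below skips one header line (skip.header.line.count=1), requires the regex to match the whole line (RegexSerDe is understood to use a full match), and converts the six groups to the declared column types. Treating a non-matching line, or a value that cannot be converted, as NULL (None) is an assumption about RegexSerDe's behavior; parse_line and read_table are hypothetical names, not Hive APIs.

```python
import re
from typing import Callable, List, Optional, Tuple

# input.regex from the DDL; Hive turns '\\d' in the SQL literal into \d, so this raw string is equivalent.
INPUT_REGEX = re.compile(r'^(\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')

# Declared column types: id int, name string, desc string, uqc double, roll int, age int.
CASTS: Tuple[Callable[[str], object], ...] = (int, str, str, float, int, int)

Row = Tuple[Optional[object], ...]

SAMPLE_FILE = "\n".join([
    "id,name,desc,per1,roll,age",
    r'226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
    r'227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35',
    r'228,a3,"\"double","item3 & item4 item5\"",0.2,30,45',
    r'229,a4,"\"double","item5 & item6 item7\"",0.3,40,55',
])


def convert(cast: Callable[[str], object], value: str) -> Optional[object]:
    """Cast one captured group; assumption: an unconvertible value (e.g. '') becomes None."""
    try:
        return cast(value)
    except ValueError:
        return None


def parse_line(line: str) -> Optional[Row]:
    """The regex must match the entire line; otherwise the whole row is None (assumed NULL row)."""
    m = INPUT_REGEX.fullmatch(line)
    if m is None:
        return None
    return tuple(convert(cast, group) for cast, group in zip(CASTS, m.groups()))


def read_table(text: str, header_lines: int = 1) -> List[Optional[Row]]:
    """Skip the header lines, then parse every remaining non-empty line."""
    lines = text.splitlines()[header_lines:]
    return [parse_line(line) for line in lines if line]


for row in read_table(SAMPLE_FILE):
    print(row)
```

Note that desc keeps the backslash-escaped quotes exactly as they appear in the file; if plain quotes are wanted, they have to be cleaned up in a query afterwards.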

See this article for more details.

Answered 2021-07-15
