Regex for specific delimiter string in Hive SerDe

Posted by ncecgwcz on 2021-05-29 in Hadoop


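I am using a SerDe to read data in a specific format that uses | as the delimiter.
One line of my data might look like: key1=value2|key2=value2|key3="va , lues", and I created the Hive table as follows: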

CREATE EXTERNAL TABLE mytable (
field1 STRING,
field2 STRING,
field3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)",
  "output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE;

I need to extract all the values, ignoring the quotes if they are present. The result should look like:

value2  value2 va , lues

How can I change my current regexp so that it extracts the values?

hadoop Hive regex hive-serde

Source: https://stackoverflow.com/questions/43787756/regex-for-specific-delimiter-string-in-hive-serde

1 Answer


gwo2fgha1#

I can currently offer 2 options, neither of which is perfect.
By the way, "output.format.string" is obsolete and has no effect.

1

create external table mytable
(
    q1          string    
   ,field1      string
   ,q2          string
   ,field2      string
   ,q3          string
   ,field3      string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '.*?=(?<q1>"?)(.*?)(?:\\k<q1>)\\|.*?=(?<q2>"?)(.*?)(?:\\k<q2>)\\|.*?=(?<q3>"?)(.*?)(?:\\k<q3>)')
stored as textfile
;

select * from mytable
;

+----+--------+----+--------+----+-----------+
| q1 | field1 | q2 | field2 | q3 |  field3   |
+----+--------+----+--------+----+-----------+
|    | value2 |    | value2 | "  | va , lues |
+----+--------+----+--------+----+-----------+
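
To see why option 1 needs the extra q1/q2/q3 columns, the same pattern can be tried outside Hive. The sketch below is only an illustration: the class name BackrefRowDemo and the parse helper are made up for this example, and it assumes that RegexSerDe compiles input.regex as a java.util.regex pattern, requires it to match the whole row, and maps each capturing group to one column in order. Each optional opening quote is captured in a named group, and the backreference \k<qN> then demands the same thing (a quote, or nothing) before the delimiter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hive.
class BackrefRowDemo {
    // Same regex as in option 1, written as a Java string literal.
    // Groups: q1, field1, q2, field2, q3, field3.
    static final Pattern ROW = Pattern.compile(
          ".*?=(?<q1>\"?)(.*?)(?:\\k<q1>)\\|"
        + ".*?=(?<q2>\"?)(.*?)(?:\\k<q2>)\\|"
        + ".*?=(?<q3>\"?)(.*?)(?:\\k<q3>)");

    // Returns one string per capturing group, or null if the row does not match.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {           // the whole line has to match
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1); // group 0 is the whole match, so skip it
        }
        return cols;
    }

    public static void main(String[] args) {
        String[] cols = parse("key1=value2|key2=value2|key3=\"va , lues\"");
        for (String c : cols) {
            System.out.println("[" + c + "]");
        }
    }
}
```

Because the quote groups are capturing groups, they take up a column each, which is why the table has to declare q1, q2 and q3 even though only field1, field2 and field3 are of interest.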

2

create external table mytable
(
    field1 string
   ,field2 string
   ,field3 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '.*?=(".*?"|.*?)\\|.*?=(".*?"|.*?)\\|.*?=(".*?"|.*?)')
stored as textfile
;

select * from mytable
;

+--------+--------+-------------+
| field1 | field2 |   field3    |
+--------+--------+-------------+
| value2 | value2 | "va , lues" |
+--------+--------+-------------+
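
Option 2 keeps the table at three columns, but the quotes stay inside the captured value, so they have to be removed afterwards. The sketch below replays that in plain Java under the same assumptions as before (RegexSerDe treats input.regex as a java.util.regex pattern that must match the whole row); AlternationRowDemo, parse and stripQuotes are hypothetical names used only for this illustration. In Hive itself, the stripping step might be done in a query or view with something like regexp_replace(field3, '^"|"$', ''), which I have not tested against this table.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hive.
class AlternationRowDemo {
    // One key=value field: either a quoted value (quotes included) or a bare value.
    static final String FIELD = ".*?=(\".*?\"|.*?)";
    // Same regex as in option 2: three fields separated by a literal |.
    static final Pattern ROW = Pattern.compile(FIELD + "\\|" + FIELD + "\\|" + FIELD);

    // Returns the three captured values, or null if the row does not match.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    // Removes one leading and one trailing double quote, if present.
    static String stripQuotes(String value) {
        return value.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String[] cols = parse("key1=value2|key2=value2|key3=\"va , lues\"");
        for (String c : cols) {
            System.out.println("[" + c + "] -> [" + stripQuotes(c) + "]");
        }
    }
}
```

The trade-off between the two options is therefore extra throwaway columns (option 1) versus an extra clean-up step at query time (option 2).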
