hive外部表指向带有双引号的csv文件

1yjd4xko  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(588)

我正在尝试创建指向csv文件的外部配置单元表。我的csv文件有一个列(col2),列值可以包含双引号和逗号。
每列数据:

Col1 : 150
Col2 : BATWING, ABC "D " TEST DATA
Col3 : 300

csv中的行:

150,"BATWING, ABC ""D "" TEST DATA",300

创建表ddl:

CREATE EXTERNAL TABLE test (
   col1 INT, 
   col2 STRING, 
   col3 INT) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
ESCAPED BY '"' 
LOCATION 's3://test-folder/test-file.csv'

当我查询表时,我在col3中看到空值。
创建表时,我在这里缺少什么?感谢您的帮助

juud5qan

juud5qan1#

Create hive external table:

DROP TABLE IF EXISTS ${hiveconf:dbnm}.tblnm ;
CREATE EXTERNAL TABLE ${hiveconf:dbnm}.tblnm (
C1 string,
C2 string
)
PARTITIONED BY (C3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES (
    "separatorChar" = '|'  (change it to your separator)
   ,"quoteChar"     = '\"'
) 
STORED AS TEXTFILE 
LOCATION '/hdfspath' 
--tblproperties ("skip.header.line.count"="1")
;
MSCK REPAIR TABLE ${hiveconf:dbnm}.tblnm;
ep6jt1vc

ep6jt1vc2#

使用 OpenCSVSerde . 下面是一个例子
创建表

CREATE TABLE bala (col1 int, col2 string, col3 int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES(
  "separatorChar" = ",", "escapeChar"='\"'
);

加载数据

hive>LOAD DATA INPATH '/../test.csv' INTO TABLE bala
Loading data to table bala
Table testing.bala stats: [numFiles=1, totalSize=40]
OK
Time taken: 0.514 seconds

检查是否已加载

hive> select * from bala;
OK
150 BATWING, ABC "D " TEST DATA 300
Time taken: 0.288 seconds, Fetched: 1 row(s)

相关问题