1252) Unprintable character issue when exporting to Hive

esyap4oy posted on 2021-06-24 in Hive

I am trying to create a Hive table over the CSV dataset below using OpenCSVSerde with

WITH SERDEPROPERTIES ("quoteChar"='\"', "separatorChar"=',')

But the Hive table loses the £ symbol and shows the replacement character (�) instead.
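A likely cause, assuming the file was saved as windows-1252 (which the LazySimpleSerDe attempt below suggests, since `serialization.encoding`='windows-1252' brings the £ back): in windows-1252 the £ sign is the single byte 0xA3, while in UTF-8 it is the two bytes 0xC2 0xA3. As far as I know, OpenCSVSerde decodes each row as UTF-8 and does not honour `serialization.encoding`, so a lone 0xA3 is invalid UTF-8 and is displayed as the replacement character. The byte difference can be checked from Hive itself with the built-in `encode` and `hex` functions (a sketch; it assumes the client, e.g. Beeline, sends the query text as UTF-8):

```sql
-- Compare how the pound sign is encoded in the two charsets.
-- Assumption: the query text itself reaches Hive as UTF-8.
SELECT
  hex(encode('£', 'windows-1252')) AS cp1252_bytes,  -- one byte
  hex(encode('£', 'UTF-8'))        AS utf8_bytes;    -- two bytes
```

The first column should show A3 and the second C2A3, which is why a windows-1252 file read as UTF-8 loses the symbol.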

FWID,GENDER,Ethnicity,AgeAtPeriodEnd,RC_UnitCost,QUANTITY,ElemTypeDesc
2100001,F,White,WEEK,"£2,027.07",3455,AA - Community Meals
2100011,F,White,YEAR,"£75.00,488776",AA - Community Meals
2100044,M,White,WEEK,"£5.40,39.0",123,Ld-ExtDc - Day
2100044,M,White,WEEK,£5.40,9856,FF - Community Meals
2100044,M,White,WEEK,£5.40,"789,193",FF - Community Meals
2100044,M,White,WEEK,£5.40,"876,241",FE - Community Meals
2100044,M,White,WEEK,£5.40,3888,"Community Meals,ExtDc - Day"
2100044,M,White,WEEK,£5.40,235,Ld-ExtDc - Day
2100044,M,White,WEEK,£5.40,8789,FE - Community Meals
2100044,M,White,WEEK,"£10.07,027.7",16478,FE - Community Meals
2100051,F,White,WEEK,£470.00,12375,RG - Community Meals

I also tried creating the table with LazySimpleSerDe:

WITH SERDEPROPERTIES ( 'escape.delim'='\"', 'field.delim'=',', 'line.delim'='\n', 'serialization.encoding'='windows-1252')

在本例中,使用 £ 但由于缺少符号,值的对齐不起作用 quotechar 作为 \" .
请提出处理这个问题的方法。

ht4b089n


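Here is one approach:
1. Open the CSV file in Notepad++, convert its encoding to UTF-8, and push the file to HDFS.
2. Create an external table with the following properties: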

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      'field.delim'=',',
      'line.delim'='\n',
      'serialization.format'=',',
      'serialization.encoding'='UTF-8')
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    TBLPROPERTIES("skip.header.line.count"="1")
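For reference, the fragment above can be placed in a complete statement roughly like the following sketch. The table name `community_meals` and the HDFS path are hypothetical; the column names come from the CSV header. OpenCSVSerde exposes every column as STRING whatever type is declared, and its documented properties are `separatorChar`, `quoteChar` and `escapeChar` (defaults `,`, `"` and `\`), so the sketch sets those rather than `field.delim`. `STORED AS TEXTFILE` is shorthand for the same TextInputFormat / HiveIgnoreKeyTextOutputFormat pair.

```sql
-- Hypothetical table name and location: point LOCATION at the HDFS
-- directory holding the UTF-8 version of the CSV file.
CREATE EXTERNAL TABLE community_meals (
  fwid            STRING,
  gender          STRING,
  ethnicity       STRING,
  ageatperiodend  STRING,
  rc_unitcost     STRING,   -- e.g. £2,027.07, kept as text with quotes stripped
  quantity        STRING,
  elemtypedesc    STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE
LOCATION '/user/hive/external/community_meals'   -- hypothetical path
TBLPROPERTIES ('skip.header.line.count' = '1');

-- Quick check that the pound sign survived and quoted commas stayed in one field.
SELECT rc_unitcost, quantity, elemtypedesc
FROM community_meals
LIMIT 5;
```

Because every column comes back as STRING, numeric work on `rc_unitcost` or `quantity` needs an explicit cleanup and CAST in the query.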
