How to rename a column when creating an external table in Athena based on Parquet files in S3?

Posted by ndasle7k on 2021-06-24 in Hive


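Does anybody know how to rename a column when creating an external table in Athena based on Parquet files in S3?
The Parquet files I'm trying to load have both a column named export_date and an export_date partition in the S3 structure.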
An example file path is: 's3://bucket_x/path/to/data/export_date=2020-08-01/platform=platform_a'

```
CREATE EXTERNAL TABLE user_john_doe.new_table(
  column_1 string,
  export_date DATE,
  column_3 DATE,
  column_4 bigint,
  column_5 string)
PARTITIONED BY (
  export_date string,
  platform string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
LOCATION
  's3://bucket_x/path/to/data'
TBLPROPERTIES (
  'parquet.compression'='GZIP')
;
```

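So what I would like to do is rename the export_date column to export_date_exp.
To make Parquet read by index (which would allow you to rename columns), you must create the table with the parquet.column.index.access SerDe property set to true.
https://docs.amazonaws.cn/en_us/athena/latest/ug/handling-schema-updates-chapter.html#parquet-read-by-name
But the following code does not load any data into the export_date_exp column: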

```
CREATE EXTERNAL TABLE user_john_doe.new_table(
  column_1 string,
  export_date_exp DATE,
  column_3 DATE,
  column_4 bigint,
  column_5 string)
PARTITIONED BY (
  export_date string,
  platform string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ( 'parquet.column.index.access'='true')
LOCATION
  's3://bucket_x/path/to/data'
TBLPROPERTIES (
  'parquet.compression'='GZIP')
;
```

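This question has been asked before, but it did not receive an answer:
How to rename AWS Athena columns with parquet file source?
I am asking again because the documentation explicitly states that this is possible.
As a side note: in my particular use case I can simply not load the export_date column, since I have learned that reading Parquet by name does not require you to load every column. In my case I don't need the export_date column, so this avoids the conflict with the partition name.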
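
For reference, the workaround from the side note might look roughly like the sketch below. It assumes the default name-based Parquet mapping (no parquet.column.index.access property) and leaves the export_date data column out of the column list, so only the partition key carries that name. The table name new_table_by_name is hypothetical; everything else is copied from the definitions above. Athena runs one statement per query, so each statement would be submitted separately.

```
-- Sketch only: new_table_by_name is a hypothetical table name.
-- The export_date data column is deliberately not declared; with name-based
-- mapping, the remaining columns are matched to the file by name.
CREATE EXTERNAL TABLE user_john_doe.new_table_by_name(
  column_1 string,
  column_3 DATE,
  column_4 bigint,
  column_5 string)
PARTITIONED BY (
  export_date string,
  platform string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
LOCATION
  's3://bucket_x/path/to/data'
TBLPROPERTIES (
  'parquet.compression'='GZIP')
;

-- Register the Hive-style partitions (export_date=.../platform=...) under LOCATION.
MSCK REPAIR TABLE user_john_doe.new_table_by_name;

-- export_date here is the partition value taken from the S3 path,
-- not the column stored inside the Parquet files.
SELECT column_1, export_date, platform
FROM user_john_doe.new_table_by_name
LIMIT 10;
```

This only avoids the name clash; it does not expose the in-file export_date values under a new name, which is what the index-based approach above is meant to do.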

Hive parquet amazon-athena

Source: https://stackoverflow.com/questions/64381342/how-to-rename-a-column-when-creating-an-external-table-in-athena-based-on-parque