Hive accesses ORC external table by position rather than column name

Posted by qnzebej0 on 2021-06-24 in Hive


I am using an HDInsight 4.0 cluster with Hive 3.1.0.
ORC and Parquet files containing the same data were created with Spark using the schema (a string, b int, c string).
create external table a_st_b_int_d_st_orc (a string, b int, d string) stored as orc location <path_to_spark_created_files>;
select * from a_st_b_int_d_st_orc;

+----+----+------+
| a  | b  |  d   |
+----+----+------+
| 1  | 2  | abc  |
| 2  | 3  | bcd  |
+----+----+------+

create external table a_st_b_int_d_st_parquet (a string, b int, d string) stored as parquet location <path_to_spark_created_files>;
select * from a_st_b_int_d_st_parquet;

+----+----+-------+
| a  | b  |   d   |
+----+----+-------+
| 1  | 2  | NULL  |
| 2  | 3  | NULL  |
+----+----+-------+

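The default behavior of Hive's native ORC reader is to map metastore column names to the columns in the ORC files by position, which is why column d above returns the values written as column c.
JIRAs were created to map columns by name instead, but those changes were reverted as well.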
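
To make the difference between the two results concrete, here is a minimal Python sketch of positional versus name-based column resolution, using the data from the question. It is only a toy model under stated assumptions: the helper names `resolve_by_position` and `resolve_by_name` are hypothetical, and this is not Hive's actual reader code.

```python
# Toy model of the two column-resolution strategies; not Hive's actual reader code.
file_columns = ["a", "b", "c"]                   # schema Spark wrote into the files
file_rows = [("1", 2, "abc"), ("2", 3, "bcd")]   # rows stored in the files
table_columns = ["a", "b", "d"]                  # schema declared in the metastore


def resolve_by_position(table_cols, rows):
    """Table column i reads file column i; file column names are ignored."""
    return [
        {col: (row[i] if i < len(row) else None) for i, col in enumerate(table_cols)}
        for row in rows
    ]


def resolve_by_name(table_cols, file_cols, rows):
    """Each table column reads the file column with the same name, else None (NULL)."""
    index = {name: i for i, name in enumerate(file_cols)}
    return [
        {col: (row[index[col]] if col in index else None) for col in table_cols}
        for row in rows
    ]


by_position = resolve_by_position(table_columns, file_rows)
by_name = resolve_by_name(table_columns, file_columns, file_rows)

print(by_position)  # d carries the values of file column c, like the ORC table
print(by_name)      # d is None, like the Parquet table
```

In this model the positional result matches what the ORC table returned (d = abc, bcd), and the name-based result matches what the Parquet table returned (d = NULL).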

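For Parquet, this behavior can be configured with parquet.column.index.access, although the default there is column resolution by name.
In Presto, we can also specify hive.orc.use-column-names=true. How can I turn off this default ORC behavior in Hive?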

Hive presto orc

Source: https://stackoverflow.com/questions/61753710/hive-accesses-orc-external-table-by-position-rather-than-column-name