I am using Hive 3.1.0 on an HDInsight 4.0 cluster.
ORC and Parquet files containing the same data were created using Spark with the schema (a string, b int, c string).
create external table a_st_b_int_d_st_orc(a string, b int, d string) stored as orc location <path_to_spark_created_files>;
select * from a_st_b_int_d_st_orc;
+----+----+------+
| a | b | d |
+----+----+------+
| 1 | 2 | abc |
| 2 | 3 | bcd |
+----+----+------+
create external table a_st_b_int_d_st_parquet(a string, b int, d string) stored as parquet location <path_to_spark_created_files>;
select * from a_st_b_int_d_st_parquet;
+----+----+-------+
| a | b | d |
+----+----+-------+
| 1 | 2 | NULL |
| 2 | 3 | NULL |
+----+----+-------+
By default, Hive's native ORC reader maps metastore column names to the columns in ORC files by position, which is why column d above returns the values written as column c.
JIRAs were created to map columns by name instead, but those changes were reverted as well.
For Parquet, this behavior can be configured with parquet.column.index.access, although the default there is column resolution by name.
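For comparison, the Parquet setting could be toggled roughly as follows. This is only a minimal sketch, assuming the property is honored when set with SET in the Hive session; on some versions it may need to be supplied as a table property instead. The table name is the one from the example above.

```sql
-- Assumption: parquet.column.index.access is picked up at session level.
-- false (the default) resolves Parquet columns by name;
-- true resolves them by position, like the ORC reader in this question.
SET parquet.column.index.access=true;

-- With positional access, column d would be expected to read
-- the third column of the file (written by Spark as c).
SELECT * FROM a_st_b_int_d_st_parquet;
```

What I am looking for is the reverse of this switch on the ORC side.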
In Presto, we can also specify hive.orc.use-column-names=true.
How can I turn off this default ORC behavior in Hive?