Hive equivalent to Spark Vector on table creation

Posted by vecaoik1 on 2021-06-26 in Hive


I have a Spark DataFrame in which one column is of type Vector. When I create a Hive table on top of it, I don't know which Hive type it is equivalent to:

CREATE EXTERNAL TABLE mix (
        topicdist ARRAY<DOUBLE>
    )
STORED AS PARQUET
LOCATION 's3://path/to/file.parquet'

The table creation seems to work fine, but when I try

select topicdist from mix limit 1

I get this error:

Failed with exception java.io.IOException:java.lang.RuntimeException: Unknown hive type info array<double> when searching for field type

Hive apache-spark apache-spark-sql parquet apache-spark-ml

Source: https://stackoverflow.com/questions/44097017/hive-equivalent-to-spark-vector-on-table-creation

1 Answer


dfuffjeb1#

Vector is a Spark user-defined type (UDT). Internally it is stored as a struct:

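// Storage schema of VectorUDT; "type" is a byte tag: 0 = sparse, 1 = dense
StructType(Seq(
  StructField("type", ByteType, nullable = false),
  StructField("size", IntegerType, nullable = true),
  StructField("indices", ArrayType(IntegerType, containsNull = false), nullable = true),
  StructField("values", ArrayType(DoubleType, containsNull = false), nullable = true)
))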

So the Hive column has to be declared as a matching struct rather than ARRAY<DOUBLE>:

CREATE EXTERNAL TABLE mix (
  topicdist struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
)
STORED AS PARQUET
LOCATION 's3://path/to/file.parquet'

Keep in mind that the resulting column will not be interpreted as a Spark Vector. Hive sees only a plain struct.
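Because Hive returns only the raw struct, whatever consumes the column has to rebuild the vector itself. Below is a minimal plain-Scala sketch of that step, not part of the original answer. The names VectorStruct, VectorStructDemo and toDense are hypothetical, and the code assumes the tag convention of Spark's VectorUDT (type 0 = sparse, 1 = dense), with nullable struct fields modeled as Option.

```scala
// Hypothetical plain-Scala mirror of the Hive struct column `topicdist`.
// Nullable struct fields are modeled as Option.
final case class VectorStruct(
    `type`: Byte,
    size: Option[Int],
    indices: Option[Seq[Int]],
    values: Option[Seq[Double]]
)

object VectorStructDemo {
  // Expand the struct into a dense array.
  // Assumption: VectorUDT tag convention, 0 = sparse, 1 = dense.
  def toDense(v: VectorStruct): Array[Double] = v.`type` match {
    case 1 =>
      // Dense: only "values" is populated.
      v.values.getOrElse(Seq.empty[Double]).toArray
    case 0 =>
      // Sparse: "size" gives the length, "indices" and "values" the non-zero entries.
      val dense = Array.fill(v.size.getOrElse(0))(0.0)
      val idx = v.indices.getOrElse(Seq.empty[Int])
      val vals = v.values.getOrElse(Seq.empty[Double])
      idx.zip(vals).foreach { case (i, x) => dense(i) = x }
      dense
    case other =>
      throw new IllegalArgumentException(s"Unknown vector type tag: $other")
  }
}
```

For example, VectorStructDemo.toDense(VectorStruct(0.toByte, Some(4), Some(Seq(1, 3)), Some(Seq(0.5, 2.0)))) expands a sparse vector of size 4 into an array with 0.5 at index 1 and 2.0 at index 3. Inside Spark itself this is unnecessary, since reading the Parquet file with Spark restores the Vector type from its own metadata.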

Answered on 2021-06-26
