Executing an external table from Databricks

Posted by u59ebvdq on 2021-05-29 in Spark


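I am trying to create an external table from Databricks, using a location in ADLS Gen2 and the Parquet format. I took the Parquet dataset from the following URL: https://github.com/teradata/kylo/tree/master/samples/sample-data
I created the DDL in Databricks, and executing it raises the following error: Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384;
I then tried to create the same table using the DataFrame approach, as below: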

import org.apache.spark.sql.parquet

create external table testdb.ptables; 
(
registration_dttm   string,
id          int,
first_name      string,
last_name       string,
email           string,
gender          string,
ip_address      string,
cc          string,
country         string,
birthdate       string,
salary          double,
title           string,    
comments        string
)
USING parquet
OPTIONS(path "/mnt/landing/testTable/person");

This gives the error: <console>:24: error: ')' expected but string literal found. OPTIONS(path "/mnt/landing/testTable/person");
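
The `<console>:24` message comes from the Scala compiler, which suggests the DDL was pasted into a Scala cell as-is rather than handed to Spark as SQL. A minimal sketch of one way around this, assuming a Databricks Python cell where `spark` is predefined: build the statement as a string and pass it to `spark.sql`. The helper name `build_ddl` is hypothetical. The table name, columns, and path come from the question. Declaring `registration_dttm` as `timestamp` rests on the assumption that a `USING PARQUET` data source table avoids the Hive SerDe path that raises HIVE-6384.

```python
# Hypothetical helper: builds a Spark SQL DDL string for an unmanaged
# (external) Parquet table. Nothing here touches Spark by itself.

COLUMNS = [
    ("registration_dttm", "timestamp"),  # assumption: the files store a timestamp
    ("id", "int"),
    ("first_name", "string"),
    ("last_name", "string"),
    ("email", "string"),
    ("gender", "string"),
    ("ip_address", "string"),
    ("cc", "string"),
    ("country", "string"),
    ("birthdate", "string"),
    ("salary", "double"),
    ("title", "string"),
    ("comments", "string"),
]


def build_ddl(table, columns, location):
    """Return a CREATE TABLE ... USING PARQUET LOCATION statement."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"USING PARQUET\n"
        f"LOCATION '{location}'"
    )


ddl = build_ddl("testdb.ptables", COLUMNS, "/mnt/landing/testTable/person")
print(ddl)

# In a Databricks notebook cell, where `spark` is already defined:
# spark.sql(ddl)
```

The same statement can also be run directly in a cell that starts with `%sql`. Because an explicit location is given, Spark treats the table as unmanaged, so dropping the table leaves the Parquet files in place.
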
When I change the data type from timestamp to string, I can create the DDL, but when I query the table with select * from <table_name>, I get the following error:

Error in SQL statement: SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11, 10.139.64.4, executor 0): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/landing/testTable/person/userdata1.parquet.
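
A read failure on the first file after switching the column to `string` is consistent with the declared schema no longer matching what the Parquet files contain, although the truncated error above does not confirm this. A minimal sketch for checking, assuming a Databricks Python cell with `spark` predefined: take the files' own schema from `spark.read.parquet(path).dtypes` and compare it with the declared columns. The function `schema_mismatches` and the sample `actual` values are hypothetical and for illustration only.

```python
def schema_mismatches(declared, actual):
    """Compare two {column: type} mappings.

    Returns (column, declared_type, actual_type) for every declared
    column whose type differs from, or is missing in, the actual schema.
    """
    problems = []
    for name, declared_type in declared.items():
        actual_type = actual.get(name)  # None if the file lacks the column
        if actual_type != declared_type:
            problems.append((name, declared_type, actual_type))
    return problems


# Declared types from the DDL in the question (subset shown).
declared = {"registration_dttm": "string", "id": "int", "salary": "double"}

# Hypothetical file schema. In a notebook this would come from:
# actual = dict(spark.read.parquet("/mnt/landing/testTable/person").dtypes)
actual = {"registration_dttm": "timestamp", "id": "int", "salary": "double"}

for name, want, got in schema_mismatches(declared, actual):
    print(f"{name}: table declares {want}, file has {got}")
```

If a mismatch does show up, one option is to declare the column with the type the files actually use and cast it at query time.
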

I have 5 Parquet files at the location mentioned, and the query fails on the first file.
Can anyone help with this issue?
Thanks

Hive apache-spark databricks azure-databricks

Source: https://stackoverflow.com/questions/62512562/executing-external-table-from-databricks