I created a Hive table with the following statement:
CREATE TABLE rci_db_inventory.dev_cr_asset_trace_2 (
    id STRING,
    acn STRING,
    source_max_date BIGINT,
    col_name STRING,
    source_value STRING,
    type STRING,
    lid STRING,
    source_id STRING,
    created_by STRING,
    created_on STRING,
    traceable STRING,
    found STRING
)
PARTITIONED BY ( ctl_eid STRING )
STORED AS PARQUET
The problem appears when I try to write to this table from a PySpark DataFrame with the following code:
columnar_df.withColumn("found", lit(head_bi_name)).write.format("parquet").mode("append") \
.partitionBy("ctl_eid").saveAsTable('rci_db_inventory.dev_cr_asset_trace_2')
Error:
pyspark.sql.utils.AnalysisException: u"The format of the existing table rci_db_inventory.dev_cr_asset_trace_2 is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;"
I am using an on-premises Cloudera cluster.
1 Answer
What happens if you use .format('hive') instead of .format('parquet')?
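To make the suggestion concrete, here is a minimal sketch of the same write with only the format changed. The helper name append_to_hive_table and its parameters are hypothetical, introduced just for illustration; it assumes a Spark version whose DataFrameWriter accepts the "hive" format and a session created with Hive support. It has not been tested against the cluster from the question.

```python
def append_to_hive_table(df, table_name, partition_cols):
    """Append a DataFrame to an existing Hive-format table.

    Hypothetical helper (not part of PySpark). The only change from the
    code in the question is format("hive") in place of format("parquet"),
    so the requested format matches the one recorded for the table.
    """
    (
        df.write
        .format("hive")                   # match the table's HiveFileFormat
        .mode("append")                   # keep existing rows, add new ones
        .partitionBy(*partition_cols)     # same partition column(s) as the table
        .saveAsTable(table_name)
    )


# Usage with the names from the question (assumes columnar_df, lit and
# head_bi_name are already defined in the session):
#
# append_to_hive_table(
#     columnar_df.withColumn("found", lit(head_bi_name)),
#     "rci_db_inventory.dev_cr_asset_trace_2",
#     ["ctl_eid"],
# )
```

The table was created with Hive DDL (STORED AS PARQUET), so Spark reports it as `HiveFileFormat` even though the underlying files are Parquet. That is why requesting "parquet" in the writer is rejected, and why asking for "hive" may get past the check.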