pyspark error when writing to a Parquet table in Hive

ffvjumwh · published 2021-06-24 · in Hive

I created a Hive table with the following DDL:

CREATE TABLE rci_db_inventory.dev_cr_asset_trace_2 (
  id STRING,
  acn STRING,
  source_max_date BIGINT,
  col_name STRING,
  source_value STRING,
  type STRING,
  lid STRING,
  source_id STRING,
  created_by STRING,
  created_on STRING,
  traceable STRING,
  found STRING
)
PARTITIONED BY (
  ctl_eid STRING
)
STORED AS PARQUET

The problem is that when I try to write to this table from a PySpark DataFrame with the following code:

columnar_df.withColumn("found", lit(head_bi_name)).write.format("parquet").mode("append") \
                .partitionBy("ctl_eid").saveAsTable('rci_db_inventory.dev_cr_asset_trace_2')

I get this error:

pyspark.sql.utils.AnalysisException: u"The format of the existing table rci_db_inventory.dev_cr_asset_trace_2 is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;"

I am using an internal Cloudera cluster.

pgpifvop (answer 1):

What if you try .format('hive')?

columnar_df.withColumn("found", lit(head_bi_name)).write.format("hive").mode("append") \
               .partitionBy("ctl_eid").saveAsTable('rci_db_inventory.dev_cr_asset_trace_2')
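Before choosing a write path, it can also help to confirm how the metastore actually registered the table. A quick check (table name from the question):

```sql
-- A table created through Hive DDL shows a Hive SerDe (for Parquet,
-- ParquetHiveSerDe) in the Storage Information section, while a table
-- created by Spark's saveAsTable shows a Spark datasource provider.
DESCRIBE FORMATTED rci_db_inventory.dev_cr_asset_trace_2;
```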
