Spark Dataset write() method returns an error

wb1gzix0 · posted 2021-05-29 in Hadoop

I am trying to load an XML file with the Databricks spark-xml library and write the data out to CSV, but I am not able to write the output column data (array<string>) to a CSV file.
I get the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: CSV data source does not support array<string> data type.

When I print the dataset, it looks like this:

+--------------------+
|             orgname|
+--------------------+
|[Muncy, Geissler,...|
|[Muncy, Geissler,...|
|[Knobbe Martens O...|
|[null, Telekta La...|
|[McAndrews, Held ...|
|[Notaro, Michalos...|
|                null|
|[Cowan, Liebowitz...|
|                null|
|[Kunzler Law Grou...|
|[null, null, Klei...|
|[Knobbe, Martens,...|
|[Merchant & Gould...|
|                null|
|[Culhane Meadows ...|
|[Culhane Meadows ...|
|[Vista IP Law Gro...|
|[Thompson & Knigh...|
|  [Fish & Tsang LLP]|
|                null|
+--------------------+
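For context, here is a minimal sketch of how a dataset like this is typically produced with spark-xml; the rowTag value, file path, and output path below are illustrative assumptions, not taken from the question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("xml-to-csv")
  .master("local[*]")
  .getOrCreate()

// Read with the Databricks spark-xml data source; "rowTag" must match
// the repeating element in the actual file (hypothetical value here).
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")
  .load("input.xml")

// A repeated child element is inferred as array<string>, which the
// CSV writer then rejects with the exception shown above.
df.select("orgname").write.csv("orgnames_csv")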
wlsrxk51 · answer #1

The exception should be self-explanatory: an array cannot be written to a CSV file.
You have to concatenate it into a single string first:

import org.apache.spark.sql.functions.concat_ws
import spark.implicits._  // needed for the $"..." column syntax

val separator: String = ";"  // choose a separator that does not occur in the data

df.withColumn("orgname", concat_ws(separator, $"orgname")).write.csv(...)
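If you want one output row per organization name rather than one delimited string per row, a sketch of an alternative (assuming the same df and orgname column) is to flatten the array before writing; explode_outer, unlike explode, also preserves the null rows visible in the sample output:

import org.apache.spark.sql.functions.{col, explode_outer}

// One CSV row per array element; the output path is a placeholder.
df.withColumn("orgname", explode_outer(col("orgname"))).write.csv("orgnames_csv")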
