How to re-infer the data type of a Spark DataFrame column after updating its values in Scala

Posted by ivqmmu1c on 2021-05-27 in Spark


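Here is a sample of my original DataFrame `df`:
```
+----+
| mix|
+----+
|   1|
|   2|
| cap|
|   3|
|  53|
|  56|
|  98|
|  90|
+----+
```

The column's current data type is `StringType`. After replacing the value `cap` with `0`, there are two possible cases:

1. The column has no more string values, so all values are now numeric.
2. The column still has other string values, so it remains `StringType`.

How can I re-infer the data type so that I know whether the column is purely numeric after the replacement, and what the exact data type is (`Integer`, `Float`, `Double`)?
```
df.withColumn("mix",when(col("mix") === "cap",0).otherwise(col("mix")))
```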

scala DataFrame apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/63945860/how-to-reinfer-datatype-of-a-spark-dataframe-column-after-updating-values-in-the

1 Answer


dwbf0jvd1#

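```
import org.apache.spark.sql.functions.{col, when}
import org.apache.spark.sql.types.IntegerType
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`

val mix = Seq("1", "2", "cap", "4").toDF("mix")
mix.printSchema()
// root
//  |-- mix: string (nullable = true)

// Replace "cap" with 0, then cast the whole column to IntegerType
val after = mix.withColumn("mix", when(col("mix") === "cap", 0).otherwise(col("mix")).cast(IntegerType))
after.printSchema()
// root
//  |-- mix: integer (nullable = true)
```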

Let me know if this helps.
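The cast above fixes the target type to `IntegerType` in advance, so it does not tell you which of the two cases from the question applies. One possible way to decide is to scan the column once and check every non-null value against a pattern. The sketch below is not part of the original answer: the object name `ReinferColumnType`, the helper `narrowestType`, and both regular expressions are hypothetical, and it only distinguishes `IntegerType`, `DoubleType`, and `StringType`. It also assumes integer-looking values fit in 32 bits; if they may not, `LongType` would be the safer choice.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, lit, sum, when}
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, StringType}

// Hypothetical helper, not a Spark API.
object ReinferColumnType {
  // Assumed patterns: optional sign + digits, and a plain or scientific decimal.
  private val IntPattern = "^[+-]?\\d+$"
  private val NumPattern = "^[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?$"

  /** Returns the narrowest of IntegerType / DoubleType / StringType that
    * every non-null value of the string column `name` matches.
    * Assumption: integer-looking values fit in a 32-bit Int. */
  def narrowestType(df: DataFrame, name: String): DataType = {
    val c = col(name)
    // Single pass: count non-null values that fail each pattern.
    val row = df.agg(
      coalesce(sum(when(c.isNotNull && !c.rlike(IntPattern), 1).otherwise(0)), lit(0L)),
      coalesce(sum(when(c.isNotNull && !c.rlike(NumPattern), 1).otherwise(0)), lit(0L))
    ).head()

    if (row.getLong(0) == 0L) IntegerType      // every value looks like an integer
    else if (row.getLong(1) == 0L) DoubleType  // every value is numeric, some fractional
    else StringType                            // at least one non-numeric string remains
  }
}
```

With a DataFrame `replaced` in which `cap` has already been replaced by the string `"0"`, calling `ReinferColumnType.narrowestType(replaced, "mix")` gives the type to pass to `col("mix").cast(...)`. Matching with patterns first avoids relying on how a failed cast behaves, which depends on whether ANSI mode is enabled.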

Answered on 2021-05-27
