Reading JSON files with nested JSON data into a Spark DataFrame with PySpark

nkkqxpd9 · asked 2021-05-19 in Spark · 1 answer · 461 views

I need to read multiple JSON files into a Spark DataFrame. The JSON data looks like this:

{"f0_":{"id":"138307057680","ActionName":"Complete","Time":"2020-04-23-12:40:04"}}
{"f0_":{"id":"138313115245","ActionName":"Midpoint","Time":"2020-06-16-20:41:16"}}

I need to get rid of the first key, which wraps all of the columns. I tried:

jsonFiles = spark.read.json("Resources")  # path to all the JSON files
jsonFiles.printSchema()

The output is:

root
 |-- f0_: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- ActionName: string (nullable = true)
 |    |-- Time: string (nullable = true)
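
Essentially I want id, ActionName and Time as top-level columns. Since the outer key already shows up as a struct column, I assume something like star-expanding it would do (a rough sketch, assuming the key is always f0_):

flat = jsonFiles.select("f0_.*")  # should give top-level columns id, ActionName, Time
flat.printSchema()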

c9x0cxw0  1#

Here is a working solution for you:

Create the DataFrame here:

from pyspark.sql import functions as F, types as T

# Build a single-column ("value") string DataFrame from the two sample records (str() of a dict is single-quoted text, which Spark's JSON parser accepts by default)
df_new = spark.createDataFrame(
    [str({"f0_": {"id": "138307057680", "ActionName": "Complete", "Time": "2020-04-23-12:40:04"}}),
     str({"f0_": {"id": "138313115245", "ActionName": "Midpoint", "Time": "2020-06-16-20:41:16"}})],
    T.StringType())

# Parse the outer object into a map and explode it into key ("x") / value ("y") columns
df_new = df_new.withColumn('col', F.from_json("value", T.MapType(T.StringType(), T.StringType())))
df_new = df_new.select(F.explode("col").alias("x", "y"))

# The map value is still a JSON string, so parse it into a map as well
df_new = df_new.withColumn('y', F.from_json("y", T.MapType(T.StringType(), T.StringType())))

# Pull the individual fields out of the inner map
df_new = df_new.withColumn("id", df_new.y.getItem("id")).withColumn("ActionName", df_new.y.getItem("ActionName")).withColumn("Time", df_new.y.getItem("Time"))
df_new.show(truncate=False)

Output:

+---+-------------------------------------------------------------------------+------------+----------+-------------------+
|x  |y                                                                        |id          |ActionName|Time               |
+---+-------------------------------------------------------------------------+------------+----------+-------------------+
|f0_|[id -> 138307057680, ActionName -> Complete, Time -> 2020-04-23-12:40:04]|138307057680|Complete  |2020-04-23-12:40:04|
|f0_|[id -> 138313115245, ActionName -> Midpoint, Time -> 2020-06-16-20:41:16]|138313115245|Midpoint  |2020-06-16-20:41:16|
+---+-------------------------------------------------------------------------+------------+----------+-------------------+
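
To run the same chain over the actual files in Resources rather than a hand-built DataFrame, the files can be read as plain text, which gives the same single "value" column (one JSON object per line); a minimal sketch along those lines, reusing the Resources path from the question:

from pyspark.sql import functions as F, types as T

raw = spark.read.text("Resources")  # one string column named "value", one row per line
parsed = (raw
    .withColumn("col", F.from_json("value", T.MapType(T.StringType(), T.StringType())))
    .select(F.explode("col").alias("x", "y"))
    .withColumn("y", F.from_json("y", T.MapType(T.StringType(), T.StringType())))
    .select(F.col("y").getItem("id").alias("id"),
            F.col("y").getItem("ActionName").alias("ActionName"),
            F.col("y").getItem("Time").alias("Time")))
parsed.show(truncate=False)

This keeps everything as strings, as above; if the schema is fixed, a StructType can be passed to from_json instead of MapType to get typed columns in one step.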
