Creating a schema of JSON type and reading it using Spark in Scala [error: cannot resolve jsontostructs]

Posted by q43xntqr on 2021-07-09 in Spark

I have a JSON file that looks like this:

{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"},{"CName":"013","CValue":"ABC1234","CLevel":"1","msg":"","CType":"event"}]}

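I want to create a schema for this, and if the JSON file is empty ( {} ) the value should be an empty string.
However, this is what the DataFrame looks like when I call df.show :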

[[012, XYZ1234, 0, event, ], [013, ABC1234, 1, event, ]]

I created the schema as follows:

val schemaF = ArrayType(
  StructType(
    Array(
      StructField("CName", StringType),
      StructField("CValue", StringType),
      StructField("CLevel", StringType),
      StructField("msg", StringType),
      StructField("CType", StringType)
    )
  )
)

When I try the following,

val df1 = df.withColumn("Codes",from_json('Codes, schemaF))

it throws an exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(`Codes`)' due to data type mismatch: argument 1 requires string type, however, '`Codes`' is of array<struct<CName:string,CValue:string,CLevel:string,CType:string,msg:string>> type.; 'Project [valid#51, jsontostructs(ArrayType(StructType(StructField(CName,StringType,true), StructField(CValue,StringType,true), StructField(CLevel,StringType,true), StructField(msg,StringType,true), StructField(CType,StringType,true)),true), Codes#8, Some(America/Bogota)) AS errorCodes#77]
Can someone tell me why this happens and how to fix it?

scala DataFrame apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/66751595/creating-schema-of-json-type-and-reading-it-using-spark-in-scala-error-cannot

2 Answers

ymzxtsji1#

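Your schema does not correspond to the JSON file you are trying to read. It is missing the top-level field Codes , which holds the array. It should look like this: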

import org.apache.spark.sql.types._

// Top-level struct with a single field, Codes, holding an array of code structs.
val schema = StructType(
  Array(
    StructField(
      "Codes",
      ArrayType(
        StructType(
          Array(
            StructField("CLevel", StringType, true),
            StructField("CName", StringType, true),
            StructField("CType", StringType, true),
            StructField("CValue", StringType, true),
            StructField("msg", StringType, true)
          )
        ),
        true
      ),
      true
    )
  )
)
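
As a side note, the same shape can probably be written more compactly as a DDL string. The sketch below assumes a Spark version that provides StructType.fromDDL (2.3 or later, as far as I know); the object name DdlSchemaSketch is made up for illustration.

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical wrapper object, only here to make the sketch self-contained.
object DdlSchemaSketch {
  // Same fields as the hand-built schema above, expressed as a DDL string.
  val ddl: String =
    "Codes ARRAY<STRUCT<CLevel: STRING, CName: STRING, CType: STRING, CValue: STRING, msg: STRING>>"

  // Parse the DDL string into a StructType.
  val ddlSchema: StructType = StructType.fromDDL(ddl)

  def main(args: Array[String]): Unit = {
    // Print the parsed schema as a tree so it can be compared with the hand-built one.
    ddlSchema.printTreeString()
  }
}
```

Either form should be usable with spark.read.schema(...); the explicit StructType is more verbose but makes nullability visible.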

Apply it when reading the JSON, rather than using the from_json function:

val df = spark.read.schema(schema).json("path/to/json/file")

df.printSchema
//root
// |-- Codes: array (nullable = true)
// |    |-- element: struct (containsNull = true)
// |    |    |-- CLevel: string (nullable = true)
// |    |    |-- CName: string (nullable = true)
// |    |    |-- CType: string (nullable = true)
// |    |    |-- CValue: string (nullable = true)
// |    |    |-- msg: string (nullable = true)
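
As for why the original attempt failed: the exception says that from_json requires a string argument, while Codes had already been parsed into an array of structs when the file was read, so there was no JSON text left to parse. A minimal sketch of the case where from_json does apply is shown below. It assumes the array arrives as raw JSON text in a string column; the column name CodesJson, the object name, and the local SparkSession settings are all made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical wrapper object, only here to make the sketch self-contained.
object FromJsonOnStringColumn {
  // Schema of the array itself (field names taken from the question).
  val codeSchema: ArrayType = ArrayType(
    StructType(Seq(
      StructField("CName", StringType),
      StructField("CValue", StringType),
      StructField("CLevel", StringType),
      StructField("msg", StringType),
      StructField("CType", StringType)
    ))
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed input: the array is still unparsed JSON text in a string column.
    val raw = Seq(
      """[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"}]"""
    ).toDF("CodesJson")

    // from_json accepts a string column, so no data type mismatch is raised here.
    val parsed = raw.withColumn("Codes", from_json(col("CodesJson"), codeSchema))
    parsed.printSchema()

    spark.stop()
  }
}
```

If the data comes from a JSON file, as in the question, passing the schema to spark.read is the simpler route and from_json is not needed.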

Edit:
For the question raised in the comments, you can use the following schema definition:

val schema = StructType(
    Array(
      StructField(
        "Codes",
        ArrayType(
          StructType(
            Array(
              StructField("CLevel", StringType, true),
              StructField("CName", StringType, true),
              StructField("CType", StringType, true),
              StructField("CValue", StringType, true),
              StructField("msg", StringType, true)
            )
          ), true)
        ,true),
      StructField("lid", StructType(Array(StructField("idNo", StringType, true))), true)
    )
  )
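
Regarding the empty-file case ( {} ) from the question: with an explicit schema, a record without Codes should come back with Codes set to null rather than an empty string, because the column is an array. One possible way to get an empty string is to render the array as JSON text and fall back to "". The sketch below assumes a Spark version where to_json accepts an array of structs and where spark.read.json accepts a Dataset[String]; the column name CodesText, the object name, and the in-memory input are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, to_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical wrapper object, only here to make the sketch self-contained.
object EmptyCodesSketch {
  // Nested schema: top-level Codes field holding an array of code structs.
  val schema: StructType = StructType(Seq(
    StructField("Codes", ArrayType(StructType(Seq(
      StructField("CLevel", StringType),
      StructField("CName", StringType),
      StructField("CType", StringType),
      StructField("CValue", StringType),
      StructField("msg", StringType)
    ))))
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-codes-sketch")
      .getOrCreate()
    import spark.implicits._

    // Two in-memory JSON documents stand in for files: one populated, one empty.
    val lines = Seq(
      """{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"}]}""",
      """{}"""
    ).toDS()

    val df = spark.read.schema(schema).json(lines)

    // Render Codes as JSON text; a null Codes (the {} record) falls back to "".
    val withText = df.withColumn("CodesText", coalesce(to_json(col("Codes")), lit("")))
    withText.show(false)

    spark.stop()
  }
}
```

Keeping Codes as an array and adding a separate text column avoids mixing types in one column; whether that fits depends on how the result is consumed downstream.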

2021-07-09

iklwldmw2#

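// Flat schema: assumes each JSON record is a single code object,
// with no enclosing "Codes" array. For the file in the question, where the
// objects are nested under "Codes", these columns would likely all read as null.
val schema = StructType(
  Array(
    StructField("CName", StringType),
    StructField("CValue", StringType),
    StructField("CLevel", StringType),
    StructField("msg", StringType),
    StructField("CType", StringType)
  )
)
val df0 = spark.read.schema(schema).json("/path/to/data.json")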

2021-07-09
