pyspark: read a JSON file with fixed types defined for only some fields

kcwpcxri posted on 2021-05-16 in Spark

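I have a requirement that some fields of a JSON file must be of struct type for my transformation to work; otherwise I get the error "Can't extract value from <field>; need struct type but got string". Spark's implicit schema inference doesn't help here: the relevant field can be null throughout the entire JSON file, so Spark assumes it is a string. I don't want to define my own schema, because there are a large number of fields (>1000) and only a few of them need a forced type.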
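
One way to avoid hand-writing all >1000 fields is to let Spark infer the schema once, overwrite the types of only the few fields that must be structs, and then re-read the file with that patched schema. The sketch below is not from the original post; the helper names `force_field_types`, `_find_field` and `read_json_with_overrides` are hypothetical. It assumes the schema is handled in the dict form produced by `StructType.jsonValue()` (the same form as the schema dump further down) and that override keys are dotted paths whose parents are already structs.

```python
import copy
from typing import Any, Dict


def _find_field(struct: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return the field entry called `name` from a struct in jsonValue() form."""
    for field in struct["fields"]:
        if field["name"] == name:
            return field
    raise KeyError(name)


def force_field_types(schema_json: Dict[str, Any],
                      overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the inferred schema and force the types listed in `overrides`.

    Keys are dotted paths such as "a_time" or "a_time.time_to_resolution";
    values are Spark types in jsonValue() form (e.g. a {"type": "struct", ...} dict).
    Every field not mentioned keeps the type Spark inferred.
    """
    patched = copy.deepcopy(schema_json)  # never mutate the caller's schema
    for path, new_type in overrides.items():
        *parents, leaf = path.split(".")
        struct = patched
        for name in parents:
            struct = _find_field(struct, name)["type"]  # each parent must already be a struct
        _find_field(struct, leaf)["type"] = copy.deepcopy(new_type)
    return patched


def read_json_with_overrides(spark, path: str, overrides: Dict[str, Any]):
    """Infer once, patch a handful of fields, then re-read with the full schema."""
    from pyspark.sql.types import StructType  # lazy import: only needed when reading

    inferred = spark.read.json(path).schema.jsonValue()  # first pass: schema inference
    schema = StructType.fromJson(force_field_types(inferred, overrides))
    return spark.read.schema(schema).json(path)  # second pass: explicit schema, no inference
```

The trade-off is that the input is scanned twice (once for inference, once for the real read), but only the overridden fields ever need to be spelled out.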
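I tried casting the fields after reading the file, but got the error "due to data type mismatch: cannot cast string to struct<>".
Edit: for more details on the transformation I am working on, you can refer to this example of using a StructType column in a pyspark UDF.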
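
Spark does not allow a CAST from string to struct, which is why the cast fails. If the DataFrame is already loaded, a commonly used alternative is to parse the string column with `pyspark.sql.functions.from_json`, which accepts a `StructType`; for a column that is null in every row, each row simply becomes a null struct. This is a sketch, not the poster's code: `string_fields_to_fix` and `coerce_string_columns` are hypothetical names, and it assumes top-level field names and target types given in `StructType.jsonValue()` form.

```python
from typing import Any, Dict, List


def string_fields_to_fix(schema_json: Dict[str, Any],
                         overrides: Dict[str, Any]) -> List[str]:
    """Top-level fields that Spark inferred as string but that should be structs."""
    return [
        f["name"]
        for f in schema_json["fields"]
        if f["type"] == "string"
        and isinstance(overrides.get(f["name"]), dict)
        and overrides[f["name"]].get("type") == "struct"
    ]


def coerce_string_columns(df, overrides: Dict[str, Any]):
    """Replace wrongly inferred string columns with struct columns via from_json."""
    from pyspark.sql import functions as F  # lazy imports: only needed with Spark
    from pyspark.sql.types import StructType

    for name in string_fields_to_fix(df.schema.jsonValue(), overrides):
        struct_type = StructType.fromJson(overrides[name])
        # from_json parses JSON text into the struct; null input yields a null struct
        df = df.withColumn(name, F.from_json(F.col(name), struct_type))
    return df
```

When the column is known to be entirely null, `F.lit(None).cast(struct_type)` should also produce a correctly typed null column, since a null literal can be cast to any type; `from_json` is the safer choice if some rows might carry JSON text.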
A very high-level schema:

- root:
  - a_time:
    - time_to_resolution_remainingTime
    ....
  - b_time:
    - time_to_resolution_remainingTime
    ....

In my use case the a_time field is entirely null, so Spark assigns it string type.

...
{
      "name":"time_to_resolution",
      "nullable":true,
      "type":{
        "fields":[
          {
            "metadata":{

            },
            "name":"_links",
            "nullable":true,
            "type":{
              "fields":[
                {
                  "metadata":{

                  },
                  "name":"self",
                  "nullable":true,
                  "type":"string"
                }
              ],
              "type":"struct"
            }
          },
          {
            "metadata":{

            },
            "name":"completedCycles",
            "nullable":true,
            "type":{
              "containsNull":true,
              "elementType":{
                "fields":[
                  {
                    "metadata":{

                    },
                    "name":"breached",
                    "nullable":true,
                    "type":"boolean"
                  },
                  {
                    "metadata":{

                    },
                    "name":"elapsedTime",
                    "nullable":true,
                    "type":{
                      "fields":[
                        {
                          "metadata":{

                          },
                          "name":"friendly",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"millis",
                          "nullable":true,
                          "type":"long"
                        }
                      ],
                      "type":"struct"
                    }
                  },
                  {
                    "metadata":{

                    },
                    "name":"goalDuration",
                    "nullable":true,
                    "type":{
                      "fields":[
                        {
                          "metadata":{

                          },
                          "name":"friendly",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"millis",
                          "nullable":true,
                          "type":"long"
                        }
                      ],
                      "type":"struct"
                    }
                  },
                  {
                    "metadata":{

                    },
                    "name":"remainingTime",
                    "nullable":true,
                    "type":{
                      "fields":[
                        {
                          "metadata":{

                          },
                          "name":"friendly",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"millis",
                          "nullable":true,
                          "type":"long"
                        }
                      ],
                      "type":"struct"
                    }
                  },
                  {
                    "metadata":{

                    },
                    "name":"startTime",
                    "nullable":true,
                    "type":{
                      "fields":[
                        {
                          "metadata":{

                          },
                          "name":"epochMillis",
                          "nullable":true,
                          "type":"long"
                        },
                        {
                          "metadata":{

                          },
                          "name":"friendly",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"iso8601",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"jira",
                          "nullable":true,
                          "type":"string"
                        }
                      ],
                      "type":"struct"
                    }
                  },
                  {
                    "metadata":{

                    },
                    "name":"stopTime",
                    "nullable":true,
                    "type":{
                      "fields":[
                        {
                          "metadata":{

                          },
                          "name":"epochMillis",
                          "nullable":true,
                          "type":"long"
                        },
                        {
                          "metadata":{

                          },
                          "name":"friendly",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"iso8601",
                          "nullable":true,
                          "type":"string"
                        },
                        {
                          "metadata":{

                          },
                          "name":"jira",
                          "nullable":true,
                          "type":"string"
                        }
                      ],
                      "type":"struct"
                    }
                  }
                ],
                "type":"struct"
              },
              "type":"array"
            }
          }

Another field with the same structure, which is assigned string type because all of its values are null:

{
      "metadata":{

      },
      "name":"time_to_resolution_b",
      "nullable":true,
      "type":"string"
}
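
Because `time_to_resolution_b` (like `a_time` versus `b_time` in the high-level schema) has the same structure as a sibling that Spark did infer correctly, its struct type does not have to be written by hand: it can be copied from the sibling in the inferred schema before re-reading. A minimal sketch, assuming the schema is in `StructType.jsonValue()` form and fields are addressed by dotted paths; `copy_field_type` and `_get_field` are hypothetical helpers, not part of PySpark.

```python
import copy
from typing import Any, Dict


def _get_field(schema_json: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Follow a dotted path such as "a_time.time_to_resolution" through nested structs."""
    node = {"type": schema_json}  # pseudo-field whose type is the root struct
    for name in path.split("."):
        node = next((f for f in node["type"]["fields"] if f["name"] == name), None)
        if node is None:
            raise KeyError(path)
    return node


def copy_field_type(schema_json: Dict[str, Any], source: str, target: str) -> Dict[str, Any]:
    """Return a copy of the schema in which `target` has the same type as `source`."""
    patched = copy.deepcopy(schema_json)  # leave the inferred schema untouched
    source_type = copy.deepcopy(_get_field(patched, source)["type"])
    _get_field(patched, target)["type"] = source_type
    return patched
```

The resulting dict can then be turned into a schema with `StructType.fromJson(...)` and passed to `spark.read.schema(...).json(path)`.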
