Converting nested JSON to CSV with a Python script

qgzx9mmu  posted 8 months ago  in Python

I am new to Python and am trying to convert nested JSON to CSV. Below is the Python script I am using, but it does not produce the output I want.

import json
import pandas as pd

# Open the file with a context manager
with open('employee_data1.json', 'r') as file:
    # load JSON data and parse into Dictionary object
    data = json.load(file)
    
# Load JSON as DataFrame 
df = pd.json_normalize(data)

# Print Result
print(df)

# output DataFrame to CSV file
df.to_csv('employee_data.csv')

I ran the code above against two JSON files, and each one produced a different output.
employee_data1.json

{
    "features": [
        {
            "candidate": {
                "first_name": "Margaret",
                "last_name": "Mcdonald",
                "skills": [
                    "skLearn",
                    "Java",
                    "R",
                    "SQL",
                    "Spark",
                    "C++"
                ],
                "state": "AL",
                "specialty": "Database",
                "experience": [
                    {
                        "company": "XYZ Corp",
                        "position": "Software Engineer",
                        "start_date": "2016-01-01",
                        "end_date": "2021-03-01"
                    },
                    {
                        "company": "ABC Inc",
                        "position": "Senior Software Engineer",
                        "start_date": "2021-04-01",
                        "end_date": null
                    }
                ],
                "relocation": "no"
            }
        },
        {
            "candidate": {
                "first_name": "Michael",
                "last_name": "Carter",
                "skills": [
                    "TensorFlow",
                    "R",
                    "Spark",
                    "MongoDB",
                    "C++",
                    "SQL"
                ],
                "state": "AR",
                "specialty": "Statistics",
                "experience": [
                    {
                        "company": "DFC Corp",
                        "position": "Software Engineer",
                        "start_date": "2016-01-01",
                        "end_date": "2021-03-01"
                    },
                    {
                        "company": "SDC Inc",
                        "position": "Senior Software Engineer",
                        "start_date": "2021-04-01",
                        "end_date": null
                    }
                ],
                "relocation": "yes"
            }
        }
    ]
}


employee_data2.json

{
    "features": 
      {
        "candidate": {
          "first_name": "Margaret",
          "last_name": "Mcdonald",
          "skills": [
            "skLearn",
            "Java",
            "R",
            "SQL",
            "Spark",
            "C++"
          ],
          "state": "AL",
          "specialty": "Database",
          "experience": [
            {
              "company": "XYZ Corp",
              "position": "Software Engineer",
              "start_date": "2016-01-01",
              "end_date": "2021-03-01"
            },
            {
              "company": "ABC Inc",
              "position": "Senior Software Engineer",
              "start_date": "2021-04-01",
              "end_date": null
            }
          ],
          "relocation": "no"
        }
      }
  }
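
The difference between the two outputs can be reproduced with a small sketch. The two dictionaries below are hypothetical, trimmed-down stand-ins for the two files (not the full data): when "features" is a list, a plain json_normalize() call leaves the whole list in one column, and when "features" is a single object, its nested keys are flattened into dotted column names while list values such as skills stay as lists.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-ins for the two files in the question.
list_shaped = {  # like employee_data1.json: "features" is a list
    "features": [{"candidate": {"first_name": "Margaret", "skills": ["Java", "R"]}}]
}
dict_shaped = {  # like employee_data2.json: "features" is a single object
    "features": {"candidate": {"first_name": "Margaret", "skills": ["Java", "R"]}}
}

df_list = pd.json_normalize(list_shaped)
df_dict = pd.json_normalize(dict_shaped)

# The list is not expanded: one column holding the entire list.
print(list(df_list.columns))  # ['features']

# Nested dicts are flattened into dotted names; the skills list stays a list.
print(list(df_dict.columns))
# ['features.candidate.first_name', 'features.candidate.skills']
```

In both cases the nested "experience" list is never expanded into rows, which is why neither output matches the desired one.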


Below I have selected only a few fields, not all of them. I expect the **desired output** shown below. I would appreciate any help with this.

candidate.first_name, candidate.last_name, candidate.skills, candidate.state, candidate.experience.company, candidate.experience.position

Margaret, Mcdonald, "['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']", AL, XYZ Corp, Software Engineer

Answer #1, by abithluo

You can use **json_normalize()** like this:

df = pd.json_normalize(your_json_data,record_path=['features',["candidate","experience"]],
                       meta=[["features","candidate","first_name"],["features","candidate","last_name"],
                              ["features","candidate","relocation"],["features","candidate","skills"],
                                    ["features","candidate","specialty"],["features","candidate","state"]])

But it throws this error:

ValueError: operands could not be broadcast together with shape (12,) (2,)


This may be a bug. See this GitHub issue: BUG: json_normalize fails with empty arrays/lists. To avoid the error, convert the lists to strings, call json_normalize, and then convert the stringified lists back into lists:

# "features" is a list in employee_data1.json and a single dict in employee_data2.json
if isinstance(your_json_data["features"], list):
    for i in your_json_data["features"]:
        i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
    your_json_data["features"]["candidate"]["skills"] = str(your_json_data["features"]["candidate"]["skills"])


After json_normalize:

# requires: import ast
df["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)
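
The string round trip used in this workaround can be checked in isolation. This is only a minimal sketch using the skills list from the question: str() turns the list into its Python literal text, and ast.literal_eval() parses that text back into a list without executing arbitrary code.

```python
import ast

skills = ["skLearn", "Java", "R", "SQL", "Spark", "C++"]

# str() gives the list's literal representation as plain text.
as_text = str(skills)

# literal_eval() parses Python literals only, so it is safer than eval().
restored = ast.literal_eval(as_text)

print(type(as_text).__name__, type(restored).__name__)  # str list
print(restored == skills)  # True
```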

Output

|    | company   | position                 | start_date   | end_date   | features.candidate.first_name   | features.candidate.last_name   | features.candidate.relocation   | features.candidate.skills                             | features.candidate.specialty   | features.candidate.state   |
|---:|:----------|:-------------------------|:-------------|:-----------|:--------------------------------|:-------------------------------|:--------------------------------|:------------------------------------------------------|:-------------------------------|:---------------------------|
|  0 | XYZ Corp  | Software Engineer        | 2016-01-01   | 2021-03-01 | Margaret                        | Mcdonald                       | no                              | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']       | Database                       | AL                         |
|  1 | ABC Inc   | Senior Software Engineer | 2021-04-01   | nan        | Margaret                        | Mcdonald                       | no                              | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']       | Database                       | AL                         |
|  2 | DFC Corp  | Software Engineer        | 2016-01-01   | 2021-03-01 | Michael                         | Carter                         | yes                             | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics                     | AR                         |
|  3 | SDC Inc   | Senior Software Engineer | 2021-04-01   | nan        | Michael                         | Carter                         | yes                             | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics                     | AR                         |

Full code

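import ast
import pandas as pd

# Stringify the skills lists so json_normalize treats them as scalar meta values.
# "features" is a list in employee_data1.json and a single dict in employee_data2.json.
if isinstance(your_json_data["features"], list):
    for i in your_json_data["features"]:
        i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
    your_json_data["features"]["candidate"]["skills"] = str(your_json_data["features"]["candidate"]["skills"])

# One row per experience entry, with the candidate fields repeated as meta columns
df = pd.json_normalize(your_json_data,record_path=['features',["candidate","experience"]],
                       meta=[["features","candidate","first_name"],["features","candidate","last_name"],
                ["features","candidate","relocation"],["features","candidate","skills"],
                ["features","candidate","specialty"],["features","candidate","state"]])

# Convert the stringified lists back into real lists
df["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)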
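
As an alternative that sidesteps the json_normalize() error entirely, the rows can also be built by hand and passed to pandas. The sketch below is not the approach above, just one possible variant: candidate_rows is a hypothetical helper name, the sample dictionary is a trimmed copy of the question's data, and the column names follow the desired output from the question.

```python
import pandas as pd

def candidate_rows(data):
    """Yield one flat dict per (candidate, experience entry) pair."""
    features = data["features"]
    # employee_data2.json stores a single object instead of a list.
    if isinstance(features, dict):
        features = [features]
    for feature in features:
        candidate = feature["candidate"]
        for job in candidate["experience"]:
            yield {
                "candidate.first_name": candidate["first_name"],
                "candidate.last_name": candidate["last_name"],
                "candidate.skills": candidate["skills"],
                "candidate.state": candidate["state"],
                "candidate.experience.company": job["company"],
                "candidate.experience.position": job["position"],
            }

# Trimmed sample in the shape of employee_data1.json (assumption: real data
# would come from json.load() on the file instead).
sample = {
    "features": [
        {"candidate": {
            "first_name": "Margaret", "last_name": "Mcdonald",
            "skills": ["skLearn", "Java"], "state": "AL",
            "experience": [
                {"company": "XYZ Corp", "position": "Software Engineer"},
                {"company": "ABC Inc", "position": "Senior Software Engineer"},
            ],
        }}
    ]
}

df = pd.DataFrame(list(candidate_rows(sample)))

# With no path, to_csv() returns the CSV text; pass a filename to write a file.
# index=False leaves out the row-number column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because the helper wraps a single "features" object in a list, the same function should handle both file shapes from the question.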
