如何使用Pandas规范化嵌套JSON -嵌套字典和列表扁平化

iyzzxitl  于 2023-05-19  发布在  其他
关注(0)|答案(1)|浏览(151)

我想从API中扁平化JSON请求。我一直在尝试用Pandas来实现这一点。
下面是我正在使用的JSON示例:

response = 
{
    "data": {
        "getTestExecutions": {
            "total": 50,
            "start": 1,
            "limit": 100,
            "results": [
                {
                    "issueId": "12345",
                    "jira": {
                        "key": "test-1234",
                        "summary": "test- JSON",
                        "project": {
                            "self": "some value",
                            "id": "4321",
                            "key": "tst",
                            "name": "test",
                            "projectTypeKey": "software",
                            "simplified": false,
                            "avatarUrls": {
                                "key" : "value",
                                "key1" : "value1",
                                "key2" : "value1",
                                "key4" : "value4"
                            }
                        }
                    },
                    "tests": {
                        "total": 25,
                        "start": 0,
                        "limit": 100,
                        "results": [
                            {
                                "issueId": "1234",
                                "testType": {
                                    "name": "Manual"
                                },
                                "jira": {
                                    "key": "example-123",
                                    "summary": "sample example"
                                }
                            },
                            {
                                "issueId": "12345",
                                "testType": {
                                    "name": "Manual"
                                },
                                "jira": {
                                    "key": "example-124",
                                    "summary": "sample example 1"
                                }
                            }

                        ]
                    },
                    "testRuns": {
                        "results": [
                            {
                                "status": {
                                    "name": "PASSED"
                                },
                                "executedById": "user id",
                                "startedOn": "date here",
                                "finishedOn": "date here",
                                "assigneeId": "assigneeId"
                            },
                            {
                                "status": {
                                    "name": "PASSED"
                                },
                                "executedById": "executedBy id",
                                "startedOn": "start date",
                                "finishedOn": "end date",
                                "assigneeId": "assigneeId"
                            }

                        ]
                    }
                }
                
            ]
        }
    }
}

这是我目前为止的代码:

df = pd.json_normalize(d['data']['getTestExecutions']['results'])
    df_test_results = df.explode('tests.results')
    df_test_results_Norm = pd.json_normalize(df_test_results['tests.results']).add_prefix('test.')
    df_testRun_results = df.explode('testRuns.results')
    df_testRun_results_Norm = pd.json_normalize(df_testRun_results['testRuns.results']).add_prefix('testRuns.')
    df = pd.concat([df.drop(['tests.results','testRuns.results'],axis=1),df_test_results_Norm, df_testRun_results_Norm],axis=1)

我遇到的问题是,我希望每一列都是字典的键(不丢失任何键),同时保持顺序。我希望一个单一的 Dataframe ,父记录重复每个子记录。我的意思如下:

  • TestExecutions是父项,tests/testRuns是子项
  • 对于每个测试执行,我可以有100个测试和100个testRuns。我希望dataframe中的父记录应用于每个子记录

现在,我的代码生成一个空白值为N/A的 Dataframe ,因为我正在连接不同大小的结果。
有谁知道最好的方法是什么?谢谢你的帮助

vulvrdjw

vulvrdjw1#

验证码:
解析 meta和children,然后用reduce合并所有3个。

import pandas as pd
from functools import reduce

data = response.get("data").get("getTestExecutions")
meta = ["total", "start", "limit"]

jira = pd.json_normalize(
    data=data,
    meta=meta,
    record_path="results"
).drop(columns=["tests.results", "testRuns.results"])

tests = pd.json_normalize(
    data=data,
    meta=meta,
    record_path=["results", ["tests", "results"]]
)

runs = pd.json_normalize(
    data=data,
    meta=meta,
    record_path=["results", ["testRuns", "results"]]
)

dfs = [jira, tests, runs]
df_final = reduce(lambda left, right: pd.merge(
    left=left,
    right=right,
    on=["total", "start", "limit"],
    suffixes=("_meta", "_tests")
), dfs)
df_final.columns = df_final.columns.str.split(pat=".", n=3).str[-1]
print(df_final)

输出:

issueId_meta   key_meta summary_meta        self    id  key  name projectTypeKey  simplified    key    key1    key2    key4 total start limit total start limit issueId_tests    name    key_tests     summary_tests   executedById   startedOn finishedOn  assigneeId    name
0        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100          1234  Manual  example-123    sample example        user id   date here  date here  assigneeId  PASSED
1        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100          1234  Manual  example-123    sample example  executedBy id  start date   end date  assigneeId  PASSED
2        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100         12345  Manual  example-124  sample example 1        user id   date here  date here  assigneeId  PASSED
3        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100         12345  Manual  example-124  sample example 1  executedBy id  start date   end date  assigneeId  PASSED

相关问题