Python — speed up code that makes many API calls and stores all the data in one DataFrame

ig9co6j1 · posted 2021-08-25 in Java

I wrote some code that takes an identifier number, makes a request to a particular API, and returns the data associated with that identifier. The code iterates over a DataFrame, takes each identifier number (about 600 of them), fetches the related information and converts it to a DataFrame. Finally, all the DataFrames are concatenated into a single one. The code runs very slowly. Is there a way to make it faster? I'm not confident with Python, so I'd appreciate it if you could share solution code. The code:

import json

import pandas as pd
import requests

file = dataframe  # DataFrame holding the identifier numbers in column 'id'

list_df = []

for index, row in file.iterrows():

    url = "https://some_api/?eni_number="+str(row['id'])+"&apikey=some_apikey"

    headers = {
      'Accept': 'application/json'
    }

    # one synchronous request per identifier
    response = requests.get(url, headers=headers)

    a = json.loads(response.text)

    # flatten the 'vessels' records of the JSON response into a DataFrame
    df_index = pd.json_normalize(a, 'vessels')
    df_index['eni_number'] = row['id']

    list_df.append(df_index)

total = pd.concat(list_df)
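As a side note on the final concatenation step: each per-identifier DataFrame carries its own 0-based index, so the combined frame ends up with duplicate index labels unless you pass `ignore_index=True`. A minimal illustration with two hypothetical per-id frames shaped like the ones the loop produces:

```python
import pandas as pd

# Two stand-in frames like those appended to list_df in the loop above
df_a = pd.DataFrame({"name": ["Vessel A"], "eni_number": [101]})
df_b = pd.DataFrame({"name": ["Vessel B"], "eni_number": [102]})

# Without ignore_index=True both rows would keep index label 0;
# ignore_index=True renumbers the combined frame 0..n-1
total = pd.concat([df_a, df_b], ignore_index=True)
print(total.index.tolist())  # → [0, 1]
```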

qyswt5oh1#

The bottleneck here seems to be that the HTTP requests are executed synchronously, one after another, so most of the time is wasted waiting for the server's responses.
You will likely get better results with an asynchronous approach, for example using grequests to execute all the HTTP requests in parallel:

import json

import grequests
import pandas as pd

ids = dataframe["id"].to_list()
urls = [f"https://some_api/?eni_number={id}&apikey=some_apikey" for id in ids]

headers = {'Accept': 'application/json'}
# note: don't name this generator 'requests', or it shadows the requests module
reqs = (grequests.get(url, headers=headers) for url in urls)

responses = grequests.map(reqs)  # execute all requests in parallel

list_df = []
for id, response in zip(ids, responses):
    a = json.loads(response.text)
    df_index = pd.json_normalize(a, 'vessels')
    df_index['eni_number'] = id

    list_df.append(df_index)

total = pd.concat(list_df)
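If installing grequests is not an option, the standard library's `concurrent.futures.ThreadPoolExecutor` gives you the same order-preserving parallel pattern: `executor.map` runs the calls concurrently but yields results in input order, so each result still lines up with its id. A minimal sketch, with a stand-in `fetch` function whose body would in practice be the real `requests.get(...).json()` call against the API above:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(id):
    # Stand-in for the real call; in practice this would be e.g.
    #   requests.get(f"https://some_api/?eni_number={id}&apikey=some_apikey",
    #                headers={'Accept': 'application/json'}).json()
    return {"eni_number": id, "vessels": []}

ids = [101, 102, 103]

# Up to 20 requests in flight at once; results come back in input order,
# so zip(ids, results) pairs each response with the right identifier
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(executor.map(fetch, ids))

print([r["eni_number"] for r in results])  # → [101, 102, 103]
```

Unlike grequests, this needs no third-party dependency and no monkey-patching of the socket layer, which makes it the safer choice when the rest of the script also uses requests.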
