An efficient way to iterate through a CSV file in Python and make an API call for each row

3ks5zfa0 · posted 2022-12-06 · in Python
Answers (1) · Views (107)

I've written a script that reads a CSV file and fires one API call per row. It works fine, but my concern is whether I'll run into memory problems if the file grows past a million rows.

import json
import requests
import csv
import time

"""

PURPOSE:
    This is a script designed to: 
        1. read through a CSV file 
        2. loop through each row of the CSV file
        3. build and trigger an API Request to the registerDeviceToken endpoint
        using the contents of each row of the CSV File
INSTRUCTIONS:
    1. Create a CSV file with columns in the following order (left to right):
        1. email
        2. applicationName (i.e. your bundle ID or package name)
        3. platform (i.e. APNS, APNS_SANDBOX, GCM)
        4. token (i.e. device token)
    2. Save CSV file and make note of the full 'filepath'
    3. Define the required constant variables below
    4. Run python script

Note: If your CSV file does not contain column headings, set
contains_headers to 'False'
"""

# Define constant variables
api_endpoint = '<Insert API Endpoint>'

# Update per user specifications
file_location = '/Users/bob/Development/Python/token.csv' # Add location of CSV File
api_key = '<Insert API Key: Type server-side>' # Add your API Key
contains_headers = True # Set to True if file contains column headers

def main():
    # Open and read CSV File
    with open(file_location) as x:
        reader = csv.reader(x)
        if contains_headers:
            next(reader)  # Skip the first row if file contains column headers
        counter = 0 # This counter is used to monitor Rate Limit
        
        # Loop through each row
        for row in reader:
            
            jsonBody = {}
            device = {}
            # Create JSON body for API Request
            device['applicationName'] = row[1]
            device['platform'] = row[2]
            device['token'] = row[3]
            device['dataFields'] = {'endpointEnabled': True}
            jsonBody['email'] = row[0]
            jsonBody['device'] = device
            
            # Create API Request
            destinationHeaders = {
                'api_key': api_key,
                'Content-Type': 'application/json'
            }
            r = requests.post(api_endpoint, headers=destinationHeaders, json=jsonBody)
            print(r)
            data = json.loads(r.text)
            
            # Print Successes/Errors to console 
            msg = 'user %s token %s' % (row[0],row[3])
            if r.status_code == 200:
                msg = 'Success - %s. %s' % (msg, data.get('msg', ''))
            else:
                msg = 'Failure - %s. Code: %s, Details: %s' % (msg, r.status_code, data.get('msg', ''))
            print(msg)
            # Add delay to avoid rate limit
            counter = counter + 1
            if counter == 400:
                time.sleep(2)
                counter = 0

if __name__ == '__main__':
    main()
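For reference, the JSON body my loop assembles from each row amounts to the following (a condensed sketch with made-up sample values in place of real CSV data):

```python
import json

# Hypothetical sample row, in the column order described above:
# email, applicationName, platform, token
row = ['bob@example.com', 'com.example.app', 'APNS', 'abc123']

# Same structure the script builds field-by-field into jsonBody
json_body = {
    'email': row[0],
    'device': {
        'applicationName': row[1],
        'platform': row[2],
        'token': row[3],
        'dataFields': {'endpointEnabled': True},
    },
}

print(json.dumps(json_body, sort_keys=True))
```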

I've read about Pandas and chunking, but working with a DataFrame isn't intuitive to me, and I don't know how to parse each row of a chunk the way the example above does.
1. Will my current script run into memory problems if the file exceeds 1 million rows? If it helps, each CSV should only have 4 columns.

2. Would Pandas chunking be more efficient? If so, how can I loop through each row of a "CSV chunk" to build my API request the way the example above does?
In my sorry attempt at chunking the file, printing 'row' in the following code:
for chunk in pd.read_csv(file_location, chunksize=chunk_size):
    for row in chunk:
        print(row)

returns

email
device
applicationName
platform
token

So I'm thoroughly confused. Thanks in advance for any help.


rbpvctlc1#

Take a look at Python generators. A generator is a kind of iterator that does not keep all of its values in memory:

def read_file_generator(file_name):
    with open(file_name) as csv_file:
        # Yields one raw text line at a time, so the whole file is
        # never held in memory. Wrap the file with csv.reader() if
        # you need each line parsed into fields.
        for row in csv_file:
            yield row

def main():
    for row in read_file_generator("my_file.csv"):
        print(row)

if __name__ == '__main__':
    main()
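On the pandas side of the question: iterating a DataFrame directly (`for row in chunk`) yields its column names, which is exactly the output shown in the question. To iterate the rows of each chunk, use `itertuples()`. A sketch, using a hypothetical in-memory CSV in place of the real `file_location`:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the file on disk
csv_text = (
    'email,applicationName,platform,token\n'
    'bob@example.com,com.example.app,APNS,abc123\n'
    'alice@example.com,com.example.app,GCM,def456\n'
)

rows = []
# chunksize=1 keeps only one row's DataFrame in memory at a time;
# a real script would use a larger chunk size
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1):
    # itertuples() yields one namedtuple per row, so fields are
    # accessible by column name instead of row[0], row[1], ...
    for row in chunk.itertuples(index=False):
        rows.append((row.email, row.token))

print(rows)
```

Each `row` here plays the same role as the `row` list in the original `csv.reader` loop, just with named fields.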
