I created a script that reads a CSV file and fires an API call for each row. It works fine, but my concern is whether I'll run into memory problems if the file exceeds 1 million rows.
import json
import requests
import csv
import time
"""
PURPOSE:
This is a script designed to:
1. read through a CSV file
2. loop through each row of the CSV file
3. build and trigger an API Request to the registerDeviceToken endpoint
using the contents of each row of the CSV File
INSTRUCTIONS:
1. Create a CSV file with columns in the following order (left to right):
1. email
2. applicationName (i.e. your bundle ID or package name)
3. platform (i.e. APNS, APNS_SANDBOX, GCM)
4. token (i.e. device token)
2. Save CSV file and make note of the full 'filepath'
3. Define the required constant variables below
4. Run python script
Note: If your CSV file does not contain column headings, then set
contains_headers to 'False'
"""
# Define constant variables
api_endpoint = '<Insert API Endpoint>'
# Update per user specifications
file_location = '/Users/bob/Development/Python/token.csv' # Add location of CSV File
api_key = '<Insert API Key: Type server-side>' # Add your API Key
contains_headers = True # Set to True if file contains column headers
def main():
    # Open and read the CSV file
    with open(file_location, newline='') as f:
        reader = csv.reader(f)
        if contains_headers:
            next(reader)  # Skip the first row if the file contains column headers
        counter = 0  # This counter is used to monitor the rate limit
        # Request headers never change, so build them once outside the loop
        request_headers = {
            'api_key': api_key,
            'Content-Type': 'application/json'
        }
        # Loop through each row and build the JSON body for the API request
        for row in reader:
            device = {
                'applicationName': row[1],
                'platform': row[2],
                'token': row[3],
                'dataFields': {'endpointEnabled': True}
            }
            json_body = {'email': row[0], 'device': device}
            # Send the API request
            r = requests.post(api_endpoint, headers=request_headers, json=json_body)
            try:
                data = json.loads(r.text)
            except ValueError:  # Response body was not valid JSON
                data = {}
            # Print successes/errors to the console
            msg = 'user %s token %s' % (row[0], row[3])
            if r.status_code == 200:
                msg = 'Success - %s. %s' % (msg, data.get('msg', ''))
            else:
                msg = 'Failure - %s. Code: %s, Details: %s' % (msg, r.status_code, data.get('msg', ''))
            print(msg)
            # Pause periodically to stay under the rate limit
            counter += 1
            if counter == 400:
                time.sleep(2)
                counter = 0

if __name__ == '__main__':
    main()
I've read about using pandas with chunking, but DataFrames aren't intuitive to me, and I don't know how to parse each row of a chunk the way the example above does.
1. Will my current script run into memory issues if the file exceeds 1 million rows? If it helps, each CSV should only have 4 columns.
- Would pandas chunking be more efficient? If so, how can I iterate over each row of a "CSV chunk" to build my API request, as in the example above?
In my sad attempt at chunking the file, printing 'row' in the following code:
import pandas as pd

chunk_size = 100000  # placeholder value
for chunk in pd.read_csv(file_location, chunksize=chunk_size):
    for row in chunk:
        print(row)
returns:
email
device
applicationName
platform
token
So I'm super confused. Thanks in advance for any help.
1 Answer
Take a look at Python generators. A generator is a kind of iterator that does not store all of its values in memory.
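To question 1 specifically: `csv.reader` already behaves this way. It pulls one row at a time from the open file handle, so your existing script should not hold all 1 million rows in memory at once; only the current row is ever live. If you want the same idea written as an explicit generator function, here is a minimal sketch reusing `file_location` and `contains_headers` from your script (`iter_rows` is a name I made up for illustration):

import csv

def iter_rows(path, skip_header=True):
    # Yield one CSV row at a time; only the current row is held in memory
    with open(path, newline='') as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader)  # Skip the column headers
        for row in reader:
            yield row

# Usage: plug this into the existing loop body unchanged
for row in iter_rows(file_location, skip_header=contains_headers):
    pass  # build json_body and call requests.post exactly as in main() above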
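As for the pandas attempt: iterating directly over a DataFrame iterates over its column labels, which is exactly why printing `row` produced the column names instead of row values. To walk the rows of each chunk, use something like `DataFrame.itertuples()`. A sketch under the assumption that the four columns are in the order your docstring describes (`chunk_size` is a hypothetical value to tune):

import pandas as pd

chunk_size = 10000  # hypothetical; adjust for your memory budget
col_names = ['email', 'applicationName', 'platform', 'token']

for chunk in pd.read_csv(file_location, names=col_names,
                         header=0 if contains_headers else None,
                         chunksize=chunk_size):
    for row in chunk.itertuples(index=False):
        json_body = {
            'email': row.email,
            'device': {
                'applicationName': row.applicationName,
                'platform': row.platform,
                'token': row.token,
                'dataFields': {'endpointEnabled': True}
            }
        }
        # ...post json_body exactly as in main() above...

That said, chunking mainly saves memory when pandas would otherwise load the whole file at once; it won't make the row-by-row API calls any faster, since the network round-trips dominate. Plain `csv.reader` is arguably the simpler choice here.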