How to read Stack Overflow archive post data into Python? [duplicate]

Posted by 1aaf6o9v 5 months ago in Python


This question already has answers here:

What is the fastest way to parse large XML docs in Python? (8 answers)
Large XML File Parsing in Python (2 answers)
Closed 28 days ago.
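I downloaded the archived Stack Overflow post data from https://archive.org/details/stackexchange.
However, after extracting the 7z file, I have a 103 GB XML file.
I tried loading it with Python, but the server failed.
How can I load such a large file into Python to analyze the data?
I tried the code from the following links: how to convert xml file to csv using python script
XML to CSV Python
But Python stalled at the step of opening the XML file.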

csv

Source: https://stackoverflow.com/questions/77610765/how-to-read-stackoverflow-archive-post-data-to-python

1 Answer


liwlm1x91#

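Yes, this is a challenge; I ran into the same problem.
Instead of loading the whole file into RAM, you can stream it. For streaming XML files, you can use iterparse from Python's ElementTree.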

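import xml.etree.ElementTree as ET

def process_element(elem):
    # Your processing code goes here
    print(elem.tag, elem.attrib)

file_path = 'demo.xml'

# Create a streaming parser that reports both 'start' and 'end' events
context = ET.iterparse(file_path, events=('start', 'end'))

# The first event is the 'start' of the root element; keep a reference to it
_, root = next(context)

# Iterate over the rest of the file
for event, elem in context:
    if event == 'end':
        process_element(elem)
        elem.clear()  # Free the element's own contents
        # Must not forget: also detach finished children from the root,
        # otherwise the emptied elements pile up under it
        root.clear()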
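
Since the question is ultimately about getting the data into a CSV for analysis, the same streaming idea can be pointed at a CSV writer. This is only a rough sketch under some assumptions: the function name `xml_rows_to_csv` is made up for illustration, and it assumes the dump stores each post as a `<row .../>` element whose data lives in attributes such as `Id` or `Title`. Check a few lines of your own file before relying on those names.

```python
import csv
import xml.etree.ElementTree as ET


def xml_rows_to_csv(xml_path, csv_path, fields, row_tag="row"):
    """Stream row elements from xml_path and write the chosen attributes to csv_path.

    Hypothetical helper: `fields` and `row_tag` are assumptions about the
    layout of the dump. Returns the number of rows written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()

        context = ET.iterparse(xml_path, events=("start", "end"))
        _, root = next(context)  # first event is the start of the root element

        for event, elem in context:
            if event == "end" and elem.tag == row_tag:
                # Attributes missing on a row become empty cells
                writer.writerow({f: elem.attrib.get(f, "") for f in fields})
                count += 1
                # Detach processed rows from the root so memory use stays flat
                root.clear()
    return count
```

You would call it along the lines of `xml_rows_to_csv('Posts.xml', 'posts.csv', ['Id', 'PostTypeId', 'Score', 'Title'])`, where the file names and the field list are placeholders. Only one row is held in memory at a time, so the size of the XML file should matter for run time rather than for RAM.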
