Python enumerate() tqdm progress bar when reading a file?

Posted by cedebl8k 5 months ago in Python


When I use this code to iterate over a file I have opened, I don't see the tqdm progress bar:

with open(file_path, 'r') as f:
    for i, line in enumerate(tqdm(f)):
        print("line #: %s" % i)
        for j in tqdm(range(line_size)):
            ...

What is the right way to use tqdm here?

python

Source: https://stackoverflow.com/questions/48437189/python-enumerate-tqdm-progress-bar-when-reading-a-file

5 Answers


#1 wsewodh2

When using tqdm, avoid printing inside the loop. Also, wrap only the outer for loop in tqdm, not the inner one.

with open(file_path, 'r') as f:
    for i, line in enumerate(tqdm(f)):
        for j in range(line_size):
            ...

Some notes on using enumerate together with tqdm can be found here.
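
If occasional per-line messages are still wanted, tqdm.write can print them without breaking the bar. The following is a minimal sketch, not the answerer's code: process_file, log_every and out are hypothetical names chosen for illustration, and the per-line work is left as a placeholder.

```python
import sys

from tqdm import tqdm


def process_file(file_path, log_every=1000, out=sys.stderr):
    """Iterate over a file with a single outer tqdm bar.

    `process_file`, `log_every` and `out` are illustrative names,
    not part of tqdm. Returns the number of lines read.
    """
    count = 0
    with open(file_path, 'r') as f:
        # Only the outer loop is wrapped. No total is known here, so tqdm
        # shows a running counter and rate instead of a percentage bar.
        for i, line in enumerate(tqdm(f, file=out)):
            if i % log_every == 0:
                # tqdm.write emits the message without corrupting the bar.
                tqdm.write("line #: %s" % i, file=out)
            # ... per-line processing would go here ...
            count += 1
    return count
```

Calling process_file(file_path) keeps one bar on screen and logs every 1000th line number above it.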


#2 0md85ypi

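I ran into this problem too: tqdm does not show a progress bar because the number of lines in the file object is not provided.
The for loop iterates over the lines, reading until it reaches the next newline character.
To get a progress bar from tqdm, you first need to scan the file and count the lines, then pass that count to tqdm as total.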

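from tqdm import tqdm

# First pass: count the lines so tqdm knows the total.
with open('myfile.txt', 'r') as f:
    num_lines = sum(1 for _ in f)

# Second pass: iterate with a bar that can show a percentage and an ETA.
with open('myfile.txt', 'r') as f:
    for line in tqdm(f, total=num_lines):
        print(line)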
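
The counting pass can also be done on raw bytes, which avoids decoding every line. This is a rough sketch under the assumption that lines end in "\n"; count_lines and chunk_size are hypothetical names, and no timing claims are made for it.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count lines by scanning the file's raw bytes in fixed-size chunks.

    `count_lines` is an illustrative helper, not part of tqdm.
    """
    count = 0
    last = b"\n"  # treated as "ends with newline" for an empty file
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
            last = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last != b"\n":
        count += 1
    return count
```

The result can then be passed as total, for example tqdm(f, total=count_lines('myfile.txt')).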


#3 u4dcyp6a

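I'm trying to do the same thing on a file containing all Wikipedia articles, so I don't want to count the total number of lines before starting to process it. It is also a bz2-compressed file, so the len of each decompressed line overestimates the number of bytes read in that iteration, so...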

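import bz2
from pathlib import Path
from tqdm import tqdm

# The total is the size of the compressed file on disk.
with tqdm(total=Path(filepath).stat().st_size) as pbar:
    # Keep a handle on the raw file: fin.tell() would report the position
    # in the decompressed stream, which can exceed the compressed total.
    with open(filepath, 'rb') as raw, bz2.open(raw) as fin:
        for i, line in enumerate(fin):
            if not i % 1000:
                # raw.tell() is the position in the compressed file.
                pbar.update(raw.tell() - pbar.n)
            # do something with the decompressed line
    # Debug-by-print to see the attributes of `pbar`:
    # print(vars(pbar))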

Thank you for your deleted answer. If the moderators undelete it, you can crib from mine.


#4 koaltpgm

If you are reading from a very large file, try this approach:

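import os
import sys
from tqdm import tqdm

file_size = os.path.getsize(filename)
lines_read = []
# The total is in bytes, so label the bar in bytes and let tqdm scale it.
pbar = tqdm(total=file_size, unit="B", unit_scale=True)
with open(filename, 'r', encoding='UTF-8') as file:
    while (line := file.readline()):
        lines_read.append(line)
        # Approximate the bytes consumed by this line (see the edit below).
        pbar.update(sys.getsizeof(line) - sys.getsizeof('\n'))
pbar.close()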

I have omitted any processing you might want to do before append(line).

Edit:

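I changed len(line) to sys.getsizeof(line) - sys.getsizeof('\n') because len(line) does not accurately represent the number of bytes actually read (see the other posts). Even this is not 100% accurate, since sys.getsizeof(line) is not the true length of the bytes read, but for a huge file it is a "close enough" hack.
I did try using f.tell() instead and subtracting the file position delta in the while loop, but in Python 3.8.10 f.tell() on a non-binary file is very slow.
Following the link below, I also tried f.tell() with Python 3.10, but it is still very slow.
If anyone has a better strategy, please feel free to edit this answer, but provide some performance numbers before editing. Keep in mind that for very large files, counting the lines before running the loop is not acceptable and completely defeats the purpose of showing a progress bar (try, for example, a 30 GB file with 300 million lines).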

Why is f.tell() slow in Python when reading a file in non-binary mode?
https://bugs.python.org/issue11114
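
One possible alternative, which I have not benchmarked, is to read the file in binary mode and decode each line by hand. In binary mode len() of a line is its exact byte count, so neither f.tell() nor sys.getsizeof is needed. The sketch below assumes the file fits a single known encoding; read_lines_with_progress and out are hypothetical names.

```python
import os
import sys

from tqdm import tqdm


def read_lines_with_progress(filename, encoding="utf-8", out=sys.stderr):
    """Yield decoded lines while advancing a byte-based progress bar.

    `read_lines_with_progress` is an illustrative helper, not part of tqdm.
    """
    file_size = os.path.getsize(filename)
    with tqdm(total=file_size, unit="B", unit_scale=True, file=out) as pbar:
        with open(filename, "rb") as f:
            for raw_line in f:
                # In binary mode len(raw_line) is the exact number of bytes
                # consumed for this line, newline included.
                pbar.update(len(raw_line))
                yield raw_line.decode(encoding)
```

Unlike text mode, this does not translate "\r\n" to "\n", so lines from Windows-style files keep their "\r".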


#5 ua4mk5z4

If you are reading the file with readlines(), you can use the following:

from tqdm import tqdm
with open(filename) as f:
    sentences = tqdm(f.readlines(),unit='MB')

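unit='MB' can be changed to 'B', 'KB' or 'GB' as appropriate. Note that unit is only the label tqdm displays: this bar advances once per line, not once per megabyte.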
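
Because readlines() returns a list, tqdm can take the total from its length without an explicit total argument. A small sketch along those lines, with a label that matches what is being counted; load_sentences and out are hypothetical names, and stripping the newline stands in for real processing.

```python
from tqdm import tqdm


def load_sentences(filename, out=None):
    """Read all lines into memory, then process them under a progress bar.

    `load_sentences` is an illustrative helper, not part of tqdm.
    `out=None` lets tqdm fall back to its default stream (stderr).
    """
    with open(filename) as f:
        lines = f.readlines()  # a list, so tqdm infers the total via len()
    # unit="line" labels the counter with what is actually being counted.
    return [line.rstrip("\n") for line in tqdm(lines, unit="line", file=out)]
```

This only suits files that fit in memory, since readlines() loads everything up front.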
