I have a text file with multiple lines. How can I extract a portion of each line using regular expressions in Python?

Posted by ttp71kqs 7 months ago in Python


Each input line looks like this:

-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf

The desired output is:

25399 Nov  2 21:25 exception_hierarchy.pdf

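These fields are size, month, day, hour, minute and filename, respectively.
The assignment asks for a list of tuples (size, month, day, hour, minute, filename), built with a regular expression (using the match, search, findall or finditer methods).
The code I tried is: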

for line in range(1):
    line = f.readline()
    x = re.findall(r'[^-]\d+\w+:\w+.*\w+_*', line)
    print(x)

My output - [' 21:25 add_colab_link.py']

python-3.x

Source: https://stackoverflow.com/questions/60745272/i-have-a-text-file-with-multiple-lines-how-can-i-extract-a-portion-from-each-li

2 answers


byqmnocz1#

Here is a working example that uses a regular expression via the re module:

>>> import re
>>> line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
>>> pattern = r"(\d+)\s+([A-Za-z]+)\s+(\d{1,2})\s+(\d{1,2}):(\d{1,2})\s+(.+)$"
>>> output_tuple = re.findall(pattern, line)[0]
>>> print(output_tuple)
('25399', 'Nov', '2', '21', '25', 'exception_hierarchy.pdf')
>>> size, month, day, hour, minute, filename = output_tuple

Most of the logic is encoded in the raw string assigned to pattern. Read piece by piece, it is easy to follow; here it is split over several lines with a comment for each part:

(\d+)      # one or more digits (size)
\s+        # one or more whitespace characters
([A-Za-z]+) # one or more ASCII letters (month)
\s+        # one or more whitespace characters
(\d{1,2})  # one or two digits (day)
\s+        # one or more whitespace characters
(\d{1,2})  # one or two digits (hour)
:          # a literal ':'
(\d{1,2})  # one or two digits (minute)
\s+        # one or more whitespace characters
(.+)       # one or more characters of any kind (filename)
$          # until the string ends
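The annotated breakdown above can also be compiled as-is: with re.VERBOSE, whitespace and # comments inside the pattern are ignored. The sketch below assumes the lines come from a text file (or any iterable of lines); parse_listing and the second sample line are hypothetical. It calls search on each line and collects m.groups(), which yields the list of tuples the question asks for.

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments live
# inside the regex itself.
LISTING_RE = re.compile(r"""
    (\d+)          \s+          # size: one or more digits
    ([A-Za-z]+)    \s+          # month, e.g. Nov
    (\d{1,2})      \s+          # day
    (\d{1,2}) : (\d{1,2}) \s+   # hour and minute
    (.+)$                       # filename: rest of the line
""", re.VERBOSE)


def parse_listing(lines):
    """Return a list of (size, month, day, hour, minute, filename) tuples.

    Lines that do not match (e.g. the 'total ...' header of ls -l) are skipped.
    """
    result = []
    for line in lines:
        m = LISTING_RE.search(line)
        if m:
            result.append(m.groups())
    return result


sample = [
    "total 8\n",
    "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf\n",
    "-rw-r--r-- 1 jttoivon hyad-all    1234 Oct 30 09:05 notes.txt\n",  # hypothetical
]
print(parse_listing(sample))

# With a real file (hypothetical name):
# with open("listing.txt") as f:
#     print(parse_listing(f))
```

Because a file object iterates line by line, the same function works on an open file without reading it all into memory.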

By the way, here is a working example that does not use re (since it is not strictly necessary):

>>> line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
>>> # split on whitespace (at most 8 times) and drop permissions, links, owner and group
>>> size, month, day, hour_minute, filename = line.split(maxsplit=8)[4:]
>>> hour, minute = hour_minute.split(":")
>>> print(size, month, day, hour, minute, filename)
25399 Nov 2 21 25 exception_hierarchy.pdf


carvr3hs2#

import re  # regular expression module

# Assume the lines are already in a list (e.g. from f.readlines()):
my_list = [
    "-rw-r--r-- 1 jttoivon hyad-all 12345 Nov 2 21:25 exception_hierarchy.pdf",
    "-rw-r--r-- 1 jttoivon hyad-all 25399 Nov 2 21:25 exception_hierarchy.pdf",
    "-rw-r--r-- 1 jttoivon hyad-all 98765 Nov 2 21:25 exception_hierarchy.pdf",
]

# New list to store the result tuples in
new_list = []

# Go through every line in the list
for x in my_list:
    # Five digits (the size) followed by anything up to "pdf";
    # this only fits 5-digit sizes and .pdf files
    y = re.findall(r"([0-9]{5}.+pdf)", x)
    for thing in y:
        # split() with no argument also copes with runs of spaces such as "Nov  2"
        size, month, day, time, filename = thing.split()
        hour, minute = time.split(":")
        new_list.append((size, month, day, hour, minute, filename))
print(new_list)
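As a further alternative (a sketch, not the answerer's code), the whole file can be read into one string and scanned with a single finditer call, letting the capture groups do the splitting instead of str.split. It assumes the standard ls -l column order (permissions, link count, owner, group, size, month, day, time, name); the sample text and its second line are made up for illustration.

```python
import re

# The whole listing as one string (hypothetical sample; with a real file
# this would be f.read()).
text = """\
-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf
-rw-r--r-- 1 jttoivon hyad-all     512 Dec 14 08:03 data.csv
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# so one finditer call walks the whole text and yields one match per line.
pattern = re.compile(
    r"^\S+\s+\d+\s+\S+\s+\S+\s+"                     # permissions, links, owner, group
    r"(\d+)\s+([A-Za-z]+)\s+(\d{1,2})\s+"            # size, month, day
    r"(\d{1,2}):(\d{2})\s+(.+)$",                    # hour, minute, filename
    re.MULTILINE,
)

rows = [m.groups() for m in pattern.finditer(text)]
print(rows)
```

Anchoring on the leading columns means a digit inside the owner or group name cannot be mistaken for the size.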
