I have a text file with multiple lines. How can I use regular expressions in Python to extract part of each line?

Posted by ttp71kqs 7 months ago in Python

Each input line looks like this:

-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf

The desired output is:

25399 Nov  2 21:25 exception_hierarchy.pdf


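These fields are, respectively, size, month, day, hour, minute, and filename.
The exercise asks for a list of tuples (size, month, day, hour, minute, filename), built with regular expressions (using one of the match, search, findall, or finditer methods).
The code I tried is: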

for line in range(1):
    line = f.readline()
    x = re.findall(r'[^-]\d+\w+:\w+.*\w+_*', line)
    print(x)

My output - [' 21:25 add_colab_link.py']
Answer #1 (byqmnocz)

Here is a working example that uses regular expressions through the re package:

>>> import re
>>> line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
>>> pattern = r"(\d+)\s+([A-Za-z]+)\s+(\d{1,2})\s+(\d{1,2}):(\d{1,2})\s+(.+)$"
>>> output_tuple = re.findall(pattern, line)[0]
>>> print(output_tuple)
('25399', 'Nov', '2', '21', '25', 'exception_hierarchy.pdf')
>>> size, month, day, hour, minute, filename = output_tuple

Most of the logic is encoded in the raw string stored in the pattern variable. It is easy to follow if you read it piece by piece. The breakdown below puts each piece on its own line:

(\d+)        # one or more digits (size)
\s+          # one or more whitespace characters
([A-Za-z]+)  # one or more letters (month)
\s+          # one or more whitespace characters
(\d{1,2})    # one or two digits (day)
\s+          # one or more whitespace characters
(\d{1,2})    # one or two digits (hour)
:            # a literal ':'
(\d{1,2})    # one or two digits (minute)
\s+          # one or more whitespace characters
(.+)         # one or more of any character (filename)
$            # up to the end of the string
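The piece-by-piece breakdown can also be kept inside the pattern itself with re.VERBOSE, and applied to every line to build the requested list of tuples. This is only a minimal sketch: the names LS_PATTERN and parse_lines are made up for illustration, and the second sample line is there only to show that non-matching lines are skipped. The month group is written as [A-Za-z]+, because a range like [A-z] would also accept some punctuation characters such as _ and ^.

```python
import re

# The pattern written with re.VERBOSE: whitespace inside the pattern is
# ignored and '#' starts a comment, so each piece can be documented in place.
LS_PATTERN = re.compile(
    r"""
    (\d+)        \s+   # size
    ([A-Za-z]+)  \s+   # month
    (\d{1,2})    \s+   # day
    (\d{1,2}) : (\d{1,2})  \s+   # hour and minute
    (.+) $             # filename, up to the end of the line
    """,
    re.VERBOSE,
)


def parse_lines(lines):
    """Return a list of (size, month, day, hour, minute, filename) tuples."""
    result = []
    for line in lines:
        match = LS_PATTERN.search(line.rstrip("\n"))
        if match:  # silently skip lines that do not fit the pattern
            result.append(match.groups())
    return result


sample = [
    "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf\n",
    "total 12\n",  # a line that should not match
]
print(parse_lines(sample))
```

Because a file object yields its lines when iterated, the same function could presumably be called as parse_lines(f) on the open file from the question.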


By the way, here is a working example that does not use re (since it is not strictly necessary for parsing this line):

>>> line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf"
>>> size, month, day, hour_minute, filename = line.split("hyad-all")[1].strip().split()
>>> hour, minute = hour_minute.split(":")
>>> print(size, month, day, hour, minute, filename)
25399 Nov 2 21 25 exception_hierarchy.pdf
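The split-based approach relies on the literal group name "hyad-all" appearing in the line. A variant that counts fields instead might look like the sketch below. It assumes every line has the nine-column layout of the sample (permissions, link count, owner, group, size, month, day, time, filename) and that the time column always contains hh:mm; the function name parse_ls_line is hypothetical.

```python
def parse_ls_line(line):
    """Split one listing line into (size, month, day, hour, minute, filename)."""
    # Split on runs of whitespace at most 8 times, so that the ninth field
    # (the filename) is kept whole even if it contains spaces.
    fields = line.split(maxsplit=8)
    if len(fields) != 9:
        raise ValueError(f"unexpected line format: {line!r}")
    size, month, day, hour_minute, filename = fields[4:]
    hour, minute = hour_minute.split(":")
    # The remainder after the last split keeps trailing whitespace, so strip
    # a trailing newline explicitly.
    return size, month, day, hour, minute, filename.rstrip("\n")


line = "-rw-r--r-- 1 jttoivon hyad-all   25399 Nov  2 21:25 exception_hierarchy.pdf\n"
print(parse_ls_line(line))
```

This does not satisfy the exercise's requirement to use regular expressions, so it is useful mainly as a cross-check of the regex results.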

Answer #2 (carvr3hs)

import re  # regular expression library

# Assume three such lines are held in one list:
my_list = ["-rw-r--r-- 1 jttoivon hyad-all 12345 Nov 2 21:25 exception_hierarchy.pdf", "-rw-r--r-- 1 jttoivon hyad-all 25399 Nov 2 21:25 exception_hierarchy.pdf", "-rw-r--r-- 1 jttoivon hyad-all 98765 Nov 2 21:25 exception_hierarchy.pdf"]

# New list to store the results in
new_list = []

# Loop over every line in the list:
for x in my_list:
    # Match a five-digit size followed by everything up to "pdf"
    # (so this only works for five-digit sizes and names ending in "pdf")
    y = re.findall(r"([0-9]{5}.+pdf)", x)
    for thing in y:
        # split() with no argument also copes with runs of spaces such as "Nov  2"
        a, b, c, d, e = thing.split()
        g, h = d.split(":")
        z = (a, b, c, g, h, e)
        new_list.append(z)

print(new_list)
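Since the question also mentions finditer, here is a sketch that scans a whole block of text in one pass instead of looping over lines. It reuses the three sample lines from the list above, joined into one string that stands in for the contents of the file. The names PATTERN, listing, and records are illustrative, and converting the numeric fields to int is an optional extra that the question did not ask for.

```python
import re

# re.MULTILINE makes '$' match at the end of every line, not only at the end
# of the text. [ \t]+ is used instead of \s+ so that a match can never run
# across a line break.
PATTERN = re.compile(
    r"(\d+)[ \t]+([A-Za-z]+)[ \t]+(\d{1,2})[ \t]+(\d{1,2}):(\d{1,2})[ \t]+(.+)$",
    re.MULTILINE,
)

# Stand-in for the text of the file (for example, the result of f.read()).
listing = (
    "-rw-r--r-- 1 jttoivon hyad-all 12345 Nov 2 21:25 exception_hierarchy.pdf\n"
    "-rw-r--r-- 1 jttoivon hyad-all 25399 Nov 2 21:25 exception_hierarchy.pdf\n"
    "-rw-r--r-- 1 jttoivon hyad-all 98765 Nov 2 21:25 exception_hierarchy.pdf\n"
)

records = []
for m in PATTERN.finditer(listing):
    size, month, day, hour, minute, filename = m.groups()
    # Optional: turn the numeric fields into integers.
    records.append((int(size), month, int(day), int(hour), int(minute), filename))

print(records)
```

Unlike the five-digit pattern above, this version places no restriction on the size width or the file extension, at the cost of a longer pattern.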
