Python regex: I would like to find all overlapping and non-overlapping pattern matches

Posted by x9ybnkn6 on 2021-08-20 in Java


I want to find all overlapping and non-overlapping pattern matches.
My code is as follows:

import re

words = [r"\bhello\b",r"\bworld\b",r"\bhello world\b"] 
sentence = "Hola hello world and hello"

for word in words:
    for match in re.finditer(word,sentence):
        print(match.span(),match.group())

This gives me the following results (which I am happy with, but I need a more efficient way to get them):

(5, 10) hello
(21, 26) hello
(11, 16) world
(5, 16) hello world

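I know this is not very efficient. For example, with 20k words and 10k sentences, matching this way takes about 200 million (20k x 10k) x 2 calls, which takes a lot of time.
Can you suggest an efficient way to solve this problem?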

python List regex re

Source: https://stackoverflow.com/questions/68329670/python-regex-i-would-like-to-find-all-overlapping-and-non-overlapping-pattern

1 answer


hof1towb1#

The order is different, but the results are the same, and it is significantly faster:

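import re

sentence = "Hola hello world and hello"  # same sentence as in the question
substrings=['hello','world','hello world']
# Escape each substring so it is matched literally
joined='|'.join(map(re.escape, substrings))
# Zero-width lookahead: matches every word-boundary position where a substring starts.
# The (?:...) group makes the trailing \b apply to every alternative, not only the last.
reg=re.compile(rf"\b(?=(?:{joined})\b)")

for m in reg.finditer(sentence):
    offset=m.start()
    for e in substrings:
        # Plain slice comparison at the candidate position
        if sentence[offset:offset+len(e)]==e:
            print((offset,offset+len(e)), e)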
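
One caveat of the slice comparison: it only checks that the text at the offset starts with the substring, not that the substring ends at a word boundary. A minimal sketch of the effect follows. The sentence `hello worldly` and the helper name `slice_matches` are made up for illustration and are not part of the question or the answer.

```python
import re

substrings = ['hello', 'world', 'hello world']
joined = '|'.join(map(re.escape, substrings))
reg = re.compile(rf"\b(?=(?:{joined})\b)")

def slice_matches(sentence):
    """Collect (span, substring) pairs using the plain slice comparison (hypothetical helper)."""
    found = []
    for m in reg.finditer(sentence):
        offset = m.start()
        for e in substrings:
            # Only checks that the text at offset starts with e; the end is not checked
            if sentence.startswith(e, offset):
                found.append(((offset, offset + len(e)), e))
    return found

# 'hello world' is reported even though it is only a prefix of 'hello worldly'
print(slice_matches("hello worldly"))
# → [((0, 5), 'hello'), ((0, 11), 'hello world')]
```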

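If you want to make sure you do not match something like hello worldLY (that is, a case where the substring is only a prefix of a longer word), you can do the following: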

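# Uses `re` and `sentence` as defined in the question
substrings=['hello','world','hello world']
joined='|'.join(map(re.escape, substrings))
reg=re.compile(rf"\b(?=(?:{joined})\b)")
# One pattern per substring, so the trailing \b is checked for each candidate
individual=[re.compile(rf'\b{re.escape(e)}\b') for e in substrings]
for m in reg.finditer(sentence):
    offset=m.start()
    for i,e in enumerate(individual):
        # match(string, pos) starts at offset without copying the sentence;
        # the := operator needs Python 3.8+
        if m2:=e.match(sentence, offset):
            print(m2.span(), substrings[i])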

Either version prints:

(5, 10) hello
(5, 16) hello world
(11, 16) world
(21, 26) hello
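
Both versions above still loop over every substring at each candidate position, which may become the dominant cost with something like 20k substrings. One possible alternative, which is not part of the answer above, is to index the substrings by their first word so that only plausible candidates are tested. The sketch below assumes every phrase starts and ends with a word character; `build_index` and `find_matches` are hypothetical helper names.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")
WORD_CHAR = re.compile(r"\w")

def build_index(phrases):
    """Group phrases by their first word (assumes each phrase starts with a word character)."""
    index = defaultdict(list)
    for phrase in phrases:
        first = WORD.match(phrase)
        if first:  # phrases that do not start with a word character are skipped
            index[first.group()].append(phrase)
    return index

def find_matches(sentence, index):
    """Return (span, phrase) pairs, overlapping ones included."""
    found = []
    for m in WORD.finditer(sentence):
        offset = m.start()
        # Only phrases whose first word equals this word are candidates
        for phrase in index.get(m.group(), ()):
            end = offset + len(phrase)
            # Accept only if the phrase is present and not followed by a word character
            if sentence.startswith(phrase, offset) and not WORD_CHAR.match(sentence, end):
                found.append(((offset, end), phrase))
    return found

# Build the index once, then reuse it for every sentence
index = build_index(['hello', 'world', 'hello world'])
for span, phrase in find_matches("Hola hello world and hello", index):
    print(span, phrase)
```

On the sentence from the question this prints the same four lines as above. Whether it is actually faster on real data would need to be measured.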

Answered on 2021-08-20
