Regex to match a word, but only if it is not preceded by a non-alphanumeric character

9fkzdhlc  posted on 2021-07-13  in Java

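I have some sentences in which I want to identify words, but not when the word is immediately preceded by a non-alphanumeric character. It is fine if the word is followed by one.
An example of what I have tried: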

import re

words = ["THIS", "THAT"]
sentences = ["I want to identify THIS word.", "And THAT!", "But I do not want to identify !THIS word", "Or [THIS] word"]

for sentence in sentences:
    for word in words:
        # \b only requires a word/non-word transition, so "!THIS" and "[THIS]" also match
        word_re = re.search(r"\b(%s)\b" % word, sentence)
        if word_re:
            print("It's a match!")

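The code above reports a match for every sentence. I want something that matches only the first two sentences. Is it possible to do this with a regex?
Thanks!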

4dc9hkyq1#

You can use a regular expression such as

(?<!\S)(?:THIS|THAT)\b

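See the regex demo. Details:
(?<!\S) - a left-hand whitespace boundary: the match must start at the beginning of the string or right after a whitespace character
(?:THIS|THAT) - a non-capturing group matching THIS or THAT
\b - a word boundary.
See the Python demo: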

import re
words = ["THIS", "THAT"]
sentences = ["I want to identify THIS word.", "And THAT!", "But I do not want to identify !THIS word", "Or [THIS] word"] 

pattern = fr"(?<!\S)(?:{'|'.join(words)})\b"
for sentence in sentences:
    word_re = re.search(pattern, sentence) 
    if word_re:
        print(f"'{sentence}' is a match!")

# => 'I want to identify THIS word.' is a match!
#    'And THAT!' is a match!
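
To make the difference between the two patterns concrete, here is a minimal sketch that runs the question's leading \b and the answer's (?<!\S) over the same four sentences. The variable names (word_boundary, whitespace_left, boundary_hits, lookbehind_hits) are made up for this illustration, and the question's per-word loop is folded into a single alternation, which should behave the same way for these inputs.

```python
import re

sentences = [
    "I want to identify THIS word.",
    "And THAT!",
    "But I do not want to identify !THIS word",
    "Or [THIS] word",
]

# Leading \b, as in the question: satisfied by any non-word character before the word.
word_boundary = re.compile(r"\b(?:THIS|THAT)\b")
# Leading (?<!\S), as in the answer: satisfied only by whitespace or start of string.
whitespace_left = re.compile(r"(?<!\S)(?:THIS|THAT)\b")

boundary_hits = [bool(word_boundary.search(s)) for s in sentences]
lookbehind_hits = [bool(whitespace_left.search(s)) for s in sentences]

print(boundary_hits)    # [True, True, True, True]
print(lookbehind_hits)  # [True, True, False, False]
```

The trailing \b is left unchanged in both patterns, which is why "THAT!" still matches: only the left-hand side of the word is restricted.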

If THIS or THAT can contain special characters, replace pattern = fr"(?<!\S)(?:{'|'.join(words)})\b" with pattern = fr"(?<!\S)(?:{'|'.join(map(re.escape, words))})\b" .
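
As a rough illustration of why the escaping matters, the sketch below uses a hypothetical word list containing "A.B", where the dot is a regex metacharacter. The helper name build_pattern and the sample sentences are assumptions made for this example, not part of the original answer.

```python
import re

def build_pattern(words):
    # Escape each word so metacharacters such as "." are matched literally.
    alternation = "|".join(map(re.escape, words))
    return re.compile(rf"(?<!\S)(?:{alternation})\b")

words = ["A.B", "THAT"]  # hypothetical word list; "." would otherwise match any character
escaped = build_pattern(words)
unescaped = re.compile(rf"(?<!\S)(?:{'|'.join(words)})\b")

print(bool(unescaped.search("see AxB here")))  # True: the raw "." matches "x"
print(bool(escaped.search("see AxB here")))    # False: only a literal "." is accepted
print(bool(escaped.search("see A.B here")))    # True
print(bool(escaped.search("see !A.B here")))   # False: preceded by "!"
```

Note that the trailing \b still assumes each word ends in a word character (a letter, digit, or underscore); a word ending in punctuation would need a different right-hand boundary.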
