Removing wikitext hyperlinks via regex

slhcrj9b  posted on 2021-07-13  in Java

There are two different kinds of wikitext hyperlinks:

[[stack]]
[[heap (memory region)|heap]]

I want to remove the hyperlinks but keep the text:

stack
heap

Currently, I run two passes, using two different regular expressions:

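import java.util.regex.Pattern;

public class LinkRemover
{
    // Matches [[target|text]] and captures the display text.
    private static final Pattern
    renamingLinks = Pattern.compile("\\[\\[[^\\]]+?\\|(.+?)\\]\\]");

    // Matches [[text]] and captures the text.
    private static final Pattern
    simpleLinks = Pattern.compile("\\[\\[(.+?)\\]\\]");

    public static String removeLinks(String input)
    {
        // Run the renaming pass first; otherwise simpleLinks would keep the "target|" part.
        String temp = renamingLinks.matcher(input).replaceAll("$1");
        return simpleLinks.matcher(temp).replaceAll("$1");
    }
}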

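Is there a way to "fuse" the two regular expressions into a single one that produces the same result?
If you want to check the correctness of a proposed solution, here is a simple test class: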

// Imports assume JUnit 4; adjust them if a different JUnit version is in use.
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LinkRemoverTest
{
    @Test
    public void test()
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, and is usually harvested by [[Sheep shearing|shearing]].";
        String expected = "A sheep's wool is the most widely used animal fiber, and is usually harvested by shearing.";
        String output = LinkRemover.removeLinks(input);
        assertEquals(expected, output);
    }
}
ou6hu8tu  #1

You can make the part up to the pipe optional:

\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]

To make sure the match always stays between the square brackets, use character classes.
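
To illustrate why the character classes matter, here is a minimal sketch that compares the pattern above with a naive merge that uses lazy dots instead. The class name, the method names, and the naive pattern itself are illustrative assumptions and not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the CLASSES pattern comes from the answer.
final class CharacterClassDemo
{
    // Naive merge (assumption for illustration): lazy dots may run past "]]".
    private static final Pattern DOTS =
        Pattern.compile("\\[\\[(?:.*?\\|)?(.+?)\\]\\]");

    // Pattern from the answer: the character classes cannot cross a "]".
    private static final Pattern CLASSES =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    static String withDots(String input)
    {
        return DOTS.matcher(input).replaceAll("$1");
    }

    static String withClasses(String input)
    {
        return CLASSES.matcher(input).replaceAll("$1");
    }
}
```

For the input "A [[wool]] and [[Sheep shearing|shearing]].", withDots returns "A shearing." because the lazy dot keeps expanding from the first link until it reaches the pipe in the second link, while withClasses returns "A wool and shearing.".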
Pattern details:

\\[\\[         # literal opening square brackets
(?:            # open a non-capturing group
    [^\\]|]*   # zero or more characters that are not a ] or a |
    \\|        # literal |
)?             # make the group optional
([^\\]]+)      # capture everything up to the closing square bracket
\\]\\]         # literal closing square brackets
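
As a minimal sketch of how this pattern could replace the two passes from the question, the class below does the removal in one step. The class name SinglePassLinkRemover is hypothetical and chosen only to keep it apart from the question's LinkRemover.

```java
import java.util.regex.Pattern;

// Hypothetical name; a single-pattern variant of the question's LinkRemover.
final class SinglePassLinkRemover
{
    // Optional "target|" prefix, followed by the captured display text.
    private static final Pattern LINKS =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    static String removeLinks(String input)
    {
        // Group 1 is the only capturing group, so "$1" keeps just the visible text.
        return LINKS.matcher(input).replaceAll("$1");
    }
}
```

Pointing the question's test at SinglePassLinkRemover.removeLinks should give the same expected string for both the [[wool]] and the [[Sheep shearing|shearing]] link.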
