Groovy: regex for a complex delimited string with multiple parse patterns

Posted by idfiyjo8 7 months ago in Other

I have the following string:

def str='prop1: value1, prop2: value2;value3, prop3:"test:1234, test1:23;45, test2:34;34", prop4: "test1:66;77, 888"'

I want to end up with the following list of pairs:

prop1: value1
prop2: value2;value3
prop3: test:1234, test1:23;45, test2:34;34
prop4: test1:66;77, 888

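My thinking is that if I can first parse out and remove prop3 and prop4, I can then simply split the rest of the string on commas.
Below is the code and the regexes I have tried so far. The commented-out lines are the variants I tried, but none of them extracts the last property, prop4.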

def str='prop1: value1, prop2: value2;value3, prop3:"test:1234, test1:23;45, test2:34;34", prop4: "test1:66;77, 888"'
  //def regex = /(\w+):"(.*)"[,\s$]/
  //def regex = /(\w+):"(.*)"[,|\s|$]/
  def regex = /(\w+):"(.*)"[,\s]|$/
  def m = (str =~ regex)
  (0..<m.count).each{
    println("${m[it][1]}=${m[it][2]}")
  }

This returns:

prop3=test:1234, test1:23;45, test2:34;34
null=null

What am I missing?
(Also, is there a way to parse all of this with a single regex, instead of the approach I described above: regex first, then split?)

groovy

Source: https://stackoverflow.com/questions/77347365/regex-for-complex-delimited-string-with-multiple-parse-patterns

3 answers

cgh8pdjw #1

Based on the example data you have given, the following regex should work:

\b(\w+):\s*(\"[^\"]*\"|[^,\"]*)

RegEx details:

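\b: word boundary
(\w+): capture group #1, matching one or more word characters
:: matches a literal :
\s*: zero or more whitespace characters
(: start of capture group #2
\"[^\"]*\": matches a quoted text, including the quotes
|: or
[^,\"]*: matches zero or more characters that are neither , nor "
): end of capture group #2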
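
To see how the pattern above behaves on the question's string, here is a minimal sketch in Java. Groovy's =~ operator runs on the same java.util.regex engine, so the pattern carries over unchanged. The class name PropParser and the helper parse are hypothetical names chosen for illustration, and stripping the surrounding quotes is an added step (the pattern itself keeps them in group 2).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropParser {
    // Pattern from the answer: group 1 is the key, group 2 is either a
    // quoted run (quotes included) or an unquoted run without commas.
    private static final Pattern PAIR =
        Pattern.compile("\\b(\\w+):\\s*(\"[^\"]*\"|[^,\"]*)");

    // Hypothetical helper: collects key/value pairs in order of appearance.
    public static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            String value = m.group(2).trim();
            // Added step (not part of the answer): drop surrounding quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            pairs.put(m.group(1), value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String str = "prop1: value1, prop2: value2;value3, "
            + "prop3:\"test:1234, test1:23;45, test2:34;34\", "
            + "prop4: \"test1:66;77, 888\"";
        // Prints one "key: value" line per property.
        parse(str).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Because the quoted alternative is tried first and consumes the whole quoted run, keys such as test1 inside the quotes are not reported as separate pairs. The sketch assumes that quoted values never contain an escaped quote.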

f45qwnt8 #2

If you are fine with using a different capture group for each match, try the following regex.

(.*?),.*?\s(.*?),\s(.*?)"((?:.*?,){2}\s.*?)",\s(.*?:)\s"(.*?)"

Explanation: a detailed breakdown of the regex above.

(.*?)            ##Creating 1st capturing group using Lazy match.
,.*?\s           ##using lazy match till next occurrence of comma followed by again a lazy match till space occurrence.
(.*?)            ##Creating 2nd capturing group which does lazy match.
,\s              ##Till next occurrence of comma followed by a space.
(.*?)"           ##Creating 3rd capturing group just before next occurrence of " here.
(                ##Creating 4th capturing group here, which has:
  (?:.*?,){2}    ##In a non-capturing group matching till next occurrence of comma in Lazy match and matching 2 occurrences of it.
  \s.*?          ##Followed by a space and again a lazy match
)                ##Closing 4th capturing group here.
",\s             ##Matching ", followed by space here.
(.*?:)           ##Creating 5th capturing group with lazy match till next occurrence of : here.
\s"              ##Matching space followed by " here.
(.*?)            ##Creating 6th capturing group with a lazy match in it just before next occurrence of " here.
"                ##matching " here.
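
The breakdown above can be checked with a short Java sketch that applies the pattern once and reads the six groups back. The class name FixedLayoutDemo and the helper groups are hypothetical names for illustration. Note that this pattern is tied to the exact layout of the sample string (two unquoted properties followed by two quoted ones), so it is shown here as a demonstration, not as a general parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedLayoutDemo {
    // Pattern from the answer, escaped for a Java string literal.
    private static final Pattern LAYOUT = Pattern.compile(
        "(.*?),.*?\\s(.*?),\\s(.*?)\"((?:.*?,){2}\\s.*?)\",\\s(.*?:)\\s\"(.*?)\"");

    // Hypothetical helper: returns groups 1..6 of the first match,
    // or an empty array when the input does not follow the layout.
    public static String[] groups(String input) {
        Matcher m = LAYOUT.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "prop1: value1, prop2: value2;value3, "
            + "prop3:\"test:1234, test1:23;45, test2:34;34\", "
            + "prop4: \"test1:66;77, 888\"";
        String[] g = groups(str);
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + (i + 1) + " = " + g[i]);
        }
    }
}
```

On the question's string, groups 1 and 2 hold the whole "key: value" text of prop1 and prop2, while groups 3 and 5 hold the keys prop3: and prop4: with the trailing colon, and groups 4 and 6 hold the unquoted values. A string with a different number of properties will not match at all.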

wvt8vs2t #3

Try the following capture pattern.

(prop.+?):\s*(.+?)(?=, prop.+?:|$)

(prop.+?):\s* captures the text that starts with "prop", up to the :.
(.+?)(?=, prop.+?:|$) captures all text up to the next "prop" key or the end of the string, $.

In Java, you can use the Pattern and Matcher classes.

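import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.mapping;

// s is the input string from the question; Matcher.results() needs Java 9+.
Pattern p = Pattern.compile("(prop.+?):\\s*(.+?)(?=, prop.+?:|$)");
Matcher m = p.matcher(s);
// Group the values by key; values that share a key are joined with ";".
Map<String, String> map
    = m.results()
       .collect(groupingBy(x -> x.group(1),
                           mapping(x -> x.group(2),
                                   joining(";"))));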

Output:

{prop2=value2;value3, prop1=value1, prop4="test1:66;77, 888", prop3="test:1234, test1:23;45, test2:34;34"}
