Regex extract in Hive after a specific string

Posted by djp7away on 2021-06-27 in Hive

I'm really bad at regex extraction. I'm trying to get the cardUUID value (a99512ea-c875-4aaa-8b0d-bb8dd668aaa8 in the sample below), i.e. everything after cardUUID':u' up to the next ':
'{u'cardname':u'hilton garden inn macon/mercer university',u'carduuid':u'a99512ea-c875-4aaa-8b0d-bb8dd668aaa8',u'attributeid':u'29fb392a-b4b6-ffab-d7e8-45e9d470e585',u'title':{u'content':u'hilton garden inn macon/mercer university',u'type':u'title'},u'cardsubtype':u'getaways_market_rate',u'slot_1':{u'content':u'macon,georgia',u'type':u'location-and-distance'},u'value':{u'content':u'$135',u'type':u'price'}}'
I've tried a few regex patterns, but none of them worked. Any suggestions? I'm using Hive.


Answer #1 (nhjlsmyf)

Use get_json_object. First, check the original string:

select  '{u\'cardName\': u\'Cortyard Greenbelt\', u\'cardUUID\': u\'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e\', u\'attribtionId\': u\'29fb392a-95fd-268f-7f84-7a58a7494c35\', u\'title\': {\'content\': \'Cortyard Greenbelt\', \'type\': \'title\'}, \'cardSbtype\': \'GETAWAYS_MARKET_RATE\', \'slot_1\': {\'content\': \'Greenbelt, Maryland\', \'type\': \'location-and-distance\'}, \'vale\': {\'content\': \'$140\', \'type\': \'price\'}}' as json;
OK
{u'cardName': u'Cortyard Greenbelt', u'cardUUID': u'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e', u'attribtionId': u'29fb392a-95fd-268f-7f84-7a58a7494c35', u'title': {'content': 'Cortyard Greenbelt', 'type': 'title'}, 'cardSbtype': 'GETAWAYS_MARKET_RATE', 'slot_1': {'content': 'Greenbelt, Maryland', 'type': 'location-and-distance'}, 'vale': {'content': '$140', 'type': 'price'}}
Time taken: 2.38 seconds, Fetched: 1 row(s)

Now remove the u prefixes, replace ' with ", and extract the JSON element:

hive> select get_json_object(regexp_replace(json,'(u\')|\'','"'),'$.cardUUID') cardUUID
    > from
    > (
    > select  '{u\'cardName\': u\'Cortyard Greenbelt\', u\'cardUUID\': u\'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e\', u\'attribtionId\': u\'29fb392a-95fd-268f-7f84-7a58a7494c35\', u\'title\': {\'content\': \'Cortyard Greenbelt\', \'type\': \'title\'}, \'cardSbtype\': \'GETAWAYS_MARKET_RATE\', \'slot_1\': {\'content\': \'Greenbelt, Maryland\', \'type\': \'location-and-distance\'}, \'vale\': {\'content\': \'$140\', \'type\': \'price\'}}' as json
    > )s;
OK
cfcc39d4-24d1-40b2-84b5-9aaab263fa0e
Time taken: 0.184 seconds, Fetched: 1 row(s)
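To see what the regexp_replace plus get_json_object step does, here is a rough Python sketch of the same idea, using the sample row from the session above. Python is used only for illustration: re.sub stands in for Hive's regexp_replace and json.loads plus a key lookup stands in for get_json_object; the function names to_json and card_uuid are made up.

```python
import json
import re

# Sample row copied from the Hive session above (a Python 2 style dict repr, not JSON).
RAW = (
    "{u'cardName': u'Cortyard Greenbelt', "
    "u'cardUUID': u'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e', "
    "u'attribtionId': u'29fb392a-95fd-268f-7f84-7a58a7494c35', "
    "u'title': {'content': 'Cortyard Greenbelt', 'type': 'title'}, "
    "'cardSbtype': 'GETAWAYS_MARKET_RATE', "
    "'slot_1': {'content': 'Greenbelt, Maryland', 'type': 'location-and-distance'}, "
    "'vale': {'content': '$140', 'type': 'price'}}"
)


def to_json(s: str) -> str:
    # Same pattern as the Hive regexp_replace call: turn u' or a bare ' into ".
    # The alternation tries u' first, so a u prefix is consumed together with its quote.
    return re.sub(r"(u')|'", '"', s)


def card_uuid(s: str) -> str:
    # Parse the rewritten string and read the top-level key, like get_json_object(..., '$.cardUUID').
    return json.loads(to_json(s))["cardUUID"]


print(card_uuid(RAW))  # prints cfcc39d4-24d1-40b2-84b5-9aaab263fa0e
```

One caveat of this pattern: it rewrites every u' it finds, so a value that itself ends in u (for example 'Bureau') would come out corrupted. The sample data happens not to contain such a value.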

If the string has a leading and trailing ', as in your post, remove them first.
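A minimal Python sketch of that clean-up step, assuming the value is wrapped in exactly one extra ' on each side; strip_outer_quotes is a hypothetical helper name. The anchored pattern ^'|'$ only touches the first and last character. In Hive the same pattern would presumably be passed as regexp_replace(json, '^\'|\'$', ''), though that call is untested here.

```python
import re


def strip_outer_quotes(s: str) -> str:
    # ^' matches a quote only at the very start, '$ only at the very end;
    # quotes inside the string are left alone for the later u'/' replacement.
    return re.sub(r"^'|'$", "", s)


# Hypothetical value as it might be stored, wrapped in an extra pair of quotes.
wrapped = "'{u'cardUUID': u'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e'}'"
print(strip_outer_quotes(wrapped))  # prints {u'cardUUID': u'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e'}
```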


Answer #2 (xa9qqrwz)

Positive lookbehind and lookahead with a lazy match:

(?<='cardUUID': u').*?(?=')

Check: https://regexr.com/42jru
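The same pattern can be tried with Python's re module, which, like the Java regex engine Hive uses, supports fixed-width lookbehind. The shortened sample string and the name CARD_UUID are illustrative only.

```python
import re

# (?<=...) : the match must be preceded by the literal text 'cardUUID': u'
# .*?      : lazy, so it takes as few characters as possible
# (?=')    : stop right before the next single quote, without consuming it
CARD_UUID = re.compile(r"(?<='cardUUID': u').*?(?=')")

raw = ("{u'cardName': u'Cortyard Greenbelt', "
       "u'cardUUID': u'cfcc39d4-24d1-40b2-84b5-9aaab263fa0e', "
       "u'attribtionId': u'29fb392a-95fd-268f-7f84-7a58a7494c35'}")

match = CARD_UUID.search(raw)
print(match.group(0) if match else None)  # prints cfcc39d4-24d1-40b2-84b5-9aaab263fa0e
```

In Hive the equivalent would presumably be regexp_extract(json, '(?<=\'cardUUID\': u\').*?(?=\')', 0), though that exact call is untested here. Because the lookbehind is a literal, it must match the data exactly: the sample in the question writes u'carduuid':u' in lowercase with no space after the colon, so the prefix would need adjusting to match.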
