如何使用REGEXP_REPLACE将口头的BOOLEAN单词集合转换为MATCH AGAINST格式?[MySQL 8.0.36]

p4tfgftt  于 5个月前  发布在  Mysql
关注(0)|答案(2)|浏览(63)

我需要将文本查询从标准的“verbal”布尔值转换为FULLLTEXT MATCH AGAINST +和-系统。
例如,此值:

((word1 AND word2) OR (word3 AND word4 AND word5)) OR word6 NOT word7

字符串
需要转换为:

((+word1 +word2) (+word3 +word4 +word5) word6 -word7


我遇到的问题是,当我尝试使用REGEXP_REPLACE与这样的模式:

\\b([[:alnum:]_]+)\\b


我不知道如何让函数使用匹配的单词来追加/修改它。我见过其他例子试图使用\1或类似的东西来表示匹配的值,但它根本不起作用。
那么,我如何命令REGEXP_REPLACE动态扫描字符串中的“(word +word”并将其替换为“(+word +word”?还需要做一个简单的前导,如:
它也有点复杂,因为如果我们有:

(word1 AND word2) AND (word3 OR word4)


这将需要:

+(+word1 +word2) +(word3 word4)


任何关于在不编写超慢解析器的情况下做到这一点的最佳方法的指导都将受到欢迎。

o75abkj4

o75abkj41#

我想出了如何将下面的AND/OR/NOT布尔语句转换为兼容的MATCH AGAINST解决方案,并希望这对其他人有所帮助。
这将转换:

(word1 OR word2) OR (word3 AND word4 AND word5) OR word6 NOT word7

字符串

(WORD1 WORD2) (+WORD3 +WORD4 +WORD5) WORD6 -WORD7


产品编号:

SET in_text_search = '(word1 OR word2) OR (word3 AND word4 AND word5) OR word6 NOT word7';
SET text_reformatted = UCASE(TRIM(in_text_search));
SET text_reformatted = REPLACE(text_reformatted,'  ',' '); -- Strip extra spaces

-- Make sure the count of parentheses (if exist) is equal.
IF (INSTR(text_reformatted,'(') > 0 OR INSTR(text_reformatted,')') > 0) THEN
        SET count_character = LENGTH(text_reformatted) - LENGTH(REPLACE(text_reformatted,'(',''));
        IF (count_character <> (LENGTH(text_reformatted) - LENGTH(REPLACE(text_reformatted,')','')))) THEN
                -- TRIGGER ERROR 'imbalanced parentheses';
                LEAVE SP;
        END IF;
END IF;

SET text_reformatted = REPLACE(text_reformatted,' AND ',' +');
SET text_reformatted = REPLACE(text_reformatted,' OR ',' ');
SET text_reformatted = REPLACE(text_reformatted,' NOT ',' -');

-- Primary replace
SET text_reformatted = REGEXP_REPLACE(text_reformatted, '(?<![-+])\\b(\\w+)\\b(?=(?:\\s*\\+|\\s*$))', '+$1', 1, 0, 'c');

-- Cleanup the cart before the horse, for words like .NET etc.
SET text_reformatted = REPLACE(text_reformatted,'.+','+.');
SET text_reformatted = REPLACE(text_reformatted,'.-','-.');

j91ykkif

j91ykkif2#

在某种程度上,这个SQL风格的解决方案是为了好玩,但也许它会帮助一些人。

with RECURSIVE
  t1(in_text_search) as (
    values
      row('(word1 OR word2) and .net AND (word3 AND word4 AND word5) OR word6 NOT word7')
  ),
  r as (
    select
      0 as occ,
      cast('' as char(127)) as tok,
      in_text_search,
      0 as parentheses,
      cast('process' as char(30)) as status
    from t1
    union all
    select
      r.occ+1,
      case
        when t.tok = 'AND' then '+'
        when t.tok = 'NOT' then '-'
        when t.tok = 'OR' then ' '
        else coalesce(t.tok, '')
      end,
      r.in_text_search,
      p.parentheses,
      case
        when p.parentheses < 0 then 'imbalanced parentheses'
        when t.tok is null then
          case
            when p.parentheses != 0 then 'imbalanced parentheses'
            else 'done'
          end
        else 'process'
      end
    from r,
    LATERAL(
      select
        ucase(regexp_substr(
          r.in_text_search,
          '[()]|[^()[:space:]]+',
          1,
          r.occ+1
        )) as tok
    ) as t,
    LATERAL(
      select
        r.parentheses +
          case
            when t.tok like '(' then 1
            when t.tok like ')' then -1
            else 0
          end as parentheses
    ) as p
    where status = 'process'
  ),
  t2 as (
    select
      occ,
      tok,
      lead(tok, 2) over(order by occ) as tok_2nd,
      status
    from r
  ),
  t3 as (
    select
      occ,
      case
        when tok = '(' and tok_2nd in ('+', '-') then concat(tok, tok_2nd)
        when tok in ('+', '-') then concat(' ', tok)
        else tok
      end as tok,
      status
    from t2
  )
select
  GROUP_CONCAT(tok order by occ separator '') as text_reformatted
from t3
;

个字符
db<>fiddle

相关问题