Why does RLIKE match an emoji in MariaDB 10.2?


DB: MariaDB 10.2
Why does this simple regular expression match an emoji, given that the emoji is 4 bytes long? Shouldn't it match only question-mark characters?

([email protected]:3306) [test]> select '😃' RLIKE '^[?]+$';
+-----------------------------------+
| '\xF0\x9F\x98\x83' RLIKE '^[?]+$' |
+-----------------------------------+
|                                 1 |
+-----------------------------------+
1 row in set (0,00 sec)

([email protected]:3306) [test]> SHOW VARIABLES LIKE 'collation%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_general_ci |
| collation_server     | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set (0,00 sec)


Answer #1

I can reproduce this on 10.6.12:

set @@character_set_connection=utf8mb4;
select '😃' rlike '^[?]+$';

I think it may be related to this issue:
https://jira.mariadb.org/browse/MDEV-11777?jql=project%20%3D%20MDEV%20AND%20text%20~%20%22regexp%22
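When the character set is utf8mb4, the REGEXP_REPLACE function converts supplementary characters (those with a 4-byte UTF-8 encoding) to "?".
However, the response to danblack's MDEV-32904 suggests that it may be caused by a mismatch among the character_set% variables. For example: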

MariaDB [(none)]> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb3                    |
| character_set_connection | utf8mb3                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb3                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.001 sec)

MariaDB [(none)]> select hex('😃');
+-------------------------+
| hex('\xF0\x9F\x98\x83') |
+-------------------------+
| F09F9883                |
+-------------------------+
1 row in set (0.000 sec)

MariaDB [(none)]> set @@character_set_connection=utf8mb4;
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> select hex('😃');
+-------------------------+
| hex('\xF0\x9F\x98\x83') |
+-------------------------+
| 3F3F3F3F                |
+-------------------------+
1 row in set (0.000 sec)

MariaDB [(none)]> set names utf8mb4;
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> select hex('😃');
+----------+
| hex('?') |
+----------+
| F09F9883 |
+----------+
1 row in set (0.000 sec)

MariaDB [(none)]>
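
The HEX() results in the session above suggest why the pattern matches: once the literal has been converted to 3F3F3F3F, it is simply four ASCII question marks, and '^[?]+$' matches those. Below is a minimal Python sketch of that reasoning, using only the two hex values shown above. Python's re module stands in for MariaDB's regex engine here, which is an assumption made for illustration, and the variable names are hypothetical.

```python
import re

# Hex values copied from the HEX() output in the session above.
# 3F3F3F3F: what the server held after only character_set_connection was changed.
converted = bytes.fromhex("3F3F3F3F").decode("ascii")
# F09F9883: the emoji's actual UTF-8 bytes, seen before the change and after SET NAMES.
original = bytes.fromhex("F09F9883").decode("utf-8")

# Same pattern as in the question: one or more literal question marks.
pattern = re.compile(r"^[?]+$")

for label, value in (("converted", converted), ("original", original)):
    matched = pattern.search(value) is not None
    print(f"{label}: {value!r} matches={matched}")
```

In this sketch the converted value is the string '????' and matches, while the real emoji does not. That is consistent with the idea that the regex sees an already-converted literal rather than the emoji itself, though it does not prove what the server does internally.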
