MySQL Chinese Word Segmentation

x33g5p2x, published 2021-03-14 in Mysql

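Full-text indexing consists broadly of two processes:

  • Index creation (indexer): extracting information from structured and unstructured data and building an index from it
  • Index search (search): taking the user's query, searching the index that was built, and returning the results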
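
The two phases can be illustrated with a toy inverted index. This is only a sketch for intuition: the sample documents, file names, and the `lookup` helper are made up for the example, and Sphinx's real index format is far more elaborate.

```shell
#!/bin/sh
# Toy illustration of the two phases; not how Sphinx stores its index.
workdir=$(mktemp -d)

# Three tiny "documents", one per line: <doc id><TAB><text>
printf '1\tmysql full text search\n2\tsphinx search server\n3\tmysql replication\n' > "$workdir/docs.tsv"

# Phase 1 (indexer): map every word to the ids of the documents containing it.
awk -F '\t' '{
    n = split($2, words, " ")
    for (i = 1; i <= n; i++) postings[words[i]] = postings[words[i]] " " $1
} END {
    for (w in postings) print w ":" postings[w]
}' "$workdir/docs.tsv" | sort > "$workdir/index.txt"

# Phase 2 (search): answer a query from the index alone, without rescanning the documents.
lookup() {
    grep "^$1:" "$workdir/index.txt" | cut -d ':' -f 2 | sed 's/^ //'
}

lookup mysql    # prints: 1 3
lookup search   # prints: 1 2
```

Splitting on spaces works for the English sample text only. Chinese text has no spaces between words, which is why a segmenter such as mmseg is needed before indexing.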

Compile and install Sphinx + mmseg

0. Install the build dependencies

yum install make gcc gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel

Download the stable source tarball and extract it

[root@localhost.localdomain /usr/local/src]
# wget http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz
[root@localhost.localdomain /usr/local/src]
# tar xf coreseek-3.2.14.tar.gz 
[root@localhost.localdomain /usr/local/src]
# cd coreseek-3.2.14
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14]
# ls
csft-3.2.14  mmseg-3.2.14  README.txt  testpack
Of these:
csft-3.2.14 is Sphinx modified to work with Chinese text
mmseg-3.2.14 is the Chinese word segmentation plugin
testpack is a package used for testing

Install mmseg

Change into the mmseg source directory

[root@localhost.localdomain /usr/local/src/coreseek-3.2.14]
# cd mmseg-3.2.14/

Run the bootstrap script

[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/mmseg-3.2.14]
# ./bootstrap 

./configure --prefix=/usr/local/mmseg

[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/mmseg-3.2.14]
# ./configure --prefix=/usr/local/mmseg

make && make install

[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/mmseg-3.2.14]
# make && make install
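
Before building coreseek against mmseg, it may help to confirm that the install produced what the later steps rely on. The sketch below assumes the `/usr/local/mmseg` prefix used above; `check_paths` is a hypothetical helper, and the `uni.lib` dictionary name is an assumption based on the usual mmseg layout.

```shell
#!/bin/sh
# Assumed install prefix, taken from the ./configure --prefix above.
MMSEG_PREFIX=${MMSEG_PREFIX:-/usr/local/mmseg}

# Hypothetical helper: report each path as ok/missing; returns non-zero if any is missing.
check_paths() {
    status=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok       $p"
        else
            echo "missing  $p"
            status=1
        fi
    done
    return "$status"
}

# Headers and libraries are what the coreseek ./configure flags point at;
# the etc directory is what charset_dictpath refers to in sphinx.conf.
check_paths \
    "$MMSEG_PREFIX/bin/mmseg" \
    "$MMSEG_PREFIX/include/mmseg" \
    "$MMSEG_PREFIX/lib" \
    "$MMSEG_PREFIX/etc/uni.lib" \
    || echo "mmseg installation looks incomplete under $MMSEG_PREFIX"
```

If any path is reported missing, rerunning `make install` in the mmseg directory and reading its output is usually the quickest way to find out why.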

Install coreseek (from the csft-3.2.14 directory)

[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/csft-3.2.14]
# ./buildconf.sh 
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/csft-3.2.14]
# ./configure --prefix=/usr/local/coreseek  --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mysql
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/csft-3.2.14]
# make && make install

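Using Sphinx

  1. Data source: tells Sphinx which data to query, that is, which data to index (multiple sources can be defined).
  2. Index configuration: which source each index is built from, which directory the index files are stored in, and so on.
  3. Search server: Sphinx listens on a port (9312 by default) and talks to external programs over its own protocol.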
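
For the third part, the search server is the `searchd` binary installed alongside `indexer`. A minimal sketch of starting it, assuming the `/usr/local/coreseek` prefix used in this article and an index that has already been built:

```shell
#!/bin/sh
# Assumed paths, derived from the install prefix used in this article.
CORESEEK_PREFIX=${CORESEEK_PREFIX:-/usr/local/coreseek}
SEARCHD="$CORESEEK_PREFIX/bin/searchd"
CONF="$CORESEEK_PREFIX/etc/sphinx.conf"

if [ -x "$SEARCHD" ] && [ -f "$CONF" ]; then
    # Start the daemon; it reads the searchd section of the config file.
    "$SEARCHD" -c "$CONF"
    # To stop it again later:
    #   "$SEARCHD" -c "$CONF" --stop
else
    echo "searchd or its config was not found under $CORESEEK_PREFIX"
fi
```

The daemon serves indexes that already exist on disk, so it is normally started only after `indexer` has run at least once.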

Configure the data source

[root@localhost.localdomain /usr/local/coreseek/etc]
# cp sphinx.conf.dist sphinx.conf
[root@localhost.localdomain /usr/local/coreseek/etc]
# vim sphinx.conf

Configure it as follows:

source src1 {
    type = mysql
    sql_host  = localhost
    sql_user  = root
    sql_pass  = aaaaaa
    sql_db    = test
    sql_query_pre = set names utf8
    sql_query_pre = set session query_cache_type=off
    # The first selected column (a_id, aliased to id) is the document ID,
    # so it is not declared again as an attribute.
    sql_query = select a_id as id,cat_id,title,simtitle,seotitle,tags,source,description,content,dateline,editdateline from article
    sql_attr_uint = cat_id
    sql_attr_timestamp = dateline
    sql_attr_timestamp = editdateline
    sql_query_info = SELECT * FROM article WHERE a_id=$id
}
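
Since the indexer simply runs the source query against MySQL, it can be worth previewing that query in the `mysql` client first. The sketch below is an assumption-laden example: `preview_source` is a hypothetical helper, and the user, database, table, and column names are the ones from the source definition above.

```shell
#!/bin/sh
# Hypothetical helper: show a few rows of what the indexer would fetch.
# $1 = MySQL user, $2 = database. The bare -p makes the client prompt for the password.
preview_source() {
    mysql -h localhost -u "$1" -p "$2" --default-character-set=utf8 \
        -e "SELECT a_id AS id, cat_id, title, dateline, editdateline FROM article LIMIT 3"
}
```

Calling `preview_source root test` should print three rows. The first column must be a unique positive integer, because Sphinx uses it as the document ID, and the two timestamp columns are expected to hold integer UNIX timestamps.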

Typical index configuration

index test1 {
    source = src1
    path = /usr/local/coreseek/var/data/test1 # where the generated index files are stored
    # stopwords = G:\data\stopwords.txt
    # wordforms = G:\data\wordforms.txt
    # exceptions = /data/exceptions.txt
    charset_dictpath = /usr/local/mmseg/etc/
    charset_type = zh_cn.utf-8
}

Build the index files (test1 is the index name)

[root@localhost.localdomain /usr/local/coreseek/etc]
# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf test1
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
 using config file '/usr/local/coreseek/etc/sphinx.conf'...
indexing index 'test1'...
collected 8122 docs, 47.6 MB
sorted 8.7 Mhits, 100.0% done
total 8122 docs, 47596333 bytes
total 17.782 sec, 2676636 bytes/sec, 456.75 docs/sec
total 5 reads, 0.011 sec, 4559.8 kb/call avg, 2.3 msec/call avg
total 58 writes, 0.429 sec, 903.8 kb/call avg, 7.3 msec/call avg

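Note on a possible error:
/usr/local/coreseek/bin/indexer: error while loading shared libraries: libmysqlclient.so.18: cannot open shared object file: No such file or directory
This means the dynamic loader cannot find libmysqlclient.so.18, a library that the Sphinx indexer depends on. To fix it, edit /etc/ld.so.conf:
vi /etc/ld.so.conf
Append the following line (the directory that contains the MySQL client library) to the end of the file and save it:
/usr/local/mysql/lib
Then run the following command to rebuild the loader cache:
ldconfig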
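
To confirm that the loader now resolves the library, the cache and the binary can be inspected directly. This is a rough sketch; `lib_in_cache` is a hypothetical helper and the indexer path assumes the install prefix used in this article.

```shell
#!/bin/sh
# Assumed location of the indexer binary, from the install prefix used above.
INDEXER=${INDEXER:-/usr/local/coreseek/bin/indexer}

# Hypothetical helper: succeeds if the loader cache lists a library whose name contains $1.
lib_in_cache() {
    ldconfig -p 2>/dev/null | grep -q "$1"
}

if lib_in_cache libmysqlclient; then
    echo "libmysqlclient is in the loader cache"
else
    echo "libmysqlclient is NOT in the loader cache"
fi

# ldd lists each shared library the binary needs; unresolved ones are shown as "not found".
if [ -x "$INDEXER" ]; then
    ldd "$INDEXER" | grep mysqlclient || echo "indexer does not link against mysqlclient"
fi
```

If the library is still reported as not found, the directory added to /etc/ld.so.conf probably does not contain libmysqlclient.so.18 on this machine; the path depends on how MySQL was installed.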

Test a query from the command line

[root@localhost.localdomain /usr/local/coreseek]
# ./bin/search -c etc/sphinx.conf 留学
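
The `search` utility accepts a few options that are handy for quick checks. The sketch below assumes the install prefix and the `test1` index from this article, and reuses the same query term (留学, "study abroad").

```shell
#!/bin/sh
# Assumed paths, derived from the install prefix used in this article.
CORESEEK_PREFIX=${CORESEEK_PREFIX:-/usr/local/coreseek}
SEARCH="$CORESEEK_PREFIX/bin/search"
CONF="$CORESEEK_PREFIX/etc/sphinx.conf"

if [ -x "$SEARCH" ]; then
    # -i restricts the search to one index; -l caps the number of matches shown.
    "$SEARCH" -c "$CONF" -i test1 -l 5 留学
else
    echo "search binary not found at $SEARCH"
fi
```

This tool reads the index files directly and is intended for testing only; applications normally query the running search server instead.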
