Installing Sphinx with Chinese Search Support on Debian
2009-12-21 11:32
I. Build prerequisites

Make sure the following packages are installed. Some of them may not be strictly required, but installing all of them is recommended.

    apt-get install autoconf automake autotools-dev cpp curl gawk gcc lftp libc6-dev linux-libc-dev make libpcre3-dev libpcrecpp0 g++ libtool libncurses5-dev
    aptitude install libmysql++-dev libmysqlclient15-dev checkinstall
    apt-get install python python-dev

About CSFT

CSFT stands for CoreSeek Fulltext Search Server. Sphinx does not support Chinese indexing or search by default; CSFT is full-text search software built on top of Sphinx and released under the GPLv2. Coreseek (http://www.coreseek.com) provides support services for Sphinx users in China.

http://www.coreseek.cn/products/ft_down/

Downloading CSFT

Coreseek Fulltext Server (source code), i.e. Sphinx with the Chinese word-segmentation patch applied:
http://www.coreseek.cn/uploads/csft/3.1/Source/csft-3.1.tar.gz

Coreseek Mmseg (source code):
http://www.coreseek.cn/uploads/csft/3.1/Source/mmseg-3.1.tar.gz

Installing and configuring CSFT

1. Install mmseg

    tar zxvf mmseg-3.1.tar.gz
    cd mmseg-3.1
    ./configure --prefix=/data/opt/mmseg
    make && make install

2. Install csft

    tar zxvf csft-3.1.tar.gz
    cd csft-3.1
    ./configure --prefix=/data/opt/sphinx/ \
        --with-mysql=/data/opt/mysql/ \
        --with-mysql-includes=/data/opt/mysql/include/ \
        --with-mysql-libs=/data/opt/mysql/lib/ \
        --with-mmseg=/data/opt/mmseg/ \
        --with-mmseg-includes=/data/opt/mmseg/include/mmseg/ \
        --with-mmseg-libs=/data/opt/mmseg/lib/
    make && make install

3. Generate the dictionary

Go back to the mmseg source directory, then:

    cd data
    /data/opt/mmseg/bin/mmseg -u unigram.txt
    mv unigram.txt.uni uni.lib
    cp uni.lib /data/opt/sphinx/

4. Configuration

In sphinx.conf, set the following index options:

    charset_type = zh_cn.utf-8
    charset_dictpath = /data/opt/sphinx/

Set the matching encoding on the MySQL source:

    sql_query_pre = SET NAMES utf8    # set the connection encoding

Then edit /etc/my.cnf (vi /etc/my.cnf) and add the following line under both [client] and [mysqld]:

    default-character-set=utf8

5. Running

Build all indexes:

    ./bin/indexer --config etc/sphinx.conf --all

Incremental indexing (rebuild one index and rotate it in):

    ./bin/indexer --rotate --config etc/sphinx.conf test1stemmed

Start the search daemon:

    ./bin/searchd --config etc/sphinx.conf

Run a query (the search term 测试 means "test"):

    ./bin/search -c etc/sphinx.conf 测试

6. Problems

1) When starting the indexing service:

    error while loading shared libraries: libmysqlclient.so.16: cannot open shared object file: No such file or directory

Fix:

    ln -s /data/opt/mysql/lib/libmysqlclient.so.16 /usr/lib/libmysqlclient.so.16

2) When starting the search service:

    cannot open /data/coreseek/dict/mmseg.ini

Fix: create the file with vi /data/coreseek/dict/mmseg.ini and enter the following:

    [mmseg]
    merge_number_and_ascii=1;
    number_and_ascii_joint=-;
    compress_space=0;
    seperate_number_ascii=1;

The settings mean:

    merge_number_and_ascii: whether a run of consecutive letters and digits is kept together rather than segmented
    number_and_ascii_joint: characters allowed to join digits and letters, such as '-' or '.'
    compress_space: currently has no effect
    seperate_number_ascii: whether numbers are split into single digits, e.g. 1988 -> 1/x 9/x 8/x 8/x
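
The mmseg.ini fix in problem 2 can also be scripted instead of typed into vi, which helps when the same setup is repeated on several machines. What follows is only a sketch under stated assumptions: the function names write_mmseg_ini and report_missing are made up for illustration and are not part of CSFT or mmseg, and the directories and files they touch are whatever the caller passes in. The key names and values are copied from the article as written, including the spelling "seperate".

```shell
#!/bin/sh
# Hypothetical helpers for the mmseg.ini step; not shipped with CSFT or mmseg.

# Write the four settings quoted in the article into <dir>/mmseg.ini,
# creating <dir> first if it does not exist yet.
write_mmseg_ini() {
    _dict_dir=$1
    mkdir -p "$_dict_dir" || return 1
    # Keys and values are taken verbatim from the article.
    cat > "$_dict_dir/mmseg.ini" <<'EOF'
[mmseg]
merge_number_and_ascii=1;
number_and_ascii_joint=-;
compress_space=0;
seperate_number_ascii=1;
EOF
}

# Print one "missing: <path>" line for every argument that is not a regular
# file. Returns 0 only when every listed file exists.
report_missing() {
    _missing=0
    for _path in "$@"; do
        if [ ! -f "$_path" ]; then
            echo "missing: $_path"
            _missing=1
        fi
    done
    return "$_missing"
}
```

With the paths used in this article, the calls would be write_mmseg_ini /data/coreseek/dict, followed by report_missing /data/opt/sphinx/uni.lib /data/coreseek/dict/mmseg.ini before starting searchd. Passing the paths as arguments rather than hard-coding them keeps the sketch usable when charset_dictpath points somewhere else.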

