Sphinx search: charset_table difficulties

Question (votes: 2, answers: 3)

I've been stuck on this for two days now...

I want to use the Slovenian alphabet in Sphinx search: all English letters plus č, ž, š (and ć, just in case).

I've been searching the web for a suitable charset_table, but found nothing useful...

So I decided to build my own, step by step...

Here is my index:

index classifieds
{
    source          = classifieds_src
    path            = c:\Sphinx\data\classifieds
    docinfo         = extern

    min_infix_len       = 2
    infix_fields        = title,keywords,summary,text
    expand_keywords     = 1
    enable_star     = 1


    charset_type        = utf-8
    charset_table = 0..9, a..z, _, A..Z->a..z,-, U+002C, \
    U+010C->U+010D, U+0106->U+0107, U+0160->U+0161, U+017D->U+017E, \
    U+010D->c,U+0107->c, U+0161->s, U+017E->z, \
    U+010D, U+0107, U+0161, U+017E
}

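I map the uppercase Č, Ć, Š, Ž to their lowercase forms. Then I map č to c, ć to c, š to s and ž to z. Finally, I add those four lowercase characters to the table...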
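To make the intended result concrete, here is a small PHP sketch of the folding described above. `fold_slovenian()` is a hypothetical helper and is not part of Sphinx or its PHP API.

The sketch maps each of Č, č, Ć, ć, Š, š, Ž, ž directly to c, s or z in a single step, then lowercases the ASCII letters. Under that assumption, every Slovenian spelling in the test cases should normalize to the same keyword as its plain-ASCII form.

It only models the goal of the mapping. It does not reproduce how Sphinx itself applies charset_table.

```php
<?php
// Hypothetical helper (not part of Sphinx or sphinxapi.php): emulates the
// folding the charset_table above is meant to achieve, so that upper- and
// lowercase Č, Ć, Š, Ž all end up as plain c, s or z.
function fold_slovenian(string $text): string
{
    // Map each accented letter straight to its ASCII base letter.
    $map = [
        'Č' => 'c', 'č' => 'c',
        'Ć' => 'c', 'ć' => 'c',
        'Š' => 's', 'š' => 's',
        'Ž' => 'z', 'ž' => 'z',
    ];
    // strtr() with an array replaces whole multi-byte UTF-8 substrings,
    // so this does not need the mbstring extension.
    $text = strtr($text, $map);
    // After the replacement, only ASCII letters are left to lowercase.
    return strtolower($text);
}

// Usage: the Slovenian test queries and their folded keyword.
foreach (['miška', 'Čiška', 'čiška', 'miska'] as $query) {
    echo $query, ' -> ', fold_slovenian($query), PHP_EOL;
}
```

With an index built this way, "miška" and "miska" would both look up the keyword "miska". Likewise, "Čiška" and "čiška" would both look up "ciska".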

These are my classified ad titles:

t1: HP USB optimal RH304
t2: Čiška PCplus MO-U033 + F2 (optična, brezžična, PS/2)
t3: Miška Logitech optična Nano M235 siva

Database encoding: utf8_general_ci
Table encoding: utf8_general_ci
Title field encoding: utf8_general_ci

Test cases:

$testcase = array(
        "miška",
        "mi*ka",
        "Čiška",
        "čiška",
        "miska",
        "usb prenosnik",
        "prenosnik miska",
        "miška usb"
);

//api settings:

$this->sphinx->SetArrayResult(true);
$this->sphinx->setLimits(0, 100);
$this->sphinx->setMatchMode(SPH_MATCH_EXTENDED2);
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC');
$this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$this->sphinx->SetFieldWeights(array("title"=>100, "keywords"=>80, "summary"=>60,
"text"=>20, "slug"=>10));

And finally, the test results:

Format: query (total / total_found), followed by the keyword statistics

miška     (0/0)

Array
(
    [*miška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

mi*ka     (0/0)

Array
(
    [*mi*] => Array
        (
            [docs] => 3
            [hits] => 4
        )

    [mi] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [*2aka*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [2aka] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

Čiška     (0/0)

Array
(
    [*čiška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [čiška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

čiška     (0/0)

Array
(
    [*čiška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [čiška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

miska     (0/0)

Array
(
    [*miska*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miska] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

usb prenosnik     (1/1)

Array
(
    [*usb*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [usb] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [*prenosnik*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [prenosnik] => Array
        (
            [docs] => 1
            [hits] => 1
        )

)

prenosnik miska     (0/0)

Array
(
    [*prenosnik*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [prenosnik] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [*miska*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miska] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

miška usb     (0/0)

Array
(
    [*miška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [*usb*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [usb] => Array
        (
            [docs] => 1
            [hits] => 1
        )

)

As you can clearly see, I only get results for queries that contain no Slovenian special characters.

Please, please help me, this is driving me out of my mind.

Tags: php, utf-8, sphinx, character
3 answers

Answer (10 votes)
Add these to the source (classifieds_src) definition:

sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8

Reference:

http://ryaneby.com/2009/11/21/unicode-and-sphinx.html
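Why the answer's fix matters can be sketched in PHP. If the MySQL connection is not UTF-8, the server converts the utf8 column data to the connection's character set. (Before MySQL 8.0 the default was latin1.) Letters that character set cannot represent may arrive at the indexer as '?' or as bytes that are not the UTF-8 sequences charset_table expects.

Sphinx treats characters that are not in charset_table as separators. A title word like Miška could therefore be indexed as "mi" and "ka" instead of one word. That may also explain why "mi" appears as a standalone keyword in the mi*ka results above.

The `tokenize()` function below is a hypothetical, simplified model, not Sphinx's tokenizer. It assumes that after folding, only 0-9, a-z and _ are word characters.

```php
<?php
// Hypothetical, simplified model of charset_table tokenizing (not Sphinx's
// real tokenizer): fold the Slovenian letters, then treat every character
// outside 0-9, a-z and _ as a word separator.
function tokenize(string $text): array
{
    $text = strtolower(strtr($text, [
        'Č' => 'c', 'č' => 'c', 'Ć' => 'c', 'ć' => 'c',
        'Š' => 's', 'š' => 's', 'Ž' => 'z', 'ž' => 'z',
    ]));
    // Split on runs of characters that the simplified table does not list.
    return preg_split('/[^0-9a-z_]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Correct UTF-8 input: a single word, folded to "miska".
print_r(tokenize('miška'));
// The same word if the š arrived as '?': two unrelated tokens.
print_r(tokenize('mi?ka'));
```

With SET NAMES utf8 in sql_query_pre, the indexer should receive the first case rather than the second.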
