Caution: This documentation is for eZ Publish legacy, from version 3.x to 6.x.
For 5.x documentation covering Platform, see the eZ Documentation Center; for the differences between legacy and Platform, see the 5.x Architecture overview.

Auto-complete search

The auto-complete search function was introduced in eZ Find 2.3. This feature presents users with a list of suggested search words once they have entered a minimum number of letters in the search field. In short, the search engine tries to predict the first word the user wants to type and search for. Instead of typing the word completely, the user can select the correct word from the suggestion list. To activate this feature, configure the settings in the [AutoCompleteSettings] block of your "ezfind.ini" configuration file, located here:

(root of your eZ Publish installation)/ezpublish_legacy/extension/ezfind/settings/ezfind.ini

The first thing to do is enable the AutoComplete setting. To prevent an unrelated or unlimited number of suggestions, you can limit the suggestions and specify the type of information the search engine returns; this is done with the Limit and SolrParams settings. The number of letters required before the auto-complete function kicks in is configured with the MinQueryLength setting. Keep in mind that setting this to a negative value disables the auto-complete function.
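As a rough sketch, the [AutoCompleteSettings] block could look like the following. The setting names (AutoComplete, Limit, MinQueryLength) come from the description above; the values shown are illustrative assumptions, so adjust them to your site:

```ini
# extension/ezfind/settings/ezfind.ini (or an override file)
[AutoCompleteSettings]
# Turn the auto-complete feature on
AutoComplete=enabled
# Cap the number of suggestions returned (illustrative value)
Limit=10
# Minimum number of letters before suggestions appear;
# a negative value disables auto-complete entirely
MinQueryLength=2
```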

For more information on the eZ Find auto-complete settings, see Configuration settings for eZ Find.

Known limitations

Auto-complete with Kanji and Hiragana Japanese characters

eZ Find's auto-complete feature behaves correctly with katakana characters, but it does not behave as expected with kanji and hiragana Japanese characters.

To work around this, a patch is available on JIRA issue EZP-21239. It requires a change in Solr's conf/schema.xml: the "spell" field type needs to be changed to:

    <fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.StandardFilterFactory"/>
        <!-- Lower-cases romaji characters -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnNumerics="0" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

This way, Japanese text is split and morphologically normalized. A hiragana string such as ひらがな will still not autocomplete, but a string like シニアソフトウェアエンジニア will be split into シニア, ソフトウェア and エンジニア.
If you want a string like ひらがな to autocomplete, you may consider the more drastic option of performing only whitespace tokenisation, by changing the "spell" field type to:

    <fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnNumerics="0" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

Note, however, that this variant performs no language-specific analysis.

Ricardo Correia (10/09/2013 10:00 am)

Ricardo Correia (02/10/2013 1:54 pm)
