Caution: This documentation is for eZ Publish legacy, from version 3.x to 6.x.
For 5.x documentation covering Platform, see the eZ Documentation Center; for the differences between legacy and Platform, see the 5.x Architecture overview.

Indexing content

DelayedIndexing

Indexing content in Solr can be a time-consuming operation and, depending on the size of the Solr index, it can affect publishing time. It is therefore possible to delay content indexing by enabling [SearchSettings] DelayedIndexing in the global override of site.ini. The indexing operations are then queued for deferred handling. To actually index the objects you need to enable two cronjobs: indexcontent and ezfoptimizeindex. Note that the ezfindexcontent cronjob is deprecated and should no longer be used.
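
As a minimal sketch, enabling the setting could look like the fragment below. The file path in the comment is an assumption based on the usual location of global overrides; adjust it to your installation.

```ini
# settings/override/site.ini.append.php (assumed location of the global override)
[SearchSettings]
# Queue indexing operations instead of indexing at publish time
DelayedIndexing=enabled
```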

Cronjobs

The indexcontent.php cronjob script can be found in the cronjobs/ folder, and the eZ Find cronjob ezfoptimizeindex can be found in extension/ezfind/cronjobs/, both relative to your eZ Publish root folder.

To keep search results current, the indexcontent cronjob must be executed frequently: objects published or modified between two executions of this cronjob will be missing from, or out of date in, search results. It is therefore wise to run it every five minutes. You can set the cronjob to run frequently by adding "Scripts[]=indexcontent.php" to the CronjobPart-frequent section in an override of your cronjob.ini.
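
For illustration, such an override might contain the fragment below. The file path in the comment is an assumption; only the section name and the Scripts[] line come from the text above.

```ini
# settings/override/cronjob.ini.append.php (assumed override location)
[CronjobPart-frequent]
# Adds the indexing script to the scripts already listed in the "frequent" part
Scripts[]=indexcontent.php
```

With this in place, the script should run whenever the "frequent" part is executed, which you would then schedule at the five-minute interval suggested above.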

By default, the indexcontent.php cronjob script is only part of the default script list, so running it this way also runs the other cronjob scripts in that list. To do so, run the following from the root folder of your eZ Publish installation:

php runcronjobs.php -s <siteaccess>

The indexcontent.php cronjob script is not included in any "CronjobPart" group by default. If you intend to run this script in its own CronjobPart group, you need to configure this manually in an override of the cronjob.ini file, as in the example below:

[CronjobPart-indexcontent]
Scripts[]=indexcontent.php

The indexcontent.php cronjob script can then be run through that CronjobPart group using the following command from the root folder of your eZ Publish installation:

php runcronjobs.php -s <siteaccess> indexcontent

The ezfoptimizeindex cronjob optimizes the Solr index so that Solr can handle search queries faster. Unlike indexcontent, this cronjob does not have to be executed very frequently, as optimizing is a heavy operation. Depending on how frequently content is published, it can be run once or twice a day, or every few hours on very active sites. The suggested frequency is therefore 'infrequent'.
This cronjob can be run with the following command:

php runcronjobs.php -s <siteaccess> ezfoptimizeindex

For more information about configuring eZ Publish cronjobs, see the documentation on running cronjobs.

Note: After a node is hidden, the search index is not updated instantly. For the change to take effect, the "frequent" cronjob needs to be executed and Solr's search index needs to be updated.

Important note: Currently, the only way to be sure that a content object is correctly indexed is to use DelayedIndexing. This follows from its design: the content is marked to be indexed, and is considered correctly indexed only after a valid response from Solr. This way an object is not skipped from the search index even if Solr is unavailable or if indexing fails for any other reason. In case of failure the content object remains marked to be indexed, and another attempt takes place the next time the cronjob runs.

OptimizeOnCommit

The OptimizeOnCommit setting controls the behaviour of the addObject and deleteObject calls with respect to optimizing the Solr index on commits. If the DelayedIndexing setting is enabled, the OptimizeOnCommit setting should be disabled in order to avoid useless optimization calls on commit during content indexing. This setting can be found in the [IndexOptions] section in your ezfind.ini.
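
A minimal sketch of the corresponding override, for a setup where DelayedIndexing is enabled, is shown below. The file path in the comment is an assumption; the section and setting names are those given above.

```ini
# settings/override/ezfind.ini.append.php (assumed override location)
[IndexOptions]
# Avoid an optimize call on every commit while queued content is indexed
OptimizeOnCommit=disabled
```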

Geir Arne Waaler (06/06/2011 1:26 pm)

Ricardo Correia (19/11/2013 4:14 pm)
