Caution: This documentation is for eZ Publish legacy, from version 3.x to 6.x.
For 5.x documentation covering Platform, see the eZ Documentation Center; for the differences between legacy and Platform, see the 5.x Architecture overview.

Setting it up for an eZDFSFileHandler

The following instructions describe how to configure eZ Publish to store images, binary files and content-related caches on a shared NFS mount, with their metadata kept in the database, when using the eZ DFS file handler (eZDFSFileHandler).
Before going any further, please read the "Note: known issue" section at the end of this page, which describes a problem that can occur when using eZ Publish in a clustered environment.

1. Clear the caches (optional)

It is recommended (but not required) to clear all eZ Publish caches before enabling the clustering functionality. This can be done by running the following command from the root of your eZ Publish installation (if you are using multiple servers, run this command from each server node in order to clear the local caches for each one):

On eZ Publish 5.0:

The following should be executed from your <eZ_Publish_root>/ezpublish_legacy/ folder:

php bin/php/ezcache.php --clear-all --purge

"php" should be replaced by the complete path to your php executable.

After that you should also clear Symfony's cache, by executing the following shell command from your eZ Publish root folder:

php ezpublish/console cache:clear --env=prod

On eZ Publish 5.1 and higher:

Only Symfony's console cache:clear command needs to be executed, from your eZ Publish root folder:

php ezpublish/console cache:clear --env=prod

Or use Symfony's console to run the ezcache.php legacy script:

php ezpublish/console --env=prod ezpublish:legacy:script bin/php/ezcache.php --clear-all --purge

After clearing the caches, make sure that all cache files have been cleared by inspecting the contents of the various cache sub-directories within the "var" directory (typically the "<eZ_Publish_root>/ezpublish_legacy/var/cache/" and "<eZ_Publish_root>/ezpublish_legacy/var/<name_of_siteaccess>/cache/" directories). If there are any cache files left, remove them manually.
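The inspection described above can be sketched as a small shell function. This is only an illustration: the function name is hypothetical, and it assumes the default layout with "var/cache" and "var/<name_of_siteaccess>/cache" below the ezpublish_legacy folder.

```shell
# List files that are still present in the legacy cache directories.
# $1: path to the ezpublish_legacy folder.
# Assumes the default layout: var/cache and var/<name_of_siteaccess>/cache.
list_leftover_cache_files() {
    legacy_root=$1
    for cache_dir in "$legacy_root"/var/cache "$legacy_root"/var/*/cache; do
        # Skip glob patterns that matched nothing and paths that are not directories.
        [ -d "$cache_dir" ] || continue
        find "$cache_dir" -type f
    done
}

# Usage (the path is an example):
#   list_leftover_cache_files /path/to/ezpublish_legacy
```

An empty output means that no cache files are left. Any path that is printed can then be reviewed and removed manually.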

2. Modify the "file.ini" settings

Add the following lines to an override for the "file.ini" configuration file ("<eZ_Publish_root>/ezpublish_legacy/settings/override/file.ini.append.php" or "<eZ_Publish_root>/ezpublish_legacy/settings/siteaccess/ezwebin_site/file.ini.append.php" where "ezwebin_site" is the name of your siteaccess):

[ClusteringSettings]
FileHandler=eZDFSFileHandler

The "FileHandler=eZDFSFileHandler" line selects the file handler, which here is the DFS file handler.

When using eZDFSFileHandler, configure the settings in the [eZDFSClusteringSettings] block in the same override file. You must define the path to the NFS mount point (a local folder; "/media/nfs" is just an example) and set the database back-end setting to "eZDFSFileHandlerMySQLiBackend" (MySQL is the only database supported for eZDFS) as shown here:

[eZDFSClusteringSettings]
MountPointPath=/media/nfs
DBBackend=eZDFSFileHandlerMySQLiBackend
DBHost=dbhost
DBPort=3306
DBSocket=
DBName=cluster
DBUser=root
DBPassword=
DBConnectRetries=3
DBExecuteRetries=20
MaxCopyRetries=5

Replace the values of "DBHost", "DBName" (for example "DBName=cluster"), "DBUser" and "DBPassword" with the actual host name, database name, user name and password. In most cases these values will be the same as the "Server", "Database", "User" and "Password" settings specified under the [DatabaseSettings] block of your "site.ini.append.php" configuration file.

Note: The folder indicated in MountPointPath should contain nothing but files handled by the eZ Publish cluster, since files within this folder are maintained by the cluster maintenance scripts and may be removed. If you need to store other files there (e.g. custom cache files), be sure to use eZClusterFileHandler in your own PHP code.

The MaxCopyRetries setting is the maximum number of attempts made to copy a file locally from the network mount point. Under extreme circumstances (e.g. very high traffic) you may need to increase this value, in particular if you find files like expiryXYZtmp.php with a size of 0 bytes.
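One way to look for such empty files is sketched below. The function name is hypothetical, and the "expiry*tmp.php" glob is an assumption derived from the "expiryXYZtmp.php" example name, so adjust it if your files are named differently.

```shell
# Print zero-byte files below $1 whose names match expiry*tmp.php.
# The glob is an assumption based on the "expiryXYZtmp.php" example name.
find_empty_expiry_files() {
    find "$1" -type f -name 'expiry*tmp.php' -size 0
}

# Usage (the path is an example):
#   find_empty_expiry_files /path/to/ezpublish_legacy/var
```

If this prints any paths, raising MaxCopyRetries is worth considering.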

Additional setting introduced in eZ Publish 5.2

As of eZ Publish 5.2, a new [eZDFSClusteringSettings].MetaDataTableNameCache setting is available in file.ini. It defines the name of the table where cache files metadata is stored; set it to an existing table to use that table for cache storage. The default value is "ezdfsfile_cache".

Note: Using the CLUSTER_METADATA_TABLE_CACHE constant (see step 3, under "Additional configurations in eZ Publish 5.2") is recommended over this INI setting, since the constant affects both the FileHandler API, from within requests handled by index.php, and the cluster index used to deliver binary files over HTTP.

More details about this implementation can be found in the doc/features/5.2/dfs_split_tables.md file.

3. Create a new script for serving images

On a clustered installation, files from the var folder that are read through HTTP will be served by PHP. Your web server (e.g. Apache) will be instructed to use a specific PHP script, "index_cluster.php", for serving those files. From version 4.7, this file is common to all clustered installations.

To maximize performance, this script doesn't use the full-blown INI configuration system. Dedicated settings must be provided using either config.php or config.cluster.php, at the root of your eZ Publish setup. Both files lead to the same behaviour, but config.cluster.php is only included when serving requests through index_cluster.php, not index.php. The list of possible settings can be found in the config.php-RECOMMENDED file shipped with the release. The easiest way is to copy the relevant settings from this file to your own config.php. A typical config.php for a DFS MySQLi cluster would look like this:

<?php
define( 'CLUSTER_STORAGE_BACKEND', 'dfsmysqli' );
define( 'CLUSTER_STORAGE_HOST', 'localhost' );
define( 'CLUSTER_STORAGE_PORT', 3306 );
define( 'CLUSTER_STORAGE_USER', 'dbuser' );
define( 'CLUSTER_STORAGE_PASS', 'dbpassword' );
define( 'CLUSTER_STORAGE_DB', 'ezpcluster' );
define( 'CLUSTER_STORAGE_CHARSET', 'utf8' );
define( 'CLUSTER_MOUNT_POINT_PATH', '/media/nfs' );

Note: Make sure you specify the same database settings as indicated under the "[eZDFSClusteringSettings]" block in your "file.ini.append.php" configuration file.
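A quick consistency check between the two files could be sketched as follows. The helper names are hypothetical, and the parsing is deliberately naive: it only understands "Key=value" INI lines and define() calls with single-quoted string values written in the style shown above (so numeric values such as the port are not handled).

```shell
# Print the value of an INI key (first match only).
# $1: key name, $2: INI file.
ini_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Print the single-quoted string value of a define() constant (first match only).
# $1: constant name, $2: PHP file. Only matches the formatting used above.
php_define_value() {
    sed -n "s/^define( *'$1', *'\(.*\)' *);.*/\1/p" "$2" | head -n 1
}

# Succeed when MountPointPath (INI) and CLUSTER_MOUNT_POINT_PATH (PHP) are set and equal.
# $1: file.ini.append.php, $2: config.php.
mount_points_match() {
    ini_mp=$(ini_value MountPointPath "$1")
    php_mp=$(php_define_value CLUSTER_MOUNT_POINT_PATH "$2")
    [ -n "$ini_mp" ] && [ "$ini_mp" = "$php_mp" ]
}

# Usage (paths are examples):
#   mount_points_match settings/override/file.ini.append.php config.php || echo "mount points differ"
```

The same two helpers can be used to compare the database name, host and user between the files.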

Possible values for CLUSTER_STORAGE_BACKEND on a DFS cluster are:

  • dfsmysqli for a MySQL database
  • dfsoracle for an Oracle database (requires the Oracle database extension)

Additional configurations in eZ Publish 5.2

As of eZ Publish 5.2 additional configurations are also available, since the CLUSTER_METADATA_TABLE_CACHE, CLUSTER_METADATA_CACHE_PATH and CLUSTER_METADATA_STORAGE_PATH constants have been introduced in this version.

The CLUSTER_METADATA_TABLE_CACHE constant defines the name of the table where cache files metadata is stored. Set it to an existing table to use this table for cache storage. The default value is "ezdfsfile_cache". This can also be done by using the [eZDFSClusteringSettings].MetaDataTableNameCache setting in file.ini.
Note: CLUSTER_METADATA_TABLE_CACHE is recommended over the INI setting, since it affects both the FileHandler API, from within requests handled by index.php, and the cluster index used to deliver binary files over HTTP.

The CLUSTER_METADATA_CACHE_PATH constant defines the path part for cache files, used to distinguish cache files from storage files. Must only be modified if you have changed [FileSettings].CacheDir setting in site.ini. Default value is "/cache/".

The CLUSTER_METADATA_STORAGE_PATH constant defines the path part for storage files, used to distinguish storage files from cache files. Must only be modified if you have changed [FileSettings].StorageDir setting in site.ini. Default value is "/storage/".

Here is a configuration example that includes the new settings introduced in 5.2:

<?php
define( 'CLUSTER_STORAGE_BACKEND', 'dfsmysqli' );
define( 'CLUSTER_STORAGE_HOST', 'localhost' );
define( 'CLUSTER_STORAGE_PORT', 3306 );
define( 'CLUSTER_STORAGE_USER', 'dbuser' );
define( 'CLUSTER_STORAGE_PASS', 'dbpassword' );
define( 'CLUSTER_STORAGE_DB', 'ezpcluster' );
define( 'CLUSTER_STORAGE_CHARSET', 'utf8' );
define( 'CLUSTER_MOUNT_POINT_PATH', '/media/nfs' );
 
// New metadata configurations introduced in eZ Publish 5.2
define( 'CLUSTER_METADATA_TABLE_CACHE', 'ezdfsfile_cache' );
define( 'CLUSTER_METADATA_CACHE_PATH', '/cache/' );
define( 'CLUSTER_METADATA_STORAGE_PATH', '/storage/' );

Note: The newly introduced metadata configurations will only work in eZ Publish 5.2 and higher versions.

More details about this implementation can be found in the doc/features/5.2/dfs_split_tables.md file.

4. Create new database tables

The database table structure required to hold clustered file information needs to be created. This must be done manually, either on the same database server as the one used for the relational database, or on a different one. Keep in mind that for large-scale websites, a dedicated database server will greatly improve performance and scalability. Also, remember that eZ Systems does not support using the same database schema for the relational and cluster database. While it will work for testing, it will most likely lead to severe errors when used in production.

The schema can be found in the following files:

  • mysql: <eZ_Publish_root>/ezpublish_legacy/kernel/sql/mysql/cluster_dfs_schema.sql
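Loading the schema with the standard mysql command-line client might look like the sketch below. The function name and the MYSQL_BIN override are hypothetical, and the target database is assumed to exist already and to be empty.

```shell
# Load the DFS cluster schema into an existing database.
# $1: ezpublish_legacy folder, $2: database host, $3: database user, $4: database name.
# MYSQL_BIN is a hypothetical override for the client binary (defaults to "mysql").
import_cluster_schema() {
    schema="$1/kernel/sql/mysql/cluster_dfs_schema.sql"
    if [ ! -f "$schema" ]; then
        echo "Schema file not found: $schema" >&2
        return 1
    fi
    # A bare -p makes the client prompt for the password;
    # the argument that follows it is the database name.
    "${MYSQL_BIN:-mysql}" -h "$2" -u "$3" -p "$4" < "$schema"
}

# Usage (values taken from the file.ini example in step 2):
#   import_cluster_schema /path/to/ezpublish_legacy dbhost root cluster
```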

5. Import files to the cluster

You need to copy the files stored in the "var" directory to the cluster. To do this, run the following script from your <eZ_Publish_root>/ezpublish_legacy/ folder (replace "ezwebin_site" by the actual name of your siteaccess):

php bin/php/clusterize.php -s ezwebin_site

Note that "php" should be replaced by the path to your php executable.

The metadata will be stored in the database, whereas the files themselves are copied to the configured NFS mount point using a structure identical to that of the "var" directory.

Keep in mind that this process might take some time, depending on the number of files that need to be imported.

6. Compile the templates (optional)

Since all caches are now empty, you should re-compile the templates. This step can be skipped, in which case the templates will be compiled on demand when the site is browsed. Go to the root directory of eZ Publish and run this command (if you are using multiple servers, run this command from each server node in order to compile the templates for each one):

php bin/php/eztc.php -s ezwebin_site

Note that "php" should be replaced by the path to your php executable.

Replace "ezwebin_site" by the actual name of your siteaccess. Repeat this step for all siteaccesses that are in use.
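Repeating the command for several siteaccesses can be sketched as a small loop. The function name and the PHP_BIN override are hypothetical; the function is assumed to be called from the folder that contains bin/php/eztc.php.

```shell
# Run eztc.php once for every siteaccess name given as an argument.
# Must be called from the folder that contains bin/php/eztc.php.
# PHP_BIN is a hypothetical override for the path to the php executable.
compile_templates() {
    for siteaccess in "$@"; do
        # Stop at the first siteaccess for which compilation fails.
        "${PHP_BIN:-php}" bin/php/eztc.php -s "$siteaccess" || return 1
    done
}

# Usage (siteaccess names are examples):
#   compile_templates ezwebin_site ezwebin_site_admin
```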

7. Update the Apache configuration

Apache needs to know which PHP script to use when serving images, in this case index_cluster.php. With the rewrite rules below, every request for a content image or binary file is rewritten to index_cluster.php, which then delivers the file over HTTP from the NFS mount. These rules are the same for eZDFS and eZDB. Add the following rewrite rules to the ".htaccess" file before the other/existing rules:

RewriteRule ^/var/([^/]+/)?storage/images-versioned/.* /index_cluster.php [L]
RewriteRule ^/var/([^/]+/)?storage/images/.* /index_cluster.php [L]
RewriteRule ^/var/([^/]+/)?cache/public/(stylesheets|javascript) /index_cluster.php [L]
RewriteRule ^/index_cluster.php - [L]

If no ".htaccess" file is used, add the same rules above the existing rewrite rules for eZ Publish in your Apache configuration file, because these rules need to be found before the standard eZ Publish rewrite rules.
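To get a feeling for which request paths the first three rules capture, the patterns can be tried out with grep. This is only a sketch: the function name is hypothetical, the patterns are copied as written above (with the leading slash) and matched against the full URL path, and grep -E is used as a stand-in for Apache's regular expression engine, which behaves the same for these simple patterns.

```shell
# Succeed when the request path in $1 matches one of the three cluster patterns.
served_by_cluster_script() {
    printf '%s\n' "$1" | grep -Eq \
        -e '^/var/([^/]+/)?storage/images-versioned/.*' \
        -e '^/var/([^/]+/)?storage/images/.*' \
        -e '^/var/([^/]+/)?cache/public/(stylesheets|javascript)'
}

# Usage (the path is an example):
#   served_by_cluster_script /var/ezwebin_site/storage/images/media/photo.jpg && echo "index_cluster.php"
```

The optional "([^/]+/)?" group is what lets the same rule match both "/var/storage/..." and "/var/<name_of_siteaccess>/storage/...".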

8. Restart Apache and test the site

Restart the Apache web server. After it has been restarted, the system should be up and running in cluster mode. Verify that the site works correctly, content images are displayed and content binary files are accessible (open the site pages in a web browser, log in to the administration interface, try clicking around and so on).

If, for example, a page of your website does not work correctly because its images are not displayed, your rewrite rules or your "index_cluster.php" file might be configured incorrectly. To locate the error, load the image directly in the browser (for example by choosing "open image in a new tab"). If "Module not found" is displayed instead of the image, your rewrite rules are not correctly configured. If a PHP error is shown, your "index_cluster.php" is most likely configured incorrectly.

To test and troubleshoot your website, it can be useful to have more debug information regarding the cluster. This is optional; to enable it, create an override of the debug.ini file and enable "kernel-clustering" in the [GeneralCondition] block like this:

[GeneralCondition]
(...)
kernel-clustering=enabled
(...)

9. Remove the imported files from the file system

If the site works correctly, you can remove the original content images and binary files from the file system (since they have been successfully imported to the database). To do this, you need to inspect the contents of the various storage sub-directories within the "var" directory (typically the "<eZ_Publish_root>/ezpublish_legacy/var/storage/" and "<eZ_Publish_root>/ezpublish_legacy/var/<name_of_siteaccess>/storage/" directories). If there are any content images and binary files left, remove them manually or by using the following command from the root of your eZ Publish installation:

php bin/php/ezcache.php --clear-all --purge

"php" should be replaced by the complete path to your php executable.

After that you should also clear Symfony's cache, by executing the following shell command from your eZ Publish root folder:

php ezpublish/console cache:clear --env=prod

On eZ Publish 5.1 and higher:

Only Symfony's console cache:clear command needs to be executed, from your eZ Publish root folder:

php ezpublish/console cache:clear --env=prod

Or use Symfony's console to run the ezcache.php legacy script:

php ezpublish/console --env=prod ezpublish:legacy:script bin/php/ezcache.php --clear-all --purge

Note that "php" should be replaced by the path to your php executable.

If you configured multiple servers, execute the command from each server node in order to clear the local caches for each one.

Note

The "clusterize.php" script mentioned in step "5. Import files to the cluster" can also be used with the "-r" option. This automatically removes the imported files after they have been clusterized, which makes step "9. Remove the imported files from the file system" unnecessary. Keep in mind that the "-r" option is somewhat advanced, so use it with caution.

 Note: known issue

When using a database based file handler (eZ DB or eZ DFS) the following bug will occur if all of the conditions listed here are true:

  •  You use MySQL
  •  You use different databases for the content and cluster tables
  •  You use the same host, port, user name and password for both databases
  •  The port is explicitly specified in both site.ini and file.ini.

The bug is that eZ Publish will look for content tables in the cluster database, which means that all page requests will fail.
 Although a solution has been proposed, it has not yet been approved at the time of this writing. So for the moment the quickest workaround is to use different user names for the two databases.

For more information regarding this issue, please visit http://issues.ez.no/13927

Limitation on some file systems when storing large number of content files

eZ Publish stores all disk-related content (e.g. images, PDFs etc.) in var/storage, in a structure that mirrors the content tree, creating one folder for each object. Most file systems used under Linux (especially ext2 and ext3) have a hard limit of about 32000 directories per folder, so it is not possible to store more than roughly 32000 objects under one folder.

To get around this limitation without changing the file system, you can split your content tree so that you don't have more than 32k content files (example: images) in the same folder.
Examples of file systems that support more file/folder entries per folder:

  • ReiserFS: roughly 1.2 million per directory
  • ZFS: 2^48 (a really big number: 281474976710656)!
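To see how close an installation is to the limit, the number of direct subdirectories per folder can be counted. The sketch below uses a hypothetical function name and relies on the -mindepth and -maxdepth options of GNU and BSD find; it runs one extra find per directory, so it may be slow on large trees.

```shell
# Print "<count> <directory>" for every directory below $1
# that directly contains more than $2 subdirectories.
dirs_over_limit() {
    find "$1" -type d | while IFS= read -r dir; do
        # Count direct subdirectories only; strip the padding some wc versions add.
        count=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
        if [ "$count" -gt "$2" ]; then
            printf '%s %s\n' "$count" "$dir"
        fi
    done
}

# Usage (the path and threshold are examples):
#   dirs_over_limit /path/to/ezpublish_legacy/var 30000
```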

Performance issues on cache generation

The MaxCopyRetries setting was introduced to solve cache generation issues under low-performance conditions.
It relates to a feature that retries generating the cache file when a failure occurs. The number of retries is limited to the value of the MaxCopyRetries setting.
The default value is 5. Increase it if you find cache files like expiryXYZtmp.php with a size of 0 bytes.

For more details please refer to the MaxCopyRetries setting documentation.

Character encoding and filenames

Make sure to configure the SystemLocale setting with the correct locale, in order to avoid issues when uploading files whose names contain special characters or characters in a different encoding.
Here's a configuration example:

[RegionalSettings]
SystemLocale=fr_FR.UTF-8

Please refer to Jira Issue EZP-20966, for more details on this subject.
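A locale can only take effect if it is installed on the server, so it may be worth checking the output of "locale -a" before setting SystemLocale. The helper below is a sketch with a hypothetical name; it assumes that installed locales are listed one per line and may be spelled slightly differently (for example "fr_FR.utf8" instead of "fr_FR.UTF-8").

```shell
# Read locale names on stdin (one per line, as printed by `locale -a`) and
# succeed when $1 is among them, ignoring case and hyphens ("UTF-8" vs "utf8").
locale_in_list() {
    wanted=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -Fxq -- "$wanted"
}

# Usage:
#   locale -a | locale_in_list fr_FR.UTF-8 || echo "locale is not installed"
```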

Using a custom FS backend

Starting from eZ Publish 5.4 / 2014.07, the FS backend, used by eZDFS to read/write binary files, can be configured. Details can be found in the Configurable DFS backend feature doc.

Ester Heylen (14/09/2010 12:35 pm)

Bertrand Dunogier (25/06/2014 3:36 pm)

Jérôme Vieilledent, Ricardo Correia, Bertrand Dunogier

