Clustering

This part of the 4.x documentation applies to eZ Publish 4.0. Only the reference section is common to all eZ Publish 4.x versions and to the eZ Publish 5.x "LegacyStack". Make sure you are reading the documentation for the version you are using.

The clustering feature makes it possible to run an eZ Publish site on several web servers. A site that is running on a cluster of servers will have better performance and will be able to handle more traffic.

Before eZ Publish clustering was implemented, the only way to support multiple servers was to store all cache files and images locally on separate file systems (one for each web server) and use "rsync" or NFS to synchronize caches and binary files. This was far from perfect and imposed many limitations. With clustering, you can instead configure the system to store all content-related caches, images and binary files in the database. This ensures that all cluster nodes use the same cache files and have access to the same images and binary files. In other words, when content is updated, the changes are automatically and instantly available to every web server in the cluster.

Supported database types

The clustering code is optimized for MySQL databases and requires the InnoDB storage engine. This storage engine will be used when creating the database tables needed for clustering. Contact your database administrator if you are unsure about whether InnoDB is available on your server.
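If you have access to a MySQL client, the output of the standard "SHOW ENGINES" statement tells you whether InnoDB can be used. The sketch below assumes the rows have already been fetched with whatever MySQL driver you use and are passed in as dictionaries keyed by column name; the function name innodb_available is hypothetical.

```python
from typing import Iterable, Mapping


def innodb_available(engine_rows: Iterable[Mapping[str, str]]) -> bool:
    """Return True if a MySQL `SHOW ENGINES` result lists InnoDB as usable.

    Each row is assumed to be a mapping with at least the "Engine" and
    "Support" columns. MySQL reports Support as YES, NO, DEFAULT or DISABLED;
    only YES and DEFAULT mean the engine can be used for new tables.
    """
    for row in engine_rows:
        if row["Engine"].lower() == "innodb":
            return row["Support"].upper() in ("YES", "DEFAULT")
    return False  # InnoDB is not listed at all


# Hand-written rows shaped like `SHOW ENGINES` output (illustrative only).
sample = [
    {"Engine": "MyISAM", "Support": "YES"},
    {"Engine": "InnoDB", "Support": "DEFAULT"},
]
print(innodb_available(sample))  # True
```

If the function returns False, the cluster tables cannot be created with the required storage engine, and your database administrator will need to enable InnoDB first.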

Version 1.8 of the eZ Publish Extension for Oracle® Database makes it possible to use Oracle as a database for eZ Publish version 4.0 and later and also includes support for the clustering functionality. Note that the clustering functionality provided by this extension may differ slightly from the generic implementation included in a standard eZ Publish distribution.

Also keep in mind that the supported databases depend on the cluster file handler that is used.

How it works

Data that must be synchronized between the different servers is stored in the database. However, custom templates and design items are not stored in the database.
The following sections describe how the main types of data are handled.

Content view cache

When eZ Publish displays a page (a content node), it executes the "view" view of the "content" module and includes the output in the page layout. If the output is cached, the cache file(s) are read and served. If not, the system fetches the content stored in the eZ Publish object database, renders the necessary templates, generates a web page and stores the resulting XHTML on the file system before serving it. As previously mentioned, these files can instead be stored in the database, so the files (along with any changes) are immediately available to all servers in the cluster.
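The read-through behaviour described above can be sketched as follows. This is a simplified illustration, not eZ Publish code: an in-memory SQLite database stands in for the shared cluster database, and the table layout, the ClusterNode class and the render callback are all hypothetical. The point it shows is that once one server has generated a page, every other server can serve it from the shared store.

```python
import sqlite3
from typing import Callable


def open_shared_cache() -> sqlite3.Connection:
    # In-memory SQLite stands in for the shared cluster database (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE view_cache (node_id INTEGER PRIMARY KEY, html TEXT NOT NULL)")
    return db


class ClusterNode:
    """One web server in the cluster; every node uses the same cache store."""

    def __init__(self, name: str, cache: sqlite3.Connection, render: Callable[[int], str]):
        self.name = name
        self.cache = cache
        self.render = render  # hypothetical stand-in for fetching content and rendering templates

    def view(self, node_id: int) -> str:
        row = self.cache.execute(
            "SELECT html FROM view_cache WHERE node_id = ?", (node_id,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: serve the stored output
        html = self.render(node_id)  # cache miss: generate the page
        self.cache.execute(
            "INSERT OR REPLACE INTO view_cache (node_id, html) VALUES (?, ?)",
            (node_id, html),
        )
        self.cache.commit()
        return html


# Two servers sharing one cache: only the first request renders the page.
render_calls = []


def render(node_id: int) -> str:
    render_calls.append(node_id)
    return f"<div>node {node_id}</div>"


shared = open_shared_cache()
web1 = ClusterNode("web1", shared, render)
web2 = ClusterNode("web2", shared, render)
web1.view(2)
web2.view(2)
print(len(render_calls))  # 1
```

With a per-server file system cache, web2 would have rendered the page a second time (and could serve stale output after an update on web1).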

Images and image aliases

The approach described above is also used for images and image aliases (image variations). However, the solution is a bit more complicated because images are usually served directly by the web server (for instance Apache). Since the web server cannot communicate with the database, the images must be served by a PHP script called "index_image.php". This applies to all content images, but not to images that are part of the design.
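In practice this split is configured in the web server (for example with rewrite rules), but the decision itself can be illustrated in a few lines. The sketch below is a Python illustration only and assumes that content images are stored under "var/storage/images/" or "var/<site-dir>/storage/images/"; the path pattern, the example site directory name "ezwebin_site" and the function route_image_request are hypothetical and should be checked against your own installation.

```python
import re

# Assumption: content images live under /var/storage/images/ or
# /var/<site-dir>/storage/images/; design images live elsewhere (e.g. /design/).
CONTENT_IMAGE = re.compile(r"^/var/([^/]+/)?storage/images/")


def route_image_request(path: str) -> str:
    """Decide who should serve an image request in a clustered setup.

    Content images and their aliases are stored in the database, so they must
    go through the PHP script. Design images stay on each server's file system
    and can be served directly by the web server.
    """
    if CONTENT_IMAGE.match(path):
        return "index_image.php"
    return "static"


print(route_image_request("/var/ezwebin_site/storage/images/media/logo.png"))  # index_image.php
print(route_image_request("/design/standard/images/logo.png"))  # static
```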

Notes about clearing the caches

Since eZ Publish 3.10, clearing the caches no longer physically removes the cache files when a database-based handler is used, because this operation can be quite time-consuming. Instead of physically removing the cache files from the database or file system, the system marks them as invalid. This is done either by marking each individual cache file as expired or by setting the global expiry; the latter typically happens when a large number of cache files must be invalidated at once, for example when clearing all caches of a specific type. The global expiry is a timestamp that serves as an expiry value for all caches in the system: if it is set to a certain date, cache files that are older than this date are no longer used. Note that the system overwrites old or expired cache file entries when it re-creates the caches.
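The two invalidation mechanisms can be summarised as a single validity check. The following is a minimal sketch under stated assumptions: the CacheEntry structure, the function names and the example cache file names are hypothetical, and timestamps are plain Unix integers. It only models the rule described above, where an entry is ignored if it was expired individually or is older than the global expiry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    name: str
    mtime: int  # Unix timestamp when the entry was written
    expired: bool = False  # set when this single entry is invalidated


def is_usable(entry: CacheEntry, global_expiry: Optional[int]) -> bool:
    """An entry is used only if it was not expired individually and is not
    older than the global expiry timestamp (when one is set)."""
    if entry.expired:
        return False
    if global_expiry is not None and entry.mtime < global_expiry:
        return False
    return True


def expire_entry(entry: CacheEntry) -> None:
    # Clearing a single cache item: mark it invalid instead of deleting it.
    entry.expired = True


old = CacheEntry("content/view/full/2.cache", mtime=1_000)
fresh = CacheEntry("content/view/full/3.cache", mtime=5_000)
global_expiry = 2_000  # e.g. set when all caches of this type were cleared
print(is_usable(old, global_expiry), is_usable(fresh, global_expiry))  # False True
expire_entry(fresh)
print(is_usable(fresh, global_expiry))  # False
```

Both operations are cheap updates, which is why invalidation is preferred over physical removal; the stale rows stay in storage until they are rewritten or purged.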

To physically remove the cache files from the database, run the "ezcache.php" script with the "--purge" option. This can, for example, be used to remove only the content caches that are more than two days old.
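Conceptually, purging turns invalidated rows into actual deletions. The sketch below is not the implementation of "ezcache.php" and does not reproduce its command-line syntax; it assumes a hypothetical "cache" table with a scope column for the cache type and an mtime column, and deletes the content cache rows written more than two days before a given point in time.

```python
import sqlite3


def purge_older_than(db: sqlite3.Connection, scope: str, cutoff: int) -> int:
    """Physically delete cache rows of one type written before `cutoff`.

    Returns the number of deleted rows. The table layout is hypothetical.
    """
    cur = db.execute("DELETE FROM cache WHERE scope = ? AND mtime < ?", (scope, cutoff))
    db.commit()
    return cur.rowcount


DAY = 24 * 60 * 60
now = 10 * DAY  # fixed "current time" so the example is reproducible
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cache (name TEXT PRIMARY KEY, scope TEXT, mtime INTEGER)")
db.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [
        ("content-a", "content", now - 3 * DAY),  # older than two days: purged
        ("content-b", "content", now - 1 * DAY),  # recent: kept
        ("image-a", "image", now - 5 * DAY),      # different cache type: kept
    ],
)
print(purge_older_than(db, "content", now - 2 * DAY))  # 1
```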

Extra connections in MySQL

The new clustering code available since eZ Publish 3.10 opens an extra database connection when writing content to the database. This connection checks whether the file has been modified since the write lock was acquired; if it has, there is no longer any need to write it. Because of this, the maximum number of database connections in MySQL must be increased by 30-50%. If persistent connections are enabled, the cluster code no longer shares connections with normal database calls, so the maximum number of connections previously used has to be doubled.
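The sizing rule above is simple arithmetic on MySQL's max_connections setting. The helper below is a hypothetical convenience function that applies it: a 30-50% increase without persistent connections, and a doubling with them. It uses integer arithmetic with rounding up, so that, for instance, 100 * 1.3 does not become 131 through floating-point error.

```python
from typing import Tuple


def recommended_max_connections(current_max: int, persistent: bool) -> Tuple[int, int]:
    """Return a (low, high) range for MySQL's max_connections.

    Non-persistent connections: raise the current limit by 30-50%.
    Persistent connections: double the limit, since the cluster code no longer
    shares connections with normal database calls.
    """
    if persistent:
        doubled = 2 * current_max
        return doubled, doubled
    low = (current_max * 130 + 99) // 100   # +30%, rounded up
    high = (current_max * 150 + 99) // 100  # +50%, rounded up
    return low, high


print(recommended_max_connections(100, persistent=False))  # (130, 150)
print(recommended_max_connections(100, persistent=True))   # (200, 200)
```

For example, a server currently limited to 151 connections would need roughly 197 to 227 connections without persistent connections, or 302 with them.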

Oracle-specific differences

If you use the clustering functionality provided by the eZ Publish Extension for Oracle® Database, note that the system may behave differently from what is described above. If all content related caches are stored in an Oracle database, clearing the caches will always lead to their physical removal; the "ezcache.php" script will also physically remove the cache entries from the database, even when executed without the "--purge" option.

Cluster file handlers

The cluster file handler mechanism makes it possible to store, retrieve, rename and delete files (among other operations) using the database. The following file handlers are known to the system by default:

Note that the eZFS and eZFS2 file handlers do not support actual eZ Publish clustering across multiple servers. Use eZDB or eZDFS for cluster file handling.
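The reason is that a file handler is an abstraction over where files live, and only handlers backed by shared storage let one server see what another has written. The following Python sketch illustrates this with a hypothetical, much simplified handler interface (it is not the actual eZ Publish handler API): a dictionary stands in for the storage, private per server in the first case (conceptually like eZFS/eZFS2) and shared in the second (conceptually like eZDB/eZDFS). The file paths are illustrative.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class FileHandler(ABC):
    """Hypothetical, simplified cluster file handler interface."""

    @abstractmethod
    def store(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def fetch(self, path: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    def rename(self, old: str, new: str) -> None:
        # Generic rename built on the primitive operations.
        data = self.fetch(old)
        if data is None:
            raise FileNotFoundError(old)
        self.store(new, data)
        self.delete(old)


class DictHandler(FileHandler):
    """Stores files in a dict that stands in for the underlying storage."""

    def __init__(self, storage: Dict[str, bytes]):
        self._storage = storage

    def store(self, path: str, data: bytes) -> None:
        self._storage[path] = data

    def fetch(self, path: str) -> Optional[bytes]:
        return self._storage.get(path)

    def delete(self, path: str) -> None:
        self._storage.pop(path, None)


# Local storage: each server gets its own private dict.
local_web1 = DictHandler({})
local_web2 = DictHandler({})
local_web1.store("var/cache/page.html", b"<p>hi</p>")
print(local_web2.fetch("var/cache/page.html"))  # None: web2 cannot see web1's file

# Shared storage: both servers use the same backing store.
shared_store: Dict[str, bytes] = {}
shared_web1 = DictHandler(shared_store)
shared_web2 = DictHandler(shared_store)
shared_web1.store("var/cache/page.html", b"<p>hi</p>")
print(shared_web2.fetch("var/cache/page.html"))  # b'<p>hi</p>'
```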

Additional HTTP header

Since eZ Publish 3.9, an additional HTTP header called "Served-by" is supported. This feature was added for testing and debugging purposes. It is typically useful when you need to check, from the client side, which server handled a request, by inspecting the headers of the server response.
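A small client-side helper can pull this header out of a response when you test a cluster. The sketch below is a hypothetical example: served_by_from_raw parses a raw header block (useful for output copied from a tool such as curl), and served_by fetches a URL with Python's standard urllib and reads the header case-insensitively. The header value "web1.example.com" in the example is made up; the actual value depends on your setup.

```python
import urllib.request
from typing import Optional


def served_by_from_raw(raw_headers: str) -> Optional[str]:
    """Extract the Served-by header from a raw HTTP response header block."""
    for line in raw_headers.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "served-by":
            return value.strip()
    return None  # header not present (or the status line only)


def served_by(url: str) -> Optional[str]:
    """Request `url` and return the Served-by header, if the server sent one."""
    with urllib.request.urlopen(url) as resp:
        return resp.headers.get("Served-by")


raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServed-by: web1.example.com\r\n\r\n<html>"
print(served_by_from_raw(raw))  # web1.example.com
```

Requesting the same URL several times through a load balancer and collecting the values is a quick way to confirm that traffic is actually spread across the cluster nodes.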

Note about cluster database schemas

For performance reasons, we require that in production a separate database schema is used for the cluster tables, if applicable. This ensures that transactions on the content database do not create unnecessary contention on the cluster database. Such contention could lead to failures when storing data.

For this reason, eZ Systems does not support storing the cluster tables in the same database schema as the content tables, even though this setup technically works for development or testing purposes.