URL transformation rules

When a site administrator enters a value for a new virtual URL, the system will perform cleanup of the input by using so-called URL transformation rules. This is done in order to avoid problems with certain characters and to ensure that the alias conforms the standards and the other URLs of the site. If an inputted alias is modified, the user will be notified.

Note that in eZ Publish 3.10, the transformation of entered/generated aliases has changed.

Unicode support

In versions prior to 3.10, URL transformation rules were more restrictive and only supported some ASCII characters (lowercase Latin letters from "a" to "z", digits and underscores). This caused problems for many non-western languages that use different alphabets, some of them which are difficult to transliterate.

From eZ Publish 3.10, it is possible to enable Unicode support for the URLs and thus no transliteration needs to be performed since most characters are allowed. The following characters are not allowed: ampersand, semi-colon, forward slash, colon, equal sign, question mark, square brackets, parenthesis and the plus sign. Note that spaces are only allowed as word separators. These characters are not allowed in order to avoid miscellaneous problems (related to the HTTP protocol).

The Unicode characters are encoded using the IRI standard. The text is encoded using UTF-8 before further encoding is performed. The resulting URL will contain characters that are compatible with the HTTP protocol and which will work in all existing browsers/clients. Note that modern browsers will decode the URL and display the characters using Unicode.

Dash/underscore/space

In versions prior to 3.10, only underscores were allowed as separators of words. From 3.10, it is possible to choose which word separator that should be used. This can be done by changing the value of the "WordSeparator" configuration directive located in the [URLTranslator] section of an override for "site.ini". It can be set to either "dash", "underscore" or "space". Note that this setting will be ignored when the "urlalias_compat" transformation method is used (since it only supports underscores as separators).

Case sensitivity

When the "urlalias" or "urlalias_iri" transformation method is used, the URLs will consist of mixed cases (uppercase and lowercase characters). This is different from the traditional/old behavior where every letter was converted to lowercase. Instead, the system will preserve the cases and store the URL aliases accordingly. However, the URLs themselves will not be case sensitive. For example, the URL alias for a node called "About Us" will be "About-Us" (assuming that the word separator is a dash). The "About Us" node will be accessible regardless of how the URL is specified when it comes to lowercase and uppercase letters. In other words, the node will be accessible through all of the following URLs: "www.example.com/about-us", "www.example.com/About-us", "www.example.com/ABOUT-US"; and so on.

Note that if there are two nodes with (almost) identical names within the same location (for example "My article" and "My Article" inside a folder called "News"), the system will generate unique URL aliases for newly introduced conflicting nodes by attaching numbers to their URL aliases. For example, if a node called "My article" already exits and "My Article" is created at the same location, the URL alias of the second ("My Article") node will be "My-Article2". If a third "MY Article" node is introduced, it's URL alias will be "MY-Article3"; and so on.

Alias text filtering

Support for filtering was implemented in order to introduce more flexibility when it comes to the generation of the aliases. The filters are performed by the system on the URLs before the result is transformed to a valid alias. The filters can be created as extensions. The following text explains how to create a new filter.

Refer to the "[URLTranslator]" section of the "site.ini" for more information about the "Filters" setting.

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2013 eZ Systems AS (except where otherwise noted). All rights reserved.