Caution: This documentation is for eZ Publish legacy, from version 3.x to 6.x.
For 5.x documentation covering Platform, see the eZ Documentation Center; for the differences between legacy and Platform, see the 5.x Architecture overview.

URL storage

Every address that is entered as a link in an attribute using the "XML block" or the "URL" datatype is stored in a separate part of the database. The data stored by these datatypes contains only references to entries in the separate URL table. This makes it possible to inspect and edit the published URLs without having to interact with the content objects.
The addresses in the URL table can be checked by running the "linkcheck.php" script that comes with eZ Publish (it is also executed by the cronjob script). The script checks whether the links in the table actually work by accessing them one by one. If the target server of a URL returns an error response (404 Not Found, 403 Forbidden, 500 Internal Server Error, etc.) or does not respond at all, the URL is marked invalid.
Keep in mind that if a URL is marked as invalid by this cronjob, the has_content attribute for the matching attribute will return FALSE. Normally, has_content returns FALSE only if the attribute has no content.
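
The validity rule described above (an error response or no response marks the URL invalid) can be illustrated with a minimal Python sketch. It is not the code of linkcheck.php. The names is_valid_status, check_urls and the fetch callback are hypothetical, and treating every 2xx or 3xx status as valid is an assumption made for this example. The fetcher is stubbed so the sketch runs without network access.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher type: takes an address and returns an HTTP status
# code, or None when the server gives no response at all.
Fetcher = Callable[[str], Optional[int]]


def is_valid_status(status: Optional[int]) -> bool:
    """Return True when a response counts as a working link.

    Assumption for this sketch: 2xx and 3xx are valid; 4xx, 5xx and
    "no response" (None) are invalid.
    """
    if status is None:
        return False
    return 200 <= status < 400


def check_urls(addresses: List[str], fetch: Fetcher) -> Dict[str, bool]:
    """Check the addresses one by one and map each to its validity."""
    return {address: is_valid_status(fetch(address)) for address in addresses}


# Stubbed responses standing in for real HTTP requests.
fake_responses = {
    "http://example.com/": 200,
    "http://example.com/missing": 404,
    "http://example.com/forbidden": 403,
    "http://example.com/error": 500,
}

# The last address is absent from the stub, so the fetcher returns None
# for it, which models a server that does not respond.
results = check_urls(
    list(fake_responses) + ["http://unreachable.invalid/"],
    lambda address: fake_responses.get(address),
)
```

In this sketch only the first address ends up valid; the three error responses and the unreachable server are all marked invalid.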
Invalid URLs and the objects that are using them can be easily filtered out and edited using the "URL management" part of the administration interface. An entry in the URL table consists of the following data:

  • ID
  • Address
  • Creation time
  • Modification time
  • Last checked
  • Status

Every URL has a unique identification number. The address contains the actual link. The creation time is the exact date/time when the object containing that URL was published. The modification time is updated every time the URL is changed using the URL management part of the administration interface (and not when the object containing that URL is edited). Whenever a URL is checked by the script, the last checked field is updated. The status of a URL can be either valid or invalid. By default, all URLs are valid. When the cronjob script runs, it automatically updates the status of the URLs; if a broken link is found, its status is set to "invalid". When an address that already exists in the table is stored again, the system reuses the existing entry instead of creating a new one.
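
The entry fields and the reuse behaviour described above can be modelled with a short Python sketch. It is an in-memory illustration only, not the actual database schema or eZ Publish API. The names UrlEntry, UrlTable, store, record_check and change_address are hypothetical, timestamps are plain integers, and setting the modification time equal to the creation time for a new entry is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UrlEntry:
    """One row of the URL table (field names are illustrative)."""
    id: int
    address: str
    created: int                        # creation time
    modified: int                       # modification time
    last_checked: Optional[int] = None  # None: not checked yet (assumption)
    is_valid: bool = True               # by default, all URLs are valid


class UrlTable:
    """In-memory stand-in for the URL table."""

    def __init__(self) -> None:
        self._by_address: Dict[str, UrlEntry] = {}
        self._next_id = 1

    def store(self, address: str, now: int) -> UrlEntry:
        """Return the existing entry for an address, or create a new one."""
        existing = self._by_address.get(address)
        if existing is not None:
            return existing  # reuse instead of creating a duplicate
        entry = UrlEntry(self._next_id, address, created=now, modified=now)
        self._next_id += 1
        self._by_address[address] = entry
        return entry

    def record_check(self, address: str, is_valid: bool, now: int) -> None:
        """Update status and last-checked time after a link check."""
        entry = self._by_address[address]
        entry.last_checked = now
        entry.is_valid = is_valid

    def change_address(self, old: str, new: str, now: int) -> UrlEntry:
        """Edit a URL in place, as URL management would; bumps 'modified'."""
        entry = self._by_address.pop(old)
        entry.address = new
        entry.modified = now
        self._by_address[new] = entry
        return entry


table = UrlTable()
first = table.store("http://example.com/", now=100)
second = table.store("http://example.com/", now=200)  # same entry is reused
table.record_check("http://example.com/", is_valid=False, now=300)
```

Storing the same address twice returns the same entry with the same ID, and the check updates only the status and the last checked time, leaving the creation and modification times untouched.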

Please note that the link check script must be able to contact the outside world through port 80. In other words, the firewall must be opened for outgoing HTTP traffic from the web server that is running eZ Publish.

Balazs Halasy (14/09/2010 10:56 am)

Geir Arne Waaler (28/09/2010 8:51 am)

