This chapter covers the following topics:
Clustering allows eXo Platform users to run several portal instances on parallel servers, which are also called nodes. The load is distributed across the servers, so the portal remains accessible through the other cluster nodes if a server fails. Adding more nodes to the cluster can also considerably improve eXo Platform's performance. A cluster is a set of nodes that are managed together and participate in workload management. Installing eXo Platform in the cluster mode is typically considered in the following cases:
Load Balancing: when a single server node is not enough for handling the load.
High Availability: if one of the nodes fails, the remaining nodes in the cluster take over the workload of the system, so access is not interrupted.
These characteristics should be handled by the overall architecture of your system. Load Balancing is typically achieved by a front server or device that distributes the requests to the cluster nodes. High Availability on the data layer is typically achieved using the native replication implemented by the Relational Database Management System (RDBMS) or by shared file systems, such as SAN and NAS.
In eXo Platform, the persistence mostly relies on JCR, which is a middleware between the eXo Platform applications (including the Portal) and the database. Hence, this component must be configured to work in the cluster mode.
The embedded JCR server requires a portion of its state to be stored on a file system that is shared among the cluster nodes:
The values storage.
The index (in case of shared index usage).
Since Platform 3.5, a local JCR index can be used on each node of the cluster. This is a new feature that needs a special configuration in Platform (described below).
All nodes must have the read/write access to the shared file system.
It is strongly recommended that you use a mount point on a SAN.
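The read/write requirement above can be checked on each node before the cluster is configured. The following is a minimal sketch, not part of eXo Platform: check_shared_dir is a hypothetical helper name, and the path you pass to it is whatever mount point you plan to share.

```shell
# check_shared_dir DIR: verify that this node can write to and read from DIR.
# Hypothetical helper for illustration; it is not shipped with eXo Platform.
check_shared_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Use a probe file name that is unique to this host and process.
    probe="$dir/.rw-probe-$(uname -n)-$$"
    if ! { echo "ok" > "$probe"; } 2>/dev/null; then
        echo "not writable: $dir"
        return 1
    fi
    content=$(cat "$probe" 2>/dev/null)
    rm -f "$probe"
    if [ "$content" != "ok" ]; then
        echo "not readable: $dir"
        return 1
    fi
    echo "read/write OK: $dir"
}
```

For example, running `check_shared_dir /PATH/TO/SHARED/FS` on every node reports whether that node can use the shared directory.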
The following steps describe a typical setup of an eXo Platform cluster:
Switch to a cluster configuration.
This step is done in the configuration.properties file, which must be set in the same way on all the cluster nodes. First, point the exo.shared.dir variable to a directory shared between the cluster nodes.
exo.shared.dir=/PATH/TO/SHARED/FS
Because the path is shared, all nodes need read/write access to it. Then, switch the JCR to the cluster mode.
gatein.jcr.config.type=cluster
With this setting, JCR enables automatic network replication and discovery among the cluster nodes.
Switch to the cluster profile.
You need to indicate the cluster kernel profile to eXo Platform. This can be done by editing the bin/gatein.sh startup script as below:
EXO_PROFILES="-Dexo.profiles=default,cluster"
Alternatively, run the start_eXo.sh script with the profiles as a parameter:
./start_eXo.sh default,cluster
Do the initial startup.
For the initial startup of your JCR cluster, start only a single node. This node initializes the internal JCR database and creates the system workspace. Once this initial node has fully started, you can start the other nodes.
This ordering constraint concerns the first node only: it must be fully started on its own. After that, the other nodes can start in any order or in parallel.
Start up and shut down.
Always start the cluster from a single node, as on initial startup, and then start all others in any order or in parallel. Nodes of the cluster will automatically try to join others at startup. Once they have discovered each other, they will synchronize their state. During the synchronization, the node is not ready to serve requests.
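The startup order described in the steps above (one node first, then the rest in any order or in parallel) can be scripted. The sketch below is only an illustration under assumptions: start_node and node_ready are hypothetical functions that you must define for your environment (for example, a remote call to start_eXo.sh and an HTTP check against the portal), and the default timeout of 600 seconds is an arbitrary value.

```shell
# wait_until TIMEOUT_SECONDS COMMAND...: retry COMMAND about once per second
# until it succeeds, or fail after roughly TIMEOUT_SECONDS attempts.
wait_until() {
    remaining="$1"
    shift
    while ! "$@" >/dev/null 2>&1; do
        remaining=$((remaining - 1))
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
    done
}

# start_cluster FIRST_NODE [OTHER_NODE...]
# Requires two user-defined functions (hypothetical names):
#   start_node NODE   launches eXo Platform on NODE
#   node_ready NODE   succeeds once NODE serves requests
start_cluster() {
    first="$1"
    shift
    # Start a single node and wait until it is fully started.
    start_node "$first" || return 1
    wait_until "${READY_TIMEOUT:-600}" node_ready "$first" || return 1
    # The remaining nodes can then start in parallel.
    for node in "$@"; do
        start_node "$node" &
    done
    wait
}
```

The sketch stops if the first node does not become ready in time, so the other nodes are never started against a half-initialized cluster.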
The cluster mode is preconfigured to work out of the box. eXo Platform clustering relies entirely on JBossCache replication, which uses JGroups internally. The default configuration of JBossCache is packaged in exo.portal.component.common-x.x.x.jar. Since eXo Platform 3.5, the JCR's JBossCache configuration is externalized to the gatein.conf.dir configuration folder:
jcr - folder with cache configuration for JCR
cache - folder with cache configuration for eXo Cache
idm - folder with cache configuration for PicketLink IDM organization service
jgroups - folder with JGroups configuration used in all caches
The advanced configuration is optional and is not required in general cases. Change it only if you have a specific need.
The JBossCache configuration is done in the configuration.properties file using the following properties:
# JCR cache configuration
gatein.jcr.cache.config=file:${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/cache-config.xml
gatein.jcr.cache.expiration.time=15m
# JCR Locks configuration
gatein.jcr.lock.cache.config=file:${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/lock-config.xml
# JCR Index configuration
gatein.jcr.index.cache.config=file:${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/indexer-config.xml
# JGroups configuration
gatein.jgroups.jmxstatistics.enable=true
# for eXo Cache and IDM org-service (in cluster cache-config.xml files)
gatein.jgroups.config=${gatein.conf.dir}/jgroups/jgroups-udp.xml
# for JCR
gatein.jcr.jgroups.config=file:${gatein.jgroups.config}
By default, node discovery is based on UDP: JGroups identifies the nodes through the UDP transport. The administrator can change the discovery and port settings in jgroups-udp.xml.
Optionally, if you need separate physical storage for JCR indexes and value storage files, it is possible to configure related paths, each to a separate shared file system:
gatein.jcr.storage.data.dir=/PATH/TO/SHARED/VALUES_FS/values
gatein.jcr.index.data.dir=/PATH/TO/SHARED/INDEX_FS/index
JCR clustering with a local index on each node is a new feature. For more information, see the section on indexing in a clustered environment in the JCR reference documentation.
If the cluster is used with the local JCR index on each node, apply the following changes to the steps described above:
Configure the index data directory to a local directory on each node:
gatein.jcr.index.data.dir=/PATH/TO/LOCAL/INDEX
Run the cluster with the additional profile named "cluster-index-local" by adding it to the startup script in the bin/gatein.sh file:
EXO_PROFILES="-Dexo.profiles=default,cluster,cluster-index-local"
Alternatively, use the following command with this additional profile:
./start_eXo.sh default,cluster,cluster-index-local
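Because configuration.properties must carry the same cluster settings on every node, it can help to check the file before startup. The following sketch reads the properties named in this chapter from a file; get_prop and check_cluster_config are hypothetical helper names, and the parsing only handles simple "key=value" lines, not the full properties syntax.

```shell
# get_prop FILE KEY: print the value of the last uncommented "KEY=value" line.
get_prop() {
    # Escape dots so that they match literally in the sed expression.
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# check_cluster_config FILE: verify the cluster settings in FILE.
check_cluster_config() {
    file="$1"
    shared_dir=$(get_prop "$file" exo.shared.dir)
    jcr_type=$(get_prop "$file" gatein.jcr.config.type)
    index_dir=$(get_prop "$file" gatein.jcr.index.data.dir)
    if [ -z "$shared_dir" ]; then
        echo "exo.shared.dir is not set"
        return 1
    fi
    if [ "$jcr_type" != "cluster" ]; then
        echo "gatein.jcr.config.type is '$jcr_type', expected 'cluster'"
        return 1
    fi
    # The index directory is optional; report it when it is set explicitly.
    echo "shared dir: $shared_dir, index dir: ${index_dir:-not set}"
}
```

Running `check_cluster_config` against the configuration.properties file of each node gives a quick view of which shared directory and index directory that node will use.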
Q1. How to migrate from the local mode to the cluster mode?
If you intend to migrate your production system from the local (non-cluster) mode to the cluster mode, follow these steps:
1. Update the configuration to the cluster mode as explained above on your main server.
2. Use the same configuration on other cluster nodes.
3. Move the index and value storage to the shared file system.
4. Start the cluster.
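Step 3 can be sketched as a copy that is verified before the cluster is started. This is an illustration under assumptions: copy_to_shared is a hypothetical helper, the server is assumed to be stopped while it runs, the destination is assumed to be empty, and the sketch copies rather than moves so that the original directories remain as a fallback. The locations of your current index and value storage depend on your installation.

```shell
# copy_to_shared SRC_DIR DEST_DIR: copy a data directory to the shared file
# system and compare file counts. Hypothetical helper; run it only while
# eXo Platform is stopped, and with an empty DEST_DIR.
copy_to_shared() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "source not found: $src"
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src/." copies the directory contents, including hidden files.
    cp -Rp "$src/." "$dest/" || return 1
    src_count=$(find "$src" -type f | wc -l | tr -d ' ')
    dest_count=$(find "$dest" -type f | wc -l | tr -d ' ')
    if [ "$src_count" != "$dest_count" ]; then
        echo "file count mismatch: $src_count vs $dest_count"
        return 1
    fi
    echo "copied $src_count files to $dest"
}
```

The helper would be called once for the index directory and once for the value storage directory, with the destinations matching the paths configured in configuration.properties.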
Q2. Why does startup fail with the "Port value out of range" error?
On Linux, startup may fail with the following error:
[INFO] Caused by: java.lang.IllegalArgumentException: Port value out of range: 65536
This problem happens under specific circumstances when JGroups, the networking library behind the clustering, attempts to detect the IP address to use for communication with other nodes.
You need to verify:
The host name resolves to a valid IP address that is served by one of the network devices, such as eth0 or eth1.
The host name is NOT mapped to localhost or 127.0.0.1.
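The two checks above can be run from a shell on each node. The sketch below assumes a Linux system where the getent command is available; is_loopback and check_host_address are hypothetical helper names. It treats any 127.x address as loopback, since some distributions map the host name to such an address in /etc/hosts.

```shell
# is_loopback ADDRESS: succeed if ADDRESS is an IPv4 or IPv6 loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# check_host_address [HOSTNAME]: resolve the name (default: this machine's
# node name) and reject missing or loopback results.
check_host_address() {
    host="${1:-$(uname -n)}"
    addr=$(getent hosts "$host" | awk 'NR==1 {print $1}')
    if [ -z "$addr" ]; then
        echo "$host does not resolve to an address"
        return 1
    fi
    if is_loopback "$addr"; then
        echo "$host resolves to loopback address $addr"
        return 1
    fi
    echo "$host resolves to $addr"
}
```

Calling `check_host_address` without arguments checks the local host name. The sketch does not confirm that the address belongs to a network device such as eth0, which still needs to be verified separately.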
Q3. How to solve the "failed sending message to null" error?
You may encounter the following error when starting up in the cluster mode on Linux:
Dec 15, 2010 6:11:31 PM org.jgroups.protocols.TP down
SEVERE: failed sending message to null (44 bytes)
java.lang.Exception: dest=/228.10.10.10:45588 (47 bytes)
Be aware that clustering on Linux only works with IPv4. Therefore, when using a cluster under Linux, add the following property to JVM parameters:
-Djava.net.preferIPv4Stack=true
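One way to add this property is through the environment before the startup script is launched. The sketch below assumes that your startup script passes the JAVA_OPTS environment variable to the JVM, which you should verify for your application server; add_jvm_opt is a hypothetical helper that avoids adding the same option twice.

```shell
# add_jvm_opt OPTION: append OPTION to JAVA_OPTS unless it is already present.
# Assumes the startup script forwards JAVA_OPTS to the JVM (verify this).
add_jvm_opt() {
    case " ${JAVA_OPTS:-} " in
        *" $1 "*) ;;  # already present, nothing to do
        *) JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }$1" ;;
    esac
    export JAVA_OPTS
}

add_jvm_opt "-Djava.net.preferIPv4Stack=true"
```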