Clustering allows eXo Platform users to run multiple portal instances on several parallel servers, also called nodes. The load is distributed across the servers, so the portal remains accessible through the other cluster nodes if a server fails. Adding more nodes to the cluster can also significantly improve eXo Platform's performance. In this chapter, you will see the following topics:
Information and characteristics of eXo Platform cluster, and steps to do a typical setup.
Common questions and answers that are useful for administrators when clustering eXo Platform.
A cluster is a set of nodes that are managed together and participate in workload management. Installing eXo Platform in the cluster mode is mainly considered in the following cases:
Load Balancing: when a single server node is not enough for handling the load.
High Availability: if one node fails, the remaining nodes in the cluster can take over its workload, so access is never interrupted.
These characteristics should be handled by the overall architecture of your system. Load Balancing is typically achieved by a front server or device that distributes requests to the cluster nodes. High Availability on the data layer is typically achieved using the native replication implemented by the Relational Database Management System (RDBMS) or by shared file systems, such as SAN and NAS.
Introduction to the shared file system which is necessary for the embedded JCR server.
Steps to do a typical setup of eXo Platform cluster.
Introduction to the advanced configuration which is optional and not required in general cases, including JBossCache and shared file system.
Description of changes which will be applied if the cluster is used with the local JCR index on each node.
In eXo Platform, the persistence mostly relies on JCR, which is a middleware between the eXo Platform applications (including the Portal) and the database. Hence, this component must be configured to work in the cluster mode.
The embedded JCR server requires a portion of its state to be shared on a file system shared among the cluster nodes:
The value storage.
The index (in case of shared index usage).
Since Platform 3.5, a local JCR index can be used on each node of the cluster. This is a new feature and requires a specific configuration in Platform (described below).
All nodes must have read/write access to the shared file system.
It is strongly recommended that you use a mount point on a SAN.
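Before starting any node, it is worth verifying that each node can actually write to the shared mount. The following sketch is a simple read/write probe to run on every node; the mount point is the placeholder used throughout this chapter, so substitute your actual exo.shared.dir value.

```shell
# Read/write probe for the shared file system, run from each cluster node.
# The path below is a placeholder; use your actual exo.shared.dir value.
SHARED_DIR=/PATH/TO/SHARED/FS

probe="$SHARED_DIR/.rw-probe-$(hostname)"
# Create and remove a small file to confirm read/write access.
if touch "$probe" 2>/dev/null && rm -f "$probe"; then
    echo "read/write OK on $SHARED_DIR"
else
    echo "ERROR: no read/write access to $SHARED_DIR" >&2
fi
```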
The following steps describe a typical setup of eXo Platform cluster:
Switch to a cluster configuration. This step is done in the configuration.properties file, which must be set up identically on all cluster nodes. First, point the exo.shared.dir variable to a directory shared between the cluster nodes.
exo.shared.dir=/PATH/TO/SHARED/FS
The path is shared, so all nodes need read/write access to it. Then, switch JCR to the cluster mode.
gatein.jcr.config.type=cluster
With this setting, JCR enables automatic network replication and discovery between the cluster nodes.
Switch to the cluster profile. You need to indicate the cluster kernel profile to eXo Platform. This can be done by editing the startup script in the bin/gatein.sh file as below:
EXO_PROFILES="-Dexo.profiles=default,cluster"
or use the start_eXo.sh script with these parameters:
./start_eXo.sh default,cluster
Do the initial startup. For the initial startup of your JCR cluster, start a single node only. This node initializes the internal JCR database and creates the system workspace. Once the initial node is fully started, you can start the other nodes.
This constraint applies only to the initial start. After the cluster has been fully initialized from a single node, the other nodes can start in any order or in parallel.
Start up and shut down.
It is recommended that you always start the cluster from a single node, as on the initial startup, and then start all the others in any order or in parallel. Cluster nodes automatically try to join the others at startup. Once they have discovered each other, they synchronize their state. During the synchronization, a node is not ready to serve requests.
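The startup order above can be sketched as a small script. The start_node and wait_ready helpers below are placeholders (assumptions): in a real deployment they would wrap your remote start command and a health check, for example ssh plus a curl request against the portal URL.

```shell
#!/bin/sh
# Placeholder helpers (assumptions): replace the echo bodies with e.g.
#   ssh "$1" /opt/exo/bin/gatein.sh run   and a curl-based health check.
start_node() { echo "starting $1"; }
wait_ready() { echo "$1 is up"; }

FIRST_NODE=node1
OTHER_NODES="node2 node3"

# Always start one node first; on the very first start it also creates
# the JCR system workspace.
start_node "$FIRST_NODE"
wait_ready "$FIRST_NODE"

# The remaining nodes can then start in any order, or in parallel.
for n in $OTHER_NODES; do
    start_node "$n"
done
```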
The cluster mode is preconfigured to work out of the box. eXo Platform clustering fully relies on JBossCache replication, which uses JGroups internally. The default JBossCache configuration is packaged in exo.portal.component.common-x.x.x.jar. Since eXo Platform 3.5, the JCR's JBossCache configuration is externalized to the gatein.conf.dir configuration folder:
jcr - folder with cache configuration for JCR
cache - folder with cache configuration for eXo Cache
idm - folder with cache configuration for PicketLink IDM organization service
jgroups - folder with JGroups configuration used in all caches
The advanced configuration is optional and not required in general cases; do it only when you actually need it.
The JBossCache configuration is done in the configuration.properties file using the following properties:
# JCR cache configuration
gatein.jcr.cache.config=file:${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/cache-config.xml
gatein.jcr.cache.expiration.time=15m
# JCR Locks configuration
gatein.jcr.lock.cache.config=file:${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/lock-config.xml
# JCR Index configuration
gatein.jcr.index.cache.config=file:${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/indexer-config.xml
# JGroups configuration
gatein.jgroups.jmxstatistics.enable=true
# for eXo Cache and IDM org-service (in cluster cache-config.xml files)
gatein.jgroups.config=${gatein.conf.dir}/jgroups/jgroups-udp.xml
# for JCR
gatein.jcr.jgroups.config=file:${gatein.jgroups.config}

By default, node discovery is based on UDP: JGroups identifies the nodes through the UDP transport. The administrator can change the discovery configuration and ports in jgroups-udp.xml.
Optionally, if you need separate physical storage for JCR indexes and value storage files, it is possible to configure related paths, each to a separate shared file system:
gatein.jcr.storage.data.dir=/PATH/TO/SHARED/VALUES_FS/values
gatein.jcr.index.data.dir=/PATH/TO/SHARED/INDEX_FS/index
JCR clustering with a local index on each node is a new feature. Find more information about indexing in a clustered environment in the JCR reference documentation.
If the cluster is used with the local JCR index on each node, apply the following changes to the steps described above:
configure the index data to a local directory on each node:
gatein.jcr.index.data.dir=/PATH/TO/LOCAL/INDEX
run the cluster with the additional profile named "cluster-index-local" by adding it to the startup script in the bin/gatein.sh file:
EXO_PROFILES="-Dexo.profiles=default,cluster,cluster-index-local"
Or, use the following command with this additional profile:
./start_eXo.sh default,cluster,cluster-index-local
Q1. How to migrate from the local to the cluster mode?
If you intend to migrate your production system from the local (non-cluster) to the cluster mode, follow these steps:
1. Update the configuration to the cluster mode as explained above on your main server.
2. Use the same configuration on other cluster nodes.
3. Move the index and value storage to the shared file system.
4. Start the cluster.
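Step 3 can be done with a straightforward copy while the server is stopped. In the sketch below, both the local data path and the target layout under the shared file system are assumptions that depend on your installation; the shared path is the one configured in exo.shared.dir.

```shell
# Assumed local JCR data location; adjust to your installation layout.
LOCAL_DATA=/opt/exo/gatein/data/jcr
# The shared file system, same location as exo.shared.dir.
SHARED_FS=/PATH/TO/SHARED/FS

# The jcr/ subfolder layout below is illustrative only.
mkdir -p "$SHARED_FS/jcr"
# Copy the value storage and the index to the shared file system
# while the server is stopped.
cp -a "$LOCAL_DATA/values" "$SHARED_FS/jcr/values"
cp -a "$LOCAL_DATA/index"  "$SHARED_FS/jcr/index"
```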
Q2. Why does the startup fail with the "Port value out of range" error?
On Linux, the startup fails if you encounter the following error:
[INFO] Caused by: java.lang.IllegalArgumentException: Port value out of range: 65536
This problem happens under specific circumstances when the JGroups networking library behind the clustering attempts to detect the IP to communicate with other nodes.
You need to verify that:
The host name resolves to a valid IP address served by one of the network devices, such as eth0 or eth1.
The host name is NOT mapped to localhost or 127.0.0.1.
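A quick way to check both points on a Linux node is shown below; the expected address in the comment is only an example.

```shell
HOST=$(hostname)
echo "host name: $HOST"
# The resolved address must be a real interface address (e.g. one served
# by eth0 or eth1, such as 192.168.1.55), not the loopback 127.0.0.1.
getent hosts "$HOST" || hostname -i
```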
Q3. How to solve the "failed sending message to null" error?
If you encounter the following error when starting up in the cluster mode on Linux:
Dec 15, 2010 6:11:31 PM org.jgroups.protocols.TP down
SEVERE: failed sending message to null (44 bytes)
java.lang.Exception: dest=/228.10.10.10:45588 (47 bytes)
Be aware that clustering on Linux only works with IPv4. Therefore, when running a cluster on Linux, add the following property to the JVM parameters:
-Djava.net.preferIPv4Stack=true
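For a Tomcat bundle, one common place to add this flag is a setenv script; the file name and the CATALINA_OPTS variable follow the usual Tomcat convention, which is an assumption about your particular bundle.

```shell
# bin/setenv.sh (create the file if it does not exist)
CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true"
export CATALINA_OPTS
```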
Q4. How to hide JGroups protocol warnings in the log?
In the cluster mode, several eXo Platform subsystems, such as JCR, various caches, and the organization service, use a shared JGroups transport. With the default UDP transport, this can cause a side effect: many warnings like the ones below:
WARNING: discarded message from different group "gatein-idm-api-cluster" (our group is "gatein-idm-store-cluster"). Sender was 192.168.1.55:54232
Dec 16, 2011 4:46:09 PM org.jgroups.protocols.TP passMessageUp
WARNING: discarded message from different group "gatein-idm-store-cluster" (our group is "gatein-idm-api-cluster"). Sender was 192.168.1.55:63364
Dec 16, 2011 4:46:10 PM org.jgroups.protocols.TP passMessageUp
To hide such warnings, you need to configure the application server logger appropriately:
Apache Tomcat: add the following lines to the ${CATALINA_HOME}/conf/logging.properties file:
org.jgroups.level = SEVERE
org.jgroups.handlers = java.util.logging.ConsoleHandler,6gatein.org.apache.juli.FileHandler
JBoss Application Server: for the "all" server profile, add the following lines to the ${jboss_server}/server/all/conf/jboss-log4j.xml file:
<category name="org.jgroups.protocols.UDP">
<priority value="ERROR"/>
</category>