This chapter is divided into two main topics:
Instructions on basic configurations related to JCR, persister and JDBC Data Container.
Instructions on advanced configurations regarding Search, LockManager, QueryHandler, Cluster, RepositoryCreationService, TransactionService and External value storages.
Details of the JCR configuration, including Repository, Workspace, Value storage plugin, Initializer, Cache, Query Handler and Lock Manager.
Instructions on how to configure and customize the JCR persister.
JDBC Data Container configuration
Instructions on how to configure single and multiple databases.
Like other eXo services, JCR can be configured and used in portal mode or in embedded mode (as a service embedded in eXo Platform).
The JCR service configuration looks like this:
<component>
<key>org.exoplatform.services.jcr.RepositoryService</key>
<type>org.exoplatform.services.jcr.impl.RepositoryServiceImpl</type>
</component>
<component>
<key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
<type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
<init-params>
<value-param>
<name>conf-path</name>
<description>JCR repositories configuration file</description>
<value>war:/conf/jcr/repository-configuration.xml</value>
</value-param>
<value-param>
<name>max-backup-files</name>
<value>5</value>
</value-param>
<properties-param>
<name>working-conf</name>
<description>working-conf</description>
<property name="persister-class-name" value="org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister" />
<property name="source-name" value="${gatein.jcr.datasource.name}${container.name.suffix}"/>
<property name="dialect" value="${gatein.jcr.datasource.dialect}"/>
</properties-param>
</init-params>
</component>
The JCR Core implementation contains a persister, org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister, which stores the repository configuration in the related database using JDBC calls.
The implementation creates and uses the JCR_CONFIG table in the provided database. Developers can also implement their own persister for a particular use case.
This section describes the basic JCR configuration: Repository, Workspace, Value storage plugin, Initializer, Cache, Query Handler and Lock Manager.
The JCR configuration is defined in an .xml file. See the following DTD to understand the expected format of the JCR configuration.
<!ELEMENT repository-service (repositories)>
<!ATTLIST repository-service default-repository NMTOKEN #REQUIRED>
<!ELEMENT repositories (repository)>
<!ELEMENT repository (security-domain,access-control,session-max-age,authentication-policy,workspaces)>
<!ATTLIST repository
default-workspace NMTOKEN #REQUIRED
name NMTOKEN #REQUIRED
system-workspace NMTOKEN #REQUIRED
>
<!ELEMENT security-domain (#PCDATA)>
<!ELEMENT access-control (#PCDATA)>
<!ELEMENT session-max-age (#PCDATA)>
<!ELEMENT authentication-policy (#PCDATA)>
<!ELEMENT workspaces (workspace+)>
<!ELEMENT workspace (container,initializer,cache,query-handler)>
<!ATTLIST workspace name NMTOKEN #REQUIRED>
<!ELEMENT container (properties,value-storages)>
<!ATTLIST container class NMTOKEN #REQUIRED>
<!ELEMENT value-storages (value-storage+)>
<!ELEMENT value-storage (properties,filters)>
<!ATTLIST value-storage class NMTOKEN #REQUIRED>
<!ELEMENT filters (filter+)>
<!ELEMENT filter EMPTY>
<!ATTLIST filter property-type NMTOKEN #REQUIRED>
<!ELEMENT initializer (properties)>
<!ATTLIST initializer class NMTOKEN #REQUIRED>
<!ELEMENT cache (properties)>
<!ATTLIST cache
enabled NMTOKEN #REQUIRED
class NMTOKEN #REQUIRED
>
<!ELEMENT query-handler (properties)>
<!ATTLIST query-handler class NMTOKEN #REQUIRED>
<!ELEMENT access-manager (properties)>
<!ATTLIST access-manager class NMTOKEN #REQUIRED>
<!ELEMENT lock-manager (time-out,persister)>
<!ELEMENT time-out (#PCDATA)>
<!ELEMENT persister (properties)>
<!ELEMENT properties (property+)>
<!ELEMENT property EMPTY>
The elements in the above configuration file are detailed in the following sections:
JCR Service can use multiple Repositories and each repository can have multiple Workspaces.
Repositories configuration parameters support human-readable formats of values. They are all case-insensitive:
Number formats: K, KB - kilobytes; M, MB - megabytes; G, GB - gigabytes; T, TB - terabytes. Examples: 100.5 - the plain number 100.5; 200k - 200 Kbytes; 4m - 4 Mbytes; 1.4G - 1.4 Gbytes; 10T - 10 Tbytes.
Time format endings: ms - milliseconds; m - minutes; h - hours; d - days; w - weeks; no ending - seconds. Examples: 500ms - 500 milliseconds; 20 - 20 seconds; 30m - 30 minutes; 12h - 12 hours; 5d - 5 days; 4w - 4 weeks.
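The notation above can be illustrated with a small stand-alone parser. This is only a sketch of the documented rules, not the parser that JCR itself uses: the class and method names are hypothetical, and it assumes that one kilobyte is 1024 bytes.

```java
import java.util.Locale;

/** Sketch of the size and time notations described above (not the eXo parser itself). */
final class HumanReadableValues {

    /** Parses "200k", "4MB", "1.4G", "10T" or a plain number into bytes, assuming 1 K = 1024 bytes. */
    static long parseBytes(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty value");
        }
        // Accept both the short and the long suffix: "K" and "KB", "M" and "MB", ...
        if (s.length() > 1 && s.endsWith("B")) {
            s = s.substring(0, s.length() - 1);
        }
        long factor;
        switch (s.charAt(s.length() - 1)) {
            case 'K': factor = 1L << 10; break;
            case 'M': factor = 1L << 20; break;
            case 'G': factor = 1L << 30; break;
            case 'T': factor = 1L << 40; break;
            default:  factor = 1L;       break; // no suffix: plain number
        }
        String digits = factor == 1L ? s : s.substring(0, s.length() - 1);
        return Math.round(Double.parseDouble(digits) * factor);
    }

    /** Parses "500ms", "30m", "12h", "5d", "4w" or a plain number of seconds into milliseconds. */
    static long parseMillis(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        long factor = 1_000L; // no ending means seconds
        int suffixLength = 0;
        if (s.endsWith("ms")) {        // must be tested before the single "m"
            factor = 1L;           suffixLength = 2;
        } else if (s.endsWith("m")) {
            factor = 60_000L;      suffixLength = 1;
        } else if (s.endsWith("h")) {
            factor = 3_600_000L;   suffixLength = 1;
        } else if (s.endsWith("d")) {
            factor = 86_400_000L;  suffixLength = 1;
        } else if (s.endsWith("w")) {
            factor = 604_800_000L; suffixLength = 1;
        }
        String digits = s.substring(0, s.length() - suffixLength);
        return Math.round(Double.parseDouble(digits) * factor);
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("200K"));  // a max-buffer-size style value
        System.out.println(parseMillis("30m"));  // a live-time style value
    }
}
```

Both methods are case-insensitive, matching the rule stated above.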
Repository service configuration
The service configuration is located at platform-extension/WEB-INF/conf/jcr/platform-extension/repository-configuration.xml in the portal web application.
default-repository: The name of the default repository (the one returned by RepositoryService.getRepository()).
repositories: The list of repositories.
name: The name of a repository.
default-workspace: The name of the workspace obtained using the Repository's login() or login(Credentials) methods (the ones without an explicit workspace name).
system-workspace: The name of the workspace where the /jcr:system node is placed.
security-domain: The name of a security domain for JAAS authentication.
access-control: The name of an access control policy. There are 3 types:
optional - An ACL is created on demand (default).
disable - No access control.
mandatory - An ACL is created for each added node (not supported yet).
authentication-policy: The name of an authentication policy class.
workspaces: The list of workspaces.
session-max-age: The time after which an idle session will be removed (logged out). If session-max-age is not set, idle sessions are never removed.
lock-remover-max-threads: The number of threads that can serve LockRemover tasks. The default value is "1". A repository may have many workspaces, and each workspace has its own LockManager. JCR supports locks with a defined lifetime, and LockRemovers remove these locks once they expire. However, a LockRemover is not an independent timer thread; it is a task that is executed every 30 seconds. Such a task is served by a ThreadPoolExecutor, which may use several threads.
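The relationship between the periodic LockRemover tasks and the shared thread pool can be pictured with the following sketch. It is not the JCR implementation: the class LockRemoverSketch, its methods and the in-memory map of lock tokens are hypothetical. Only the 30-second period and the idea of a pool sized by lock-remover-max-threads come from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-workspace lock cleaner, served by a shared scheduled thread pool. */
final class LockRemoverSketch {

    // lock token -> expiration time in epoch milliseconds
    private final Map<String, Long> expirations = new ConcurrentHashMap<>();

    void addLock(String token, long expiresAtMillis) {
        expirations.put(token, expiresAtMillis);
    }

    int size() {
        return expirations.size();
    }

    /** Removes every lock whose lifetime has ended and returns how many were removed. */
    int removeExpired(long nowMillis) {
        int removed = 0;
        for (Map.Entry<String, Long> entry : expirations.entrySet()) {
            if (entry.getValue() <= nowMillis
                    && expirations.remove(entry.getKey(), entry.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    /**
     * Schedules one cleanup task per workspace, every 30 seconds, on a pool of
     * maxThreads threads. The caller is responsible for shutting the pool down.
     */
    static ScheduledExecutorService schedule(int maxThreads, LockRemoverSketch... removers) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(maxThreads);
        for (LockRemoverSketch remover : removers) {
            pool.scheduleAtFixedRate(
                    () -> remover.removeExpired(System.currentTimeMillis()),
                    30, 30, TimeUnit.SECONDS);
        }
        return pool;
    }
}
```

With many workspaces and a single thread, the cleanup tasks run one after another; a larger pool lets several of them run in parallel.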
See RepositoryCreationService if you want to learn how to create repositories at runtime.
The service configuration is located at repository-configuration.xml in the web application.
The workspace configuration can be found in different files:
webapps/platform-extension/WEB-INF/conf/jcr/platform-extension/ide-repository-configuration.xml
webapps/portal/WEB-INF/conf/jcr/repository-configuration.xml
webapps/social-extension/WEB-INF/conf/jcr/repository-configuration.xml
webapps/ks-extension/WEB-INF/conf/ks-extension/jcr/repository-configuration.xml
webapps/ecm-wcm-extension/WEB-INF/conf/dms-extension/jcr/repository-extension.xml
webapps/ecm-wcm-extension/WEB-INF/conf/wcm-extension/jcr/repository-extension.xml
webapps/cs-extension/WEB-INF/conf/cs-extension/jcr/repository-extension.xml
name: The name of a workspace.
auto-init-root-nodetype: DEPRECATED. The node type for root node initialization.
container: Workspace data container (physical storage) configuration.
initializer: Workspace initializer configuration.
cache: Workspace storage cache configuration.
query-handler: Query handler configuration.
auto-init-permissions: DEPRECATED. Default permissions of the root node. It is defined as a set of semicolon-delimited permissions, each containing a group of space-delimited identities (user, group, and so on; see the Organization service documentation for details) and the type of permission. For example, any read; *:/admin read; *:/admin add_node; *:/admin set_property; *:/admin remove means that users from the admin group have all permissions and other users have only the 'read' permission.
Workspace data container configuration
class: A workspace data container class name.
value-storages: The list of value storage plugins.
properties: The list of properties (name-value pairs) for the concrete Workspace data container.
| trigger-events-for-descendants-on-rename | Indicates whether events are triggered for descendants on rename. Not triggering them improves the performance of the "rename" operation, but Observation is then not notified about the descendants. The default value is "true". |
| lazy-node-iterator-page-size | The page size for the lazy iterator, that is, the number of nodes that can be retrieved from storage per request. The default value is "100". |
| acl-bloomfilter-false-positive-probability | The desired false positive probability of the ACL Bloom filters. The range is [0..1] and the default value is "0.1d". |
| acl-bloomfilter-elements-number | The expected number of ACL elements in the Bloom filter. The default value is 1000000. |
Bloom filters are not supported by all cache implementations so far; only the Infinispan implementation supports them.
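To get a feeling for how the two acl-bloomfilter parameters interact, the textbook Bloom filter formulas can be applied to the default values. The sketch below uses the standard sizing formulas; whether the cache implementation sizes its filter in exactly this way is an assumption, and the class name is hypothetical.

```java
/** Textbook Bloom filter sizing, applied to the default acl-bloomfilter settings. */
final class BloomFilterSizing {

    /** Optimal number of bits: m = -n * ln(p) / (ln 2)^2. */
    static long optimalBits(long expectedElements, double falsePositiveProbability) {
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(
                -expectedElements * Math.log(falsePositiveProbability) / (ln2 * ln2));
    }

    /** Optimal number of hash functions: k = (m / n) * ln 2, and at least 1. */
    static int optimalHashFunctions(long expectedElements, long bits) {
        return Math.max(1, (int) Math.round((double) bits / expectedElements * Math.log(2.0)));
    }

    public static void main(String[] args) {
        long elements = 1_000_000L; // acl-bloomfilter-elements-number default
        double probability = 0.1;   // acl-bloomfilter-false-positive-probability default
        long bits = optimalBits(elements, probability);
        System.out.println("bits: " + bits + " (about " + bits / 8 / 1024 + " KB)");
        System.out.println("hash functions: " + optimalHashFunctions(elements, bits));
    }
}
```

Lowering the false positive probability or raising the expected number of elements both increase the memory needed by the filter.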
The value-storage element is optional. If you do not include it, the values will be stored as BLOBs inside the database.
See External Value Storages for advanced configuration of Value Storage Plugin.
value-storage: Optional value Storage plugin definition.
This configuration is optional.
class: Initializer implementation class.
properties: The list of properties (name-value pairs) which are supported.
root-nodetype: The node type for root node initialization.
root-permissions: Default permissions of the root node. It is defined as a set of semicolon-delimited permissions, each containing a group of space-delimited identities (for example, user and group; see Organization Service Initializer for more details) and the type of permission. For example, any read;*:/admin read;*:/admin add_node;*:/admin set_property;*:/admin remove means that users from the admin group have all permissions and other users have only the 'read' permission.
The configurable initializer adds the capability to override the workspace initial startup procedure (used for Clustering). It also replaces the workspace element parameters auto-init-root-nodetype and auto-init-permissions with root-nodetype and root-permissions respectively.
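The structure of a root-permissions value can be made explicit with a small parser. This is a sketch for illustration only: the class RootPermissions is hypothetical and is not part of JCR, and it assumes that the identity is everything before the last space of each semicolon-delimited entry (for example the membership expression *:/admin).

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that groups a root-permissions string by identity. */
final class RootPermissions {

    /** Returns identity -> permission types, in the order they appear in the value. */
    static Map<String, Set<String>> parse(String value) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String entry : value.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            int space = trimmed.lastIndexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("Missing permission type in: " + trimmed);
            }
            String identity = trimmed.substring(0, space).trim();
            String permission = trimmed.substring(space + 1);
            result.computeIfAbsent(identity, key -> new LinkedHashSet<>()).add(permission);
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "any read;*:/admin read;*:/admin add_node;"
                + "*:/admin set_property;*:/admin remove";
        parse(value).forEach((identity, permissions) ->
                System.out.println(identity + " -> " + permissions));
    }
}
```

Grouping by identity shows directly that "any" only receives read, while the admin group receives all four permission types.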
enabled: Define if workspace cache is enabled or not.
class: Cache implementation class. The default value is org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl.
properties: The list of properties (name-value pairs) for Workspace cache.
max-size: Cache maximum size.
live-time: Cached item live time.
The service configuration is located at repository-configuration.xml in the web application. This file can be found in various locations.
class: A Query Handler class name.
properties: The list of properties (name-value pairs) for a Query Handler (indexDir).
See QueryHandler configuration for advanced configuration of QueryHandler.
The service configuration is located at repository-configuration.xml in the web application. The file can be found in various locations.
time-out: The time after which an unused global lock will be removed.
persister: A class for storing lock information for future use, for example, to remove locks after a JCR restart.
path: A lock folder. Each workspace has its own one.
See LockManager configuration for advanced configuration of LockManager.
Also see lock-remover-max-threads.
JCR allows using a persister to store its configuration. This section explains how to use and configure the JCR persister.
On startup, the RepositoryServiceConfiguration component checks whether a configuration persister is configured. If so, it uses the provided ConfigurationPersister implementation class to instantiate the persister object.
The configuration file is located at portal/WEB-INF/conf/jcr/jcr-configuration.xml in the portal web application.
Configuration with persister:
<component>
<key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
<type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
<init-params>
<value-param>
<name>conf-path</name>
<description>JCR configuration file</description>
<value>war:/conf/jcr/repository-configuration.xml</value>
</value-param>
<properties-param>
<name>working-conf</name>
<description>working-conf</description>
<property name="persister-class-name" value="org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister" />
<property name="source-name" value="${gatein.jcr.datasource.name}${container.name.suffix}"/>
<property name="dialect" value="${gatein.jcr.datasource.dialect}"/>
</properties-param>
</init-params>
</component>
To customize the persister, implement the ConfigurationPersister interface, which declares the following methods:
/**
 * Init persister.
 * Used by RepositoryServiceConfiguration on init.
 * @param params - persister init parameters
 */
void init(PropertiesParam params) throws RepositoryConfigurationException;

/**
 * Read config data.
 * @return - config data stream
 */
InputStream read() throws RepositoryConfigurationException;

/**
 * Create table, write data.
 * @param confData - config data stream
 */
void write(InputStream confData) throws RepositoryConfigurationException;

/**
 * Tell if the config exists.
 * @return - flag
 */
boolean hasConfig() throws RepositoryConfigurationException;
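The contract of these four methods can be sketched with a stand-alone example. So that it compiles without the JCR libraries, the sketch declares its own stand-in interface: java.util.Properties replaces PropertiesParam and unchecked exceptions replace RepositoryConfigurationException. Both substitutions, and all class names, are assumptions of this example; a real persister would implement the eXo interface shown above and would normally write to a file or a database rather than to memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Stand-in for the eXo interface, simplified so that the sketch is self-contained. */
interface ConfigurationPersisterSketch {
    void init(Properties params);
    InputStream read();
    void write(InputStream confData);
    boolean hasConfig();
}

/** Keeps the configuration in memory; useful only to illustrate the call sequence. */
final class InMemoryConfigurationPersister implements ConfigurationPersisterSketch {

    private byte[] stored; // null until write() has been called

    @Override
    public void init(Properties params) {
        // A real persister would read its settings here (for example a data source name).
    }

    @Override
    public ByteArrayInputStream read() {
        if (stored == null) {
            throw new IllegalStateException("No configuration has been written yet");
        }
        return new ByteArrayInputStream(stored);
    }

    @Override
    public void write(InputStream confData) {
        try {
            // Copy the whole stream so that the configuration can be read back later.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = confData.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            stored = out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasConfig() {
        return stored != null;
    }
}
```

The expected call order is init() once, then hasConfig() to decide between read() and an initial write().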
The current configuration of JCR uses the Apache DBCP connection pool (org.apache.commons.dbcp.BasicDataSourceFactory).
It is possible to set a large value for the maxActive parameter in configuration.xml. In that case, the pool (that is, the JDBC driver) uses a lot of TCP/IP ports on the client machine. As a result, the data container can throw exceptions like "Address already in use". To solve this problem, configure the client machine's networking software to use shorter timeouts for opened TCP/IP ports.
Microsoft Windows has the MaxUserPort and TcpTimedWaitDelay registry keys in the node HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. By default, these keys are unset. Set each one as follows:
"TcpTimedWaitDelay"=dword:0000001e sets the TIME_WAIT parameter to 30 seconds (the default value is "240").
"MaxUserPort"=dword:00001b58 sets the maximum number of open ports to 7000; use this value or higher (the default value is "5000").
A sample registry file is below:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"MaxUserPort"=dword:00001b58
"TcpTimedWaitDelay"=dword:0000001e
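The dword values in a registry file are hexadecimal. If you want to use other values than the ones above, a small conversion helper such as the following sketch can be used to check them (the class name is hypothetical):

```java
/** Converts between decimal numbers and the hexadecimal "dword:" notation of .reg files. */
final class RegistryDword {

    /** "dword:00001b58" (or just "00001b58") -> decimal value. */
    static long toDecimal(String dwordValue) {
        String hex = dwordValue.startsWith("dword:")
                ? dwordValue.substring("dword:".length())
                : dwordValue;
        return Long.parseLong(hex, 16);
    }

    /** Decimal value -> "dword:" followed by eight lower-case hexadecimal digits. */
    static String toDword(long value) {
        return String.format("dword:%08x", value);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("dword:0000001e")); // TcpTimedWaitDelay in seconds
        System.out.println(toDecimal("dword:00001b58")); // MaxUserPort
        System.out.println(toDword(10000));              // a higher MaxUserPort value
    }
}
```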
You need to configure each workspace in a repository. Each workspace may reside on a different remote server if needed.
First of all, configure the data containers in the org.exoplatform.services.naming.InitialContextInitializer service. It is the JNDI context initializer, which registers (binds) naming resources (DataSources) for data containers.
For example, the configuration for two data containers (jdbcjcr - local HSQLDB, jdbcjcr1 - remote MySQL) is as follows:
<component>
<key>org.exoplatform.services.naming.InitialContextInitializer</key>
<type>org.exoplatform.services.naming.InitialContextInitializer</type>
<component-plugins>
<component-plugin>
<name>bind.datasource</name>
<set-method>addPlugin</set-method>
<type>org.exoplatform.services.naming.BindReferencePlugin</type>
<init-params>
<value-param>
<name>bind-name</name>
<value>jdbcjcr</value>
</value-param>
<value-param>
<name>class-name</name>
<value>javax.sql.DataSource</value>
</value-param>
<value-param>
<name>factory</name>
<value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
</value-param>
<properties-param>
<name>ref-addresses</name>
<description>ref-addresses</description>
<property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
<property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/>
<property name="username" value="sa"/>
<property name="password" value=""/>
</properties-param>
</init-params>
</component-plugin>
<component-plugin>
<name>bind.datasource</name>
<set-method>addPlugin</set-method>
<type>org.exoplatform.services.naming.BindReferencePlugin</type>
<init-params>
<value-param>
<name>bind-name</name>
<value>jdbcjcr1</value>
</value-param>
<value-param>
<name>class-name</name>
<value>javax.sql.DataSource</value>
</value-param>
<value-param>
<name>factory</name>
<value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
</value-param>
<properties-param>
<name>ref-addresses</name>
<description>ref-addresses</description>
<property name="driverClassName" value="com.mysql.jdbc.Driver"/>
<property name="url" value="jdbc:mysql://exoua.dnsalias.net/jcr"/>
<property name="username" value="exoadmin"/>
<property name="password" value="exo12321"/>
<property name="maxActive" value="50"/>
<property name="maxIdle" value="5"/>
<property name="initialSize" value="5"/>
</properties-param>
</init-params>
</component-plugin>
</component-plugins>
<init-params>
<value-param>
<name>default-context-factory</name>
<value>org.exoplatform.services.naming.SimpleContextFactory</value>
</value-param>
</init-params>
</component>
There are also some other connection pool configuration parameters (org.apache.commons.dbcp.BasicDataSourceFactory) according to Apache DBCP configuration.
When the data container configuration is done, you can configure the repository service. Each workspace will be configured for its own data container.
For example (two workspaces: ws - jdbcjcr, ws1 - jdbcjcr1):
<workspaces>
<workspace name="ws" auto-init-root-nodetype="nt:unstructured">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr"/>
<property name="dialect" value="hsqldb"/>
<property name="multi-db" value="true"/>
<property name="max-buffer-size" value="200K"/>
<property name="swap-directory" value="target/temp/swap/ws"/>
</properties>
</container>
<cache enabled="true">
<properties>
<property name="max-size" value="10K"/><!-- 10Kbytes -->
<property name="live-time" value="30m"/><!-- 30 min -->
</properties>
</cache>
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="target/temp/index"/>
</properties>
</query-handler>
<lock-manager>
<time-out>15m</time-out><!-- 15 min -->
<persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
<properties>
<property name="path" value="target/temp/lock/ws"/>
</properties>
</persister>
</lock-manager>
</workspace>
<workspace name="ws1" auto-init-root-nodetype="nt:unstructured">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr1"/>
<property name="dialect" value="mysql"/>
<property name="multi-db" value="true"/>
<property name="max-buffer-size" value="200K"/>
<property name="swap-directory" value="target/temp/swap/ws1"/>
</properties>
</container>
<cache enabled="true">
<properties>
<property name="max-size" value="10K"/>
<property name="live-time" value="5m"/>
</properties>
</cache>
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="target/temp/index"/>
</properties>
</query-handler>
<lock-manager>
<time-out>15m</time-out><!-- 15 min -->
<persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
<properties>
<property name="path" value="target/temp/lock/ws1"/>
</properties>
</persister>
</lock-manager>
</workspace>
</workspaces>
In this way, you have configured two workspaces which will be persisted in two different databases (ws in HSQLDB, ws1 in MySQL).
The repository configuration parameters support human-readable formats of values (for example: 200K - 200 Kbytes, 30m - 30 minutes, etc).
It is simpler to configure a single-database data container. You have to configure one naming resource.
For example (embedded mode for the jdbcjcr data container):
<external-component-plugins>
<target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component>
<component-plugin>
<name>bind.datasource</name>
<set-method>addPlugin</set-method>
<type>org.exoplatform.services.naming.BindReferencePlugin</type>
<init-params>
<value-param>
<name>bind-name</name>
<value>jdbcjcr</value>
</value-param>
<value-param>
<name>class-name</name>
<value>javax.sql.DataSource</value>
</value-param>
<value-param>
<name>factory</name>
<value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
</value-param>
<properties-param>
<name>ref-addresses</name>
<description>ref-addresses</description>
<property name="driverClassName" value="org.postgresql.Driver"/>
<property name="url" value="jdbc:postgresql://exoua.dnsalias.net/portal"/>
<property name="username" value="exoadmin"/>
<property name="password" value="exo12321"/>
<property name="maxActive" value="50"/>
<property name="maxIdle" value="5"/>
<property name="initialSize" value="5"/>
</properties-param>
</init-params>
</component-plugin>
</external-component-plugins>
Then configure the repository workspaces in the repository configuration to use this single database. The "multi-db" parameter must be switched off (set to "false").
For example (two workspaces: ws - jdbcjcr, ws1 - jdbcjcr):
<workspaces>
<workspace name="ws" auto-init-root-nodetype="nt:unstructured">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr"/>
<property name="dialect" value="pgsql"/>
<property name="multi-db" value="false"/>
<property name="max-buffer-size" value="200K"/>
<property name="swap-directory" value="target/temp/swap/ws"/>
</properties>
</container>
<cache enabled="true">
<properties>
<property name="max-size" value="10K"/>
<property name="live-time" value="30m"/>
</properties>
</cache>
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="../temp/index"/>
</properties>
</query-handler>
<lock-manager>
<time-out>15m</time-out>
<persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
<properties>
<property name="path" value="target/temp/lock/ws"/>
</properties>
</persister>
</lock-manager>
</workspace>
<workspace name="ws1" auto-init-root-nodetype="nt:unstructured">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr"/>
<property name="dialect" value="pgsql"/>
<property name="multi-db" value="false"/>
<property name="max-buffer-size" value="200K"/>
<property name="swap-directory" value="target/temp/swap/ws1"/>
</properties>
</container>
<cache enabled="true">
<properties>
<property name="max-size" value="10K"/>
<property name="live-time" value="5m"/>
</properties>
</cache>
<lock-manager>
<time-out>15m</time-out>
<persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister">
<properties>
<property name="path" value="target/temp/lock/ws1"/>
</properties>
</persister>
</lock-manager>
</workspace>
</workspaces>
In this way, you have configured two workspaces which will be persisted in one database (PostgreSQL).
Configuration without DataSource
The repository can be configured without using a javax.sql.DataSource bound in JNDI.
This may be useful if you have a dedicated JDBC driver implementation with special features such as XA transactions, statement/connection pooling and so on:
Remove the configuration in InitialContextInitializer for your database and configure a new one directly in the workspace container.
Remove the "source-name" parameter and add the following lines instead. Specify your values for the JDBC driver, database URL and username.
Be careful: in this case, the JDBC driver itself must implement and provide connection pooling. Connection pooling is strongly recommended with JCR to prevent database overload.
<workspace name="ws" auto-init-root-nodetype="nt:unstructured">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="dialect" value="hsqldb"/>
<property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
<property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/>
<property name="username" value="su"/>
<property name="password" value=""/>
......
Details of Search configuration, including XML parameters, global search index and indexing tuning.
Instructions on how to configure LockManager which is used to store Lock objects.
Details of Indexing in clustered environment, query-handler parameters, cluster-ready indexing strategies, Asynchronous reindexing and Lucene tuning.
Requirements related to environment and configuration, instructions on how to configure JBoss Cache and stop a node properly in the cluster environment.
Overview of dependencies and how RepositoryCreationService works, details of its configuration and interface.
Details of existing TransactionService implementations and JBoss TransactionService.
Details of Tree File Value Storage, Simple File Value Storage and Content Addressable Value storage support.
Search is an important function in JCR, so you need to know how to configure the JCR Search tool.
Before going deeper into the JCR Search tool, you need to learn about the .xml configuration file and its parameters.
The JCR index configuration is defined in the repository-configuration.xml file, which can be found in various locations.
<repository-service default-repository="db1">
<repositories>
<repository name="db1" system-workspace="ws" default-workspace="ws">
....
<workspaces>
<workspace name="ws">
....
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" />
<property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />
<property name="synonymprovider-config-path" value="/synonyms.properties" />
<property name="indexing-configuration-path" value="/indexing-configuration.xml" />
<property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" />
</properties>
</query-handler>
...
</workspace>
</workspaces>
</repository>
</repositories>
</repository-service>
The following are the parameters of the JCR index configuration:
| Parameter | Default | Description |
|---|---|---|
| index-dir | none | The location of the index directory. This parameter is mandatory. |
| use-compoundfile | true | Advises Lucene to use compound files for the index files. |
| min-merge-docs | 100 | The minimum number of nodes in an index until segments are merged. |
| volatile-idle-time | 3 | The idle time in seconds until the volatile index part is moved to a persistent index even though minMergeDocs is not reached. |
| max-merge-docs | Integer.MAX_VALUE | The maximum number of nodes in segments that will be merged. |
| merge-factor | 10 | Determines how often segment indices are merged. |
| max-field-length | 10000 | The maximum number of words that are fulltext indexed per property. |
| cache-size | 1000 | The size of the document number cache. This cache maps UUIDs to Lucene document numbers. |
| force-consistencycheck | false | Runs a consistency check on every startup. If false, a consistency check is only performed when the search index detects a prior forced shutdown. |
| auto-repair | true | Errors detected by a consistency check are automatically repaired. If false, errors are only written to the log. |
| query-class | QueryImpl | The name of the class that implements the javax.jcr.query.Query interface. This class must also extend the org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl class. |
| document-order | true | If set to 'true' and the query does not contain an 'order by' clause, result nodes are returned in 'document order'. For better performance when queries return a lot of nodes, set this parameter to 'false'. |
| result-fetch-size | Integer.MAX_VALUE | The number of results fetched when a query is executed. |
| excerptprovider-class | DefaultXMLExcerpt | The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptProvider and should be used for the rep:excerpt() function in a query. |
| support-highlighting | false | If set to true, additional information is stored in the index to support highlighting using the rep:excerpt() function. |
| synonymprovider-class | none | The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SynonymProvider. The default value is null (not set). |
| synonymprovider-config-path | none | The path to the synonym provider configuration file. This path is interpreted relative to the path parameter. If there is a path element inside the SearchIndex element, then this path is interpreted relative to the root path of the path. Whether this parameter is mandatory depends on the synonym provider implementation. The default value is null. |
| indexing-configuration-path | none | The path to the indexing configuration file. |
| indexing-configuration-class | IndexingConfigurationImpl | The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfiguration. |
| force-consistencycheck | false | If set to "true", a consistency check is performed, depending on the forceConsistencyCheck parameter. If set to "false", no consistency check is performed on startup, even if a redo log has been applied. |
| spellchecker-class | none | The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SpellChecker. |
| spellchecker-more-popular | true | If set to "true", the spellchecker returns only suggestions that are as frequent as or more frequent than the checked word. If set to "false", the spellchecker returns null if the checked word exists in the dictionary; otherwise it returns the closest suggested word. |
| spellchecker-min-distance | 0.55f | The minimal distance between the checked word and the proposed suggestion. |
| errorlog-size | 50 (KB) | The default size of the error log file in KB. |
| upgrade-index | false | Allows JCR to convert an existing index into the new format. To run the automatic migration, start JCR with -Dupgrade-index=true. The old index format is then converted into the new index format, and the new format is used after the conversion. On the next start, this option is no longer needed. Because the old index is replaced and converting back is not possible, take a backup of the index beforehand. |
| analyzer | org.apache.lucene.analysis.standard.StandardAnalyzer | The class name of the Lucene analyzer to use for fulltext indexing of text. |
The global search index is configured in the query-handler tag of the above-mentioned configuration file (repository-configuration.xml, which can be found in various locations).
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
When using Lucene, you should always use the same analyzer for indexing and for querying; otherwise the results are unpredictable. You do not have to worry about this, because JCR does it for you automatically. If you do not want the StandardAnalyzer that is configured by default, replace it with your own.
If you do not have a suitable QueryHandler, you can learn how to create a customized QueryHandler in the QueryHandler configuration section.
By default, JCR uses the Lucene StandardAnalyzer to index content. This analyzer applies some standard filters in the method that analyzes the content:
public TokenStream tokenStream(String fieldName, Reader reader) {
    StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
    tokenStream.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    return result;
}
The first one (StandardFilter) removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms.
The second one (LowerCaseFilter) normalizes token text to lower case.
The last one (StopFilter) removes stop words from a token stream. The stop set is defined in the analyzer.
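The three steps above can be imitated in plain Java to see what each one does to a token. This is only a sketch that does not use Lucene: the class name, the whitespace tokenization and the stop set are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation of the three filter steps; it does not use Lucene.
final class FilterChainSketch {

    // Hypothetical stop set; in Lucene the stop set is defined in the analyzer.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "is", "of");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Simplified tokenization: split on whitespace only.
        for (String token : text.trim().split("\\s+")) {
            // Step 1 (StandardFilter-like): drop a trailing 's and the dots of acronyms.
            if (token.toLowerCase(Locale.ROOT).endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            token = token.replace(".", "");
            // Step 2 (LowerCaseFilter-like): normalize to lower case.
            token = token.toLowerCase(Locale.ROOT);
            // Step 3 (StopFilter-like): skip stop words and empty tokens.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Peter's I.B.M. laptop is the best"));
    }
}
```

With this input, the sketch keeps the tokens peter, ibm, laptop and best. The real filters work on a Lucene TokenStream and handle many more cases.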
For specific cases, you may wish to use additional filters like ISOLatin1AccentFilter, which replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalents.
In order to use a different filter, you have to create a new analyzer, and a new search index to use the analyzer. You put it in a jar, which is deployed with your application.
The ISOLatin1AccentFilter is not present in the current Lucene
version used by eXo. You can use the attached file. You can also
create your own filter with the relevant method as follows:
public final Token next(final Token reusableToken) throws java.io.IOException
This method defines how chars are read and used by the filter.
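To give an idea of the character mapping such an accent filter performs, here is a JDK-only approximation. It is not the ISOLatin1AccentFilter and not a Lucene filter: the class and method names are hypothetical, and it relies on Unicode decomposition, so letters without a decomposition (for example "æ" or "ß") are left unchanged.

```java
import java.text.Normalizer;

// JDK-only approximation of accent folding; not a Lucene TokenFilter.
final class AccentFoldingSketch {

    static String stripAccents(String input) {
        // NFD splits an accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are then dropped.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(stripAccents("\u00e9l\u00e8ve fran\u00e7ais"));
    }
}
```

In a real filter, the same mapping would be applied to the term text of each token before it is passed on to the next filter.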
The analyzer has to extend org.apache.lucene.analysis.standard.StandardAnalyzer and override the
following method to plug in your own filters.
public TokenStream tokenStream(String fieldName, Reader reader)
You can have a glance at the example analyzer attached to this article.
Configure Platform to use your analyzer
In repository-configuration.xml which can be found in various locations, you have to add the analyzer parameter to each query-handler config:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
...
<property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
...
</properties>
</query-handler>
When you start eXo, your SearchIndex will start to index content with the specified filters.
Now that you have the analyzer, you need to write the SearchIndex that will use it. You have to extend org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex, write the constructor to set the right analyzer, and implement the following method to return your analyzer.
public Analyzer getAnalyzer() {
// Return an instance of your analyzer, not the class name.
return new MyAnalyzer();
}
You can see the attached SearchIndex.
You can also set the analyzer directly in your configuration, so creating a new SearchIndex only to use a new analyzer is redundant.
Configure Platform to use your SearchIndex
In repository-configuration.xml which can be found in various locations, you have to replace each:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
with your own class
<query-handler class="mypackage.indexation.MySearchIndex">
Each property of a node (if it is indexable) is processed with Lucene analyzer and stored in Lucene index. That is called indexing of a property. After that, you can perform a fulltext search among these indexed properties.
The purpose of analyzers is to transform all strings stored in the index into a well-defined form. The same analyzers are used when searching, so that the query string is adapted to what is actually stored in the index.
Therefore, performing the same query using different analyzers can return different results.
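Now, let's see how the same string is transformed by different analyzers. The following table shows how the string "The quick brown fox jumped over the lazy dogs" is parsed: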
| Analyzer | Parsed |
|---|---|
| org.apache.lucene.analysis.WhitespaceAnalyzer | [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
| org.apache.lucene.analysis.SimpleAnalyzer | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
| org.apache.lucene.analysis.StopAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] |
| org.apache.lucene.analysis.standard.StandardAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] |
| org.apache.lucene.analysis.snowball.SnowballAnalyzer | [quick] [brown] [fox] [jump] [over] [lazi] [dog] |
| org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop word - JCR default analyzer) | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
The following table shows how the string "XY&Z Corporation - xyz@example.com" is parsed:
| Analyzer | Parsed |
|---|---|
| org.apache.lucene.analysis.WhitespaceAnalyzer | [XY&Z] [Corporation] [-] [xyz@example.com] |
| org.apache.lucene.analysis.SimpleAnalyzer | [xy] [z] [corporation] [xyz] [example] [com] |
| org.apache.lucene.analysis.StopAnalyzer | [xy] [z] [corporation] [xyz] [example] [com] |
| org.apache.lucene.analysis.standard.StandardAnalyzer | [xy&z] [corporation] [xyz@example] [com] |
| org.apache.lucene.analysis.snowball.SnowballAnalyzer | [xy&z] [corpor] [xyz@exampl] [com] |
| org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop word - JCR default analyzer) | [xy&z] [corporation] [xyz@example] [com] |
StandardAnalyzer is the default analyzer in JCR search engine but it does not use stop words.
You can assign your analyzer as described in Search Configuration.
How are different properties indexed?
Different properties are indexed in different ways, which determines whether a property can be searched by fulltext on that exact property.
Only two property types are indexed as fulltext searchable: STRING and BINARY.
| Property Type | Fulltext search by all properties | Fulltext search by exact property |
|---|---|---|
| STRING | YES | YES |
| BINARY | YES | NO |
For example, you have the jcr:data property (it is BINARY). It is stored
correctly, but you will never find any string with a query like:
SELECT * FROM nt:resource WHERE CONTAINS(jcr:data, 'some string')
BINARY is not searchable by fulltext search on the exact property, but the following query returns a result if the node contains the searched data.
SELECT * FROM nt:resource WHERE CONTAINS( * , 'some string')
Fulltext search query examples
First of all, fill the repository with nodes of the mixin type
'mix:title' and different values of the jcr:description property.
root
document1 (mix:title) jcr:description = "The quick brown fox jumped over the lazy dogs."
document2 (mix:title) jcr:description = "Brown fox live in forest."
document3 (mix:title) jcr:description = "Fox is a nice animal."
Let's take a closer look at the effect of analyzers. In the first case, the base JCR settings are used, so as mentioned above, the string "The quick brown fox jumped over the lazy dogs" is transformed into the set {[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]}.
// make SQL query
QueryManager queryManager = workspace.getQueryManager();
String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'the')";
// create query
Query query = queryManager.createQuery(sqlStatement, Query.SQL);
// execute query and fetch result
QueryResult result = query.execute();
NodeIterator will return "document1".
Now change the default analyzer to org.apache.lucene.analysis.StopAnalyzer. Fill the repository again (the new
analyzer must process the node properties) and run the same query. It
will return nothing, because stop words like "the" are excluded from the parsed string set.
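The effect can be reproduced without JCR or Lucene by a toy search in which the same analysis is applied to both the documents and the query. This is only a sketch under simplifying assumptions: the class name, the tokenization rule and the stop sets are hypothetical, and the three documents are the ones from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy fulltext search; the same analysis is used for indexing and querying.
final class StopWordSearchSketch {

    // Lower-cases the text, splits on non-alphanumerics and drops stop words.
    static Set<String> analyze(String text, Set<String> stopWords) {
        Set<String> terms = new LinkedHashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty() && !stopWords.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Returns the names of the documents that contain every analyzed query term.
    static List<String> search(Map<String, String> docs, String query, Set<String> stopWords) {
        Set<String> queryTerms = analyze(query, stopWords);
        List<String> hits = new ArrayList<>();
        if (queryTerms.isEmpty()) {
            return hits; // the whole query consisted of stop words
        }
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            if (analyze(doc.getValue(), stopWords).containsAll(queryTerms)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("document1", "The quick brown fox jumped over the lazy dogs.");
        docs.put("document2", "Brown fox live in forest.");
        docs.put("document3", "Fox is a nice animal.");
        return docs;
    }

    public static void main(String[] args) {
        // Without stop words, "the" is an ordinary term.
        System.out.println(search(sampleDocs(), "the", Set.of()));
        // With "the" as a stop word, the same query finds nothing.
        System.out.println(search(sampleDocs(), "the", Set.of("the")));
    }
}
```

The first call finds document1 only, while the second call finds nothing, because "the" is removed from both the indexed text and the query.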
The default search index implementation in JCR allows you to control which properties of a node are indexed. You also can define different analyzers for different nodes.
The configuration parameter is called indexing-configuration-path and it is not set by default. This means all properties of a node are indexed.
If you wish to configure the indexing behavior, you need to add a parameter to the query-handler element in your configuration file.
<property name="indexing-configuration-path" value="/indexing_configuration.xml"/>
Index configuration path can indicate any file located on the file system, in the jar or war files.
You have to declare the namespace prefixes in the configuration element that you are using throughout the .xml file.
To optimize the index size, you can limit the node scope so that only certain properties of a node type are indexed.
With the below configuration, only properties named Text are indexed for nodes of type nt:unstructured. This configuration also applies to all nodes whose type extends from nt:unstructured.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured">
<property>Text</property>
</index-rule>
</configuration>
It is also possible to configure a boost value for the nodes that match the index rule. The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured" boost="2.0">
<property>Text</property>
</index-rule>
</configuration>
If you do not wish to boost the complete node but only certain properties, you can also provide a boost value for the listed properties:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured">
<property boost="3.0">Title</property>
<property boost="1.5">Text</property>
</index-rule>
</configuration>
You may also add a condition to the index rule and have multiple rules with the same nodeType. The first index rule that matches applies, and all remaining ones are ignored:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured"
boost="2.0"
condition="@priority = 'high'">
<property>Text</property>
</index-rule>
<index-rule nodeType="nt:unstructured">
<property>Text</property>
</index-rule>
</configuration>
In the above example, the first rule only applies if the nt:unstructured node has a priority property with a value 'high'. The condition syntax supports only the equals operator and a string literal.
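The "first matching rule wins" behavior can be sketched in plain Java as follows. This is not the JCR implementation: the Rule class and its fields are hypothetical, the node type comparison is a plain string equality (sub-types are ignored here), and falling back to the default boost of 1.0 when no rule matches is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;

// Simplified model of index rules: the first rule that matches a node applies.
final class IndexRuleSketch {

    static final class Rule {
        final String nodeType;
        final double boost;
        final String conditionProperty; // null means the rule has no condition
        final String conditionValue;

        Rule(String nodeType, double boost, String conditionProperty, String conditionValue) {
            this.nodeType = nodeType;
            this.boost = boost;
            this.conditionProperty = conditionProperty;
            this.conditionValue = conditionValue;
        }

        // Only the equals operator with a string literal is modeled.
        boolean matches(String type, Map<String, String> properties) {
            if (!nodeType.equals(type)) {
                return false;
            }
            return conditionProperty == null
                || conditionValue.equals(properties.get(conditionProperty));
        }
    }

    // The two rules of the configuration above, in the same order.
    static List<Rule> exampleRules() {
        return List.of(
            new Rule("nt:unstructured", 2.0, "priority", "high"),
            new Rule("nt:unstructured", 1.0, null, null));
    }

    static double boostFor(List<Rule> rules, String type, Map<String, String> properties) {
        for (Rule rule : rules) {
            if (rule.matches(type, properties)) {
                return rule.boost; // later rules are ignored
            }
        }
        return 1.0; // assumed fallback: the default boost
    }

    public static void main(String[] args) {
        System.out.println(boostFor(exampleRules(), "nt:unstructured", Map.of("priority", "high")));
        System.out.println(boostFor(exampleRules(), "nt:unstructured", Map.of("priority", "low")));
    }
}
```

Because rules are evaluated in document order, the unconditional rule must come last; otherwise it would hide the conditional one.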
You may also refer to properties in the condition that are not on the current node:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured"
boost="2.0"
condition="ancestor::*/@priority = 'high'">
<property>Text</property>
</index-rule>
<index-rule nodeType="nt:unstructured"
boost="0.5"
condition="parent::foo/@priority = 'low'">
<property>Text</property>
</index-rule>
<index-rule nodeType="nt:unstructured"
boost="1.5"
condition="bar/@priority = 'medium'">
<property>Text</property>
</index-rule>
<index-rule nodeType="nt:unstructured">
<property>Text</property>
</index-rule>
</configuration>
The indexing configuration also allows you to specify the type of a node in the condition. However, please note that the type match must be exact. It does not consider sub-types of the specified node type.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured"
boost="2.0"
condition="element(*, nt:unstructured)/@priority = 'high'">
<property>Text</property>
</index-rule>
</configuration>
Exclusion from the node scope index
By default, all configured properties are fulltext indexed if they are of type STRING, and they are included in the node scope index. A node scope search normally finds all nodes of an index. That is, jcr:contains(., 'foo') returns all nodes that have a string property containing the word 'foo'. You can explicitly exclude a property from the node scope index:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<index-rule nodeType="nt:unstructured">
<property nodeScopeIndex="false">Text</property>
</index-rule>
</configuration>
Sometimes it is useful to include the contents of descendant nodes in a single node, which makes it easier to search content that is scattered across multiple nodes.
JCR allows you to define indexed aggregates based on relative path patterns and primary node types.
The following example creates an indexed aggregate on nt:file that includes the content of the jcr:content node:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<aggregate primaryType="nt:file">
<include>jcr:content</include>
</aggregate>
</configuration>
You can also restrict the included nodes to a certain type:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<aggregate primaryType="nt:file">
<include primaryType="nt:resource">jcr:content</include>
</aggregate>
</configuration>
You may also use the asterisk (*) to match all child nodes:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<aggregate primaryType="nt:file">
<include primaryType="nt:resource">*</include>
</aggregate>
</configuration>
If you wish to include nodes up to a certain depth below the current node, you can add multiple include elements. For example, the nt:file node may contain a complete XML document under jcr:content:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<aggregate primaryType="nt:file">
<include>*</include>
<include>*/*</include>
<include>*/*/*</include>
</aggregate>
</configuration>
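One way to read the include patterns above is that each asterisk stands for exactly one path segment, so three include elements are needed to reach three levels below the node. The following plain-Java sketch illustrates that reading only. The class and method names are hypothetical, and the actual matching in JCR may differ in details.

```java
import java.util.List;

// Illustrates relative path patterns where "*" matches exactly one path segment.
final class AggregateIncludeSketch {

    static boolean matches(String pattern, String relativePath) {
        String[] patternSegments = pattern.split("/");
        String[] pathSegments = relativePath.split("/");
        // A pattern only matches paths of the same depth.
        if (patternSegments.length != pathSegments.length) {
            return false;
        }
        for (int i = 0; i < patternSegments.length; i++) {
            boolean wildcard = patternSegments[i].equals("*");
            if (!wildcard && !patternSegments[i].equals(pathSegments[i])) {
                return false;
            }
        }
        return true;
    }

    // A descendant is aggregated if any include pattern matches its relative path.
    static boolean included(List<String> patterns, String relativePath) {
        for (String pattern : patterns) {
            if (matches(pattern, relativePath)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("*", "*/*", "*/*/*");
        System.out.println(included(patterns, "jcr:content/html/body"));
        System.out.println(included(patterns, "jcr:content/html/body/p"));
    }
}
```

Under this reading, a node three levels below the nt:file node is included, while a node four levels below is not.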
In this configuration section, you will define how a property has to be analyzed. If there is an analyzer configuration for a property, this analyzer is used for indexing and searching of this property. For example:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<analyzers>
<analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
<property>mytext</property>
</analyzer>
<analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
<property>mytext2</property>
</analyzer>
</analyzers>
</configuration>
The configuration above means that the property "mytext" for the entire workspace is indexed (and searched) with the Lucene KeywordAnalyzer, and property "mytext2" with the WhitespaceAnalyzer. Using different analyzers for different languages is particularly useful.
The WhitespaceAnalyzer tokenizes a property, the KeywordAnalyzer takes the property as a whole.
Characteristics of node scope searches
When using analyzers, you may encounter an unexpected behavior when searching within a property compared to searching within a node scope. The reason is that the node scope always uses the global analyzer.
Let's suppose that the "mytext" property contains the "testing my analyzers" text and that you have not configured any analyzers for the "mytext" property (and not changed the default analyzer in SearchIndex).
For example, if your query is as follows:
xpath = "//*[jcr:contains(mytext,'analyzer')]"
This XPath query does not return a hit for the node with the property above when the default analyzers are used.
Also a search on the node scope
xpath = "//*[jcr:contains(.,'analyzer')]"
will not give a hit. Note that you can only set specific analyzers on a node property, and that node scope indexing and analyzing are always done with the globally defined analyzer in the SearchIndex element.
Now, if you change the analyzer used to index the "mytext" property above to
<analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
<property>mytext</property>
</analyzer>
and you do the same search again, then for
xpath = "//*[jcr:contains(mytext,'analyzer')]"
you would get a hit because of the word stemming (analyzers - analyzer).
The other search,
xpath = "//*[jcr:contains(.,'analyzer')]"
still would not give a result, since the node scope is indexed with the global analyzer, which in this case does not take into account any word stemming.
In conclusion, be aware that when using analyzers for specific properties, you might find a hit in a property for some search text but no hit for the same search text in the node scope of that property.
Both index rules and index aggregates influence how content is indexed in JCR. If you change the configuration, the existing content is not automatically re-indexed according to the new rules. You, therefore, have to manually re-index the content when you change the configuration.
JCR supports some advanced features, which are not specified in JSR-170:
Get a text excerpt with highlighted words that matches the query: ExcerptProvider.
Search a term and its synonyms: SynonymSearch.
Search similar nodes: SimilaritySearch.
Check spelling of a full text query statement: SpellChecker.
Define index aggregates and rules: IndexingConfiguration.
In general, LockManager stores Lock objects, so it can give a Lock object or can release it.
LockManager is also responsible for removing Locks that live too long. The maximum lifetime can be configured with the "time-out" property.
JCR provides two basic implementations of LockManager:
org.exoplatform.services.jcr.impl.core.lock.LockManagerImpl
org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl
This section focuses mostly on CacheableLockManagerImpl.
You can enable LockManager by adding lock-manager-configuration to workspace-configuration.
For example:
<workspace name="ws">
...
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
<properties>
<property name="time-out" value="15m" />
...
</properties>
</lock-manager>
...
</workspace>
CacheableLockManagerImpl stores Lock objects in JBoss-cache, so Locks are replicated and take effect across the whole cluster, not only on a single node. Also, JBoss-cache has JDBCCacheLoader, so Locks are stored in the database.
Both implementations support the removal of expired Locks. LockRemover is a separate thread that periodically asks LockManager to remove Locks that have lived too long. The timeout for LockRemover may be set as follows (the default value is 30m).
<properties>
<property name="time-out" value="10m" />
...
</properties>
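The removal of expired Locks can be pictured with the following plain-Java sketch. It is not the eXo LockManager or LockRemover: the class and method names are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and a real remover would call the sweep periodically from its own thread.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy lock store with a sweep that drops locks older than the time-out.
final class LockSweepSketch {

    private final Map<String, Long> lockedAtMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    LockSweepSketch(Duration timeout) {
        this.timeoutMillis = timeout.toMillis();
    }

    void lock(String nodeId, long nowMillis) {
        lockedAtMillis.put(nodeId, nowMillis);
    }

    boolean isLocked(String nodeId) {
        return lockedAtMillis.containsKey(nodeId);
    }

    // Removes every lock that has lived for at least the time-out; returns the count.
    int removeExpired(long nowMillis) {
        int before = lockedAtMillis.size();
        lockedAtMillis.values().removeIf(lockedAt -> nowMillis - lockedAt >= timeoutMillis);
        return before - lockedAtMillis.size();
    }

    public static void main(String[] args) {
        LockSweepSketch locks = new LockSweepSketch(Duration.ofMinutes(10));
        locks.lock("node-a", 0L);
        locks.lock("node-b", Duration.ofMinutes(5).toMillis());
        // Sweep at minute 10: node-a has expired, node-b has not.
        System.out.println(locks.removeExpired(Duration.ofMinutes(10).toMillis()));
        System.out.println(locks.isLocked("node-b"));
    }
}
```

A shorter time-out makes stale Locks disappear sooner, at the cost of also expiring Locks that clients still intend to hold.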
Replication requirements are the same as for Cache.
You can see a full JCR configuration example here.
clusterName ("jbosscache-cluster-name") must be unique.
cache.jdbc.table.name must be unique for each datasource.
cache.jdbc.fqn.type and cache.jdbc.node.type must be configured based on your database.
There are a few ways to configure CacheableLockManagerImpl, and all of them configure JBoss-cache and JDBCCacheLoader.
See http://community.jboss.org/wiki/JBossCacheJDBCCacheLoader for more information.
Simple JbossCache configuration:
The first way is to put the path of the JbossCache configuration file into the CacheableLockManagerImpl configuration.
This approach does not scale well: the repository may contain many workspaces, each workspace must contain a LockManager configuration, and each LockManager configuration may reference its own JbossCache configuration file, so the total configuration grows quickly. However, it is useful if you want a single LockManager with a special configuration.
The configuration is as follows:
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
<properties>
<property name="time-out" value="15m" />
<property name="jbosscache-configuration" value="${gatein.conf.dir}/jcr/jbosscache/${gatein.jcr.config.type}/lock-config.xml" />
</properties>
</lock-manager>
test-jbosscache-lock-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.2">
<locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" />
<clustering mode="replication" clusterName="JBoss-Cache-Lock-Cluster_Name">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
<sync />
</clustering>
<loaders passivation="false" shared="true">
<preload>
<node fqn="/" />
</preload>
<loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">
<properties>
cache.jdbc.table.name=jcrlocks_ws
cache.jdbc.table.create=true
cache.jdbc.table.drop=false
cache.jdbc.table.primarykey=jcrlocks_ws_pk
cache.jdbc.fqn.column=fqn
cache.jdbc.fqn.type=VARCHAR(512)
cache.jdbc.node.column=node
cache.jdbc.node.type=<BLOB>
cache.jdbc.parent.column=parent
cache.jdbc.datasource=jdbcjcr
</properties>
</loader>
</loaders>
</jbosscache>
Configuration requirements:
<clustering mode="replication" clusterName="JBoss-Cache-Lock-Cluster_Name">: The cluster name must be unique.
cache.jdbc.table.name: must be unique for each datasource.
cache.jdbc.node.type and cache.jdbc.fqn.type: must be configured based on your database.
To prevent any consistency issue regarding the lock data, ensure that your cache loader is org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader and that your database engine is transactional.
Template JBossCache configuration
The second way is using the template JBoss-cache configuration for all LockManagers.
The lock template configuration:
test-jbosscache-lock.xml
<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
<locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" />
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
<sync />
</clustering>
<loaders passivation="false" shared="true">
<!-- All the data of the JCR locks needs to be loaded at startup -->
<preload>
<node fqn="/" />
</preload>
<!--
For another cache-loader class you should use another template with
cache-loader specific parameters
-->
<loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false"
ignoreModifications="false" purgeOnStartup="false">
<properties>
cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name}
cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create}
cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop}
cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey}
cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column}
cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type}
cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column}
cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type}
cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column}
cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource}
</properties>
</loader>
</loaders>
</jbosscache>
As you can see, all configurable parameters are template placeholders that are replaced with the LockManager configuration parameters:
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
<properties>
<property name="time-out" value="15m" />
<property name="jbosscache-configuration" value="test-jbosscache-lock.xml" />
<property name="jgroups-configuration" value="udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="false" />
<property name="jbosscache-cluster-name" value="JCR-cluster-locks" />
<property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks" />
<property name="jbosscache-cl-cache.jdbc.table.create" value="true" />
<property name="jbosscache-cl-cache.jdbc.table.drop" value="false" />
<property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk" />
<property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" />
<property name="jbosscache-cl-cache.jdbc.fqn.type" value="AUTO"/>
<property name="jbosscache-cl-cache.jdbc.node.column" value="node" />
<property name="jbosscache-cl-cache.jdbc.node.type" value="AUTO"/>
<property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" />
<property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr" />
<property name="jbosscache-shareable" value="true" />
</properties>
</lock-manager>
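The way a template placeholder such as ${jbosscache-cl-cache.jdbc.table.name} is resolved from the lock-manager properties can be sketched in plain Java. This is not the eXo template engine: the class and method names are hypothetical, and leaving unknown placeholders untouched is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Replaces ${name} placeholders in a template line with configured values.
final class TemplateFillSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String fill(String template, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = properties.get(matcher.group(1));
            // Assumption: an unknown placeholder is kept as it is.
            String replacement = value != null ? value : matcher.group(0);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = Map.of(
            "jbosscache-cl-cache.jdbc.table.name", "jcrlocks",
            "jbosscache-cl-cache.jdbc.datasource", "jdbcjcr");
        System.out.println(fill("cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name}", properties));
        System.out.println(fill("cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource}", properties));
    }
}
```

Because every workspace supplies its own property values, one template file can serve all LockManagers of the repository.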
Configuration requirements:
jbosscache-cl-cache.jdbc.fqn.type and jbosscache-cl-cache.jdbc.node.type are the same as cache.jdbc.fqn.type and cache.jdbc.node.type in the JBoss-Cache configuration. You can set these data types according to your database type, or set them to AUTO (or not set them at all) so that the data type is detected automatically.
As you can see, the JGroups configuration is moved to a separate configuration file, udp-mux.xml. In this case, the udp-mux.xml file is a common JGroups configuration for all components (QueryHandler, Cache, LockManager), but you can still create your own configuration.
The udp-mux.xml file:
<config>
<UDP
singleton_name="JCR-cluster"
mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}"
mcast_port="${jgroups.udp.mcast_port:45588}"
tos="8"
ucast_recv_buf_size="20000000"
ucast_send_buf_size="640000"
mcast_recv_buf_size="25000000"
mcast_send_buf_size="640000"
loopback="false"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
use_incoming_packet_handler="true"
ip_ttl="${jgroups.udp.ip_ttl:2}"
enable_bundling="false"
enable_diagnostics="true"
thread_naming_pattern="cl"
use_concurrent_stack="true"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="8"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="1000"
thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="8"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="Run" />
<PING timeout="2000"
num_initial_members="3"/>
<MERGE2 max_interval="30000"
min_interval="10000"/>
<FD_SOCK />
<FD timeout="10000" max_tries="5" shun="true" />
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK use_stats_for_retransmission="false"
exponential_backoff="150"
use_mcast_xmit="true" gc_lag="0"
retransmit_timeout="50,300,600,1200"
discard_delivered_msgs="true"/>
<UNICAST timeout="300,600,1200" />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="1000000"/>
<VIEW_SYNC avg_send_interval="60000" />
<pbcast.GMS print_local_addr="true" join_timeout="3000"
shun="false"
view_bundling="true"/>
<FC max_credits="500000"
min_threshold="0.20"/>
<FRAG2 frag_size="60000" />
<!--pbcast.STREAMING_STATE_TRANSFER /-->
<pbcast.STATE_TRANSFER />
<pbcast.FLUSH />
</config>
Before going deeper into the QueryHandler configuration, you need to learn about the concept of Indexing in clustered environment.
Indexing in clustered environment
JCR offers multiple indexing strategies for both standalone and clustered environments, either using the advantages of running in a single JVM or making the best use of all resources available in a cluster. JCR uses the Lucene library as its underlying search and indexing engine, but Lucene has several limitations that greatly reduce the possibilities and limit the use of cluster advantages. That is why JCR offers three strategies, each suited to its own use cases: standalone, clustered with shared index, and clustered with local indexes. Each one has its pros and cons.
Standalone strategy provides a stack of indexes to achieve greater performance within single JVM.

It combines an in-memory buffer index directory with delayed file-system flushing. This index is called "Volatile", and it is also used in searches. Under certain conditions, the volatile index is flushed to persistent storage (the file system) as a new index directory. This gives very good performance for write operations.
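The interplay between the volatile buffer and the persistent index directories can be pictured with the plain-Java sketch below. It is not the JCR implementation: the class and method names are hypothetical, documents are plain strings, and flushing once the oldest buffered entry reaches a maximum age is only one assumed flush condition.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: an in-memory buffer that is flushed as a new persistent segment.
final class VolatileIndexSketch {

    private final long maxVolatileMillis;
    private final List<String> volatileDocs = new ArrayList<>();
    private final List<List<String>> persistedSegments = new ArrayList<>();
    private long oldestVolatileMillis;

    VolatileIndexSketch(long maxVolatileMillis) {
        this.maxVolatileMillis = maxVolatileMillis;
    }

    void add(String doc, long nowMillis) {
        if (volatileDocs.isEmpty()) {
            oldestVolatileMillis = nowMillis; // remember the age of the buffer
        }
        volatileDocs.add(doc);
        flushIfDue(nowMillis);
    }

    // Assumed flush condition: the buffer has reached its maximum age.
    void flushIfDue(long nowMillis) {
        if (!volatileDocs.isEmpty() && nowMillis - oldestVolatileMillis >= maxVolatileMillis) {
            persistedSegments.add(new ArrayList<>(volatileDocs));
            volatileDocs.clear();
        }
    }

    // Searches look into the volatile buffer as well as the persisted segments.
    boolean contains(String doc) {
        if (volatileDocs.contains(doc)) {
            return true;
        }
        for (List<String> segment : persistedSegments) {
            if (segment.contains(doc)) {
                return true;
            }
        }
        return false;
    }

    int volatileSize() {
        return volatileDocs.size();
    }

    int persistedSegmentCount() {
        return persistedSegments.size();
    }

    public static void main(String[] args) {
        VolatileIndexSketch index = new VolatileIndexSketch(60_000L);
        index.add("doc-a", 0L);
        index.add("doc-b", 30_000L);
        index.flushIfDue(60_000L);
        System.out.println(index.persistedSegmentCount() + " segment(s), doc-a found: " + index.contains("doc-a"));
    }
}
```

Writes only touch memory until the flush, which is why newly added content is searchable immediately while the file system is written to in larger batches.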
Clustered implementation with local indexes is built upon same strategy with volatile in-memory index buffer along with delayed flushing on persistent storage.

As this implementation is designed for a clustered environment, it has additional mechanisms for data delivery within the cluster. The actual text extraction jobs are done on the same node that performs the content operations (for example, a write operation). Prepared "documents" (a Lucene term for a block of data ready for indexing) are replicated to the cluster nodes and processed by their local indexes, so each cluster instance has the same index content. When a new node joins the cluster, it has no initial index, so one must be created. There are several supported ways of doing this. The simplest is to copy the index manually, but this is not the intended approach. If no initial index is found, JCR uses automated scenarios. They are controlled via configuration (see the index-recovery-mode parameter) and offer either full re-indexing from the database or copying the index from another cluster node.
In some cases, having multiple index copies, one on each instance, can be costly. A shared index can be used instead (see diagram below).

This indexing strategy combines the advantages of an in-memory index with a shared persistent index, offering "near" real-time search capabilities. This means that newly added content is accessible via search immediately. The strategy allows nodes to index data in their own volatile (in-memory) indexes, but the persistent indexes are managed by a single "coordinator" node only. Each cluster instance has read access to the shared index and performs queries by combining its results with those found in its own in-memory index. Take into account that the shared folder must be configured in your system environment (for example, a mounted NFS folder). In some extremely rare cases, the volatile indexes of the cluster instances may differ slightly for a while; within a few seconds they will be up to date.
See more about Search Configuration.
See the following sample configuration:
<workspace name="ws">
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="shareddir/index/db1/ws" />
<property name="changesfilter-class"
value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
<property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
<property name="jgroups-configuration" value="udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="true" />
<property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
<property name="max-volatile-time" value="60" />
<property name="rdbms-reindexing" value="true" />
<property name="reindexing-page-size" value="1000" />
<property name="index-recovery-mode" value="from-coordinator" />
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
<property name="indexing-thread-pool-size" value="16" />
</properties>
</query-handler>
</workspace>
| Property name | Description |
|---|---|
index-dir | Path to index. |
changesfilter-class | The fully qualified name of the class that defines the policy for managing Lucene index changes. This class must extend org.exoplatform.services.jcr.impl.core.query.IndexerChangesFilter. It must be set in a cluster environment to define the clustering strategy to adopt. To use the Shared Indexes Strategy, set it to org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter. To use the Local Indexes Strategy, set it to org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter. |
jbosscache-configuration | Template of JBoss-cache configuration for all query-handlers in repository. |
jgroups-configuration | This is the path to JGroups configuration that should not be anymore jgroups' stack definitions but a normal jgroups configuration format with the shared transport configured by setting the jgroups property singleton_name to a unique name (it must remain to be unique from one portal container to another). This file is also pre-bundled with templates and is recommended for use. |
jgroups-multiplexer-stack | If the parameter value is set to "true", it will indicate that the file corresponding to the
jgroups-configuration
parameter is actually a file defining a set of jgroups multiplexer stacks.
In the XML tag jgroupsConfig within the jboss cache configuration, you will then be able to set the name
of the multiplexer stack to use thanks to multiplexerStack the attribute.
Please note that the jgroups multiplexer has been deprecated by the jgroups Team and has been replaced
with the shared transport so it is highly recommended you not use it anymore.
|
jbosscache-cluster-name | Cluster name which must be unique. |
max-volatile-time | Max time to live for Volatile Index. |
rdbms-reindexing | Indicate that it is needed to use RDBMS re-indexing mechanism if possible. The default value is "true". |
reindexing-page-size | The maximum amount of nodes which can be retrieved from storage for re-indexing purpose. The default value is "100". |
index-recovery-mode | If the parameter has been set to from-indexing, a full indexing will be automatically launched. If the parameter has been set to from-coordinator (default behavior), the index will be retrieved from coordinator. |
index-recovery-filter | Define implementation class or classes of RecoveryFilters, the mechanism of index synchronization for Local Index strategy. |
async-reindexing | Control the process of re-indexing on JCR's startup. If a flag is set, indexing will be launched asynchronously without blocking the JCR. Its default value is "false". |
indexing-thread-pool-size | Define the total amount of indexing threads. |
If you use PostgreSQL and the rdbms-reindexing parameter is set to "true", the performance of the queries used while indexing can be improved by setting enable_seqscan to off or default_statistics_target to at least 50 in your database configuration. Then restart the database server and run ANALYZE on the JCR_SVALUE (or JCR_MVALUE) table.
If you use DB2 and the rdbms-reindexing parameter is set to "true", the performance of the queries used while indexing can be improved by collecting statistics on the tables: run "RUNSTATS ON TABLE <scheme>.<table> WITH DISTRIBUTION AND INDEXES ALL" for the JCR_SITEM (or JCR_MITEM) and JCR_SVALUE (or JCR_MVALUE) tables.
For both cluster-ready implementations, the JBoss Cache, JGroups and changes filter values must be defined. The shared index requires a remote or shared file system to be attached to the system (for example, NFS or SMB).
The indexing directory (the "index-dir" value) must point to it. Setting "changesfilter-class" to "org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" enables the shared index implementation.
<workspace name="ws">
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
<property name="changesfilter-class"
value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
<property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
<property name="jgroups-configuration" value="udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="false" />
<property name="jbosscache-cluster-name" value="JCR-cluster-indexer" />
<property name="max-volatile-time" value="60" />
<property name="rdbms-reindexing" value="true" />
<property name="reindexing-page-size" value="1000" />
<property name="index-recovery-mode" value="from-coordinator" />
<property name="jbosscache-shareable" value="true" />
</properties>
</query-handler>
</workspace>
To use the cluster-ready strategy based on local indexes, where each node has its own copy of the index on its local file system, apply the following configuration. The indexing directory must point to a folder on the local file system and "changesfilter-class" must be set to "org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter".
<workspace name="ws">
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
<property name="changesfilter-class"
value="org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter" />
<property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
<property name="jgroups-configuration" value="udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="false" />
<property name="jbosscache-cluster-name" value="JCR-cluster-indexer" />
<property name="max-volatile-time" value="60" />
<property name="rdbms-reindexing" value="true" />
<property name="reindexing-page-size" value="1000" />
<property name="index-recovery-mode" value="from-coordinator" />
<property name="jbosscache-shareable" value="true" />
</properties>
</query-handler>
</workspace>
A common use case for all cluster-ready applications is the hot joining and leaving of processing units. All nodes that join the cluster for the first time or after some downtime must be brought into a synchronized state.
When dealing with shared value storages, databases and indexes, cluster nodes are synchronized at all times. However, synchronization is an issue when the local index strategy is used. If a new node joins the cluster with no index, the index is retrieved or recreated. A node can also be restarted, in which case its index is not empty. An existing index is usually assumed to be current, but it can be outdated.
JCR offers a mechanism called RecoveryFilters that automatically retrieves the index for the joining node on startup. This feature is a set of filters that can be defined via the QueryHandler configuration:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
The number of filters is not limited, so they can be combined:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter" />
If any one of them fires, the index is re-synchronized. Note that DocNumberRecoveryFilter is used when no filter is configured. So, if resynchronization must be blocked or strictly required on start, ConfigurationPropertyRecoveryFilter can be used.
This feature uses the standard index recovery mode defined by the previously described parameter, which can be "from-indexing" or "from-coordinator" (default value).
<property name="index-recovery-mode" value="from-coordinator" />
There are several filter implementations:
org.exoplatform.services.jcr.impl.core.query.lucene.DummyRecoveryFilter: Always returns true, for cases when the index must be forcibly resynchronized (recovered) each time.
org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter: Returns the value of the system property "org.exoplatform.jcr.recoveryfilter.forcereindexing", so index recovery can be controlled from the top using system properties, without changing the configuration.
org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter: Returns the value of the QueryHandler configuration property "index-recovery-filter-forcereindexing", so index recovery can be controlled from the configuration separately for each workspace.
For example:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter" />
<property name="index-recovery-filter-forcereindexing" value="true" />
org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter: Checks the number of documents in the index on the coordinator side and on the local side, and returns true if they differ. The advantage of this filter over the others is that it skips re-indexing for workspaces whose index was not modified. For example, suppose there are 10 repositories with 3 workspaces each, and only one workspace is heavily used in the cluster: frontend/production. With this filter, only the workspaces that have really changed are re-indexed, without affecting the other indexes, which greatly reduces the startup time.
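The "any filter fires" rule described above can be sketched in plain Java. This is only an illustrative model: the class `RecoveryFilterSketch` and its helper methods are hypothetical names introduced here, not the eXo JCR API, and the document counts are passed in by hand rather than read from a real index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Illustrative model only; these helpers are NOT the eXo JCR classes. */
final class RecoveryFilterSketch {

    /** Models DocNumberRecoveryFilter: fires when the document counts differ. */
    static BooleanSupplier docNumberFilter(int coordinatorDocs, int localDocs) {
        return () -> coordinatorDocs != localDocs;
    }

    /** Models SystemPropertyRecoveryFilter: fires when the system property is "true". */
    static BooleanSupplier systemPropertyFilter(String propertyName) {
        return () -> Boolean.getBoolean(propertyName);
    }

    /** The index is re-synchronized as soon as any configured filter fires. */
    static boolean needsRecovery(List<BooleanSupplier> filters) {
        for (BooleanSupplier filter : filters) {
            if (filter.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Counts are made-up sample values for the demonstration.
        List<BooleanSupplier> filters = Arrays.asList(
            docNumberFilter(143018, 143000),
            systemPropertyFilter("org.exoplatform.jcr.recoveryfilter.forcereindexing"));
        System.out.println("recovery needed: " + needsRecovery(filters));
    }
}
```

With an empty filter list this sketch returns false; the real implementation, as noted above, falls back to DocNumberRecoveryFilter when nothing is configured.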
Managing a large data set with JCR in a production environment sometimes requires special operations on the indexes stored on the file system. One of these maintenance operations is recreating the index, or "re-indexing". Re-indexing matters in various use cases, including hardware faults, hard restarts, data corruption, migrations and JCR updates that bring new index-related features. Index re-creation is usually requested on server startup or at runtime.
First of all, you cannot launch hot re-indexing via JMX if the index is already in offline mode. Offline mode means that the index is currently involved in some operation, such as re-indexing at startup or being copied to another node in the cluster. Note also that Hot Asynchronous Reindexing via JMX and "on startup" re-indexing are completely different features, so you cannot get the state of startup re-indexing using the getHotReindexingState command in the JMX interface. There are, however, some common JMX operations:
getIOMode: returns the current index IO mode (READ_ONLY / READ_WRITE), which belongs to the clustered configuration states.
getState: returns the current state (ONLINE / OFFLINE).
A common way to update or re-create the index is to stop the server and manually remove the indexes of the workspaces that require it. When the server is started, missing indexes are automatically recovered by re-indexing.
JCR supports direct RDBMS re-indexing, which is usually faster than the ordinary mechanism and can be enabled by setting the rdbms-reindexing QueryHandler parameter to "true" (refer to the Query-handler configuration overview for more information).
Another new feature is asynchronous indexing on startup. Usually, startup is blocked until the indexing process is finished. This block can take any length of time, depending on the amount of data persisted in the repositories. This can be resolved by using asynchronous startup indexing, which performs all index operations in the background without blocking the repository. It is controlled by the value of the "async-reindexing" parameter in the QueryHandler configuration. With asynchronous indexing active, JCR starts with no active indexes present. Queries on JCR can still be executed without exceptions, but no results are returned until the index creation has been completed. The index state can be checked via QueryManagerImpl:
boolean online = ((QueryManagerImpl)Workspace.getQueryManager()).getQueryHandeler().isOnline();
The "OFFLINE" state means that the index is currently being re-created. When the state changes, a corresponding log event is printed. When the background task starts, the index is switched to "OFFLINE" with the following log event:
[INFO] Setting index OFFLINE (repository/production[system]).
When the process has finished, two events are logged:
[INFO] Created initial index for 143018 nodes (repository/production[system]).
[INFO] Setting index ONLINE (repository/production[system]).
These two log lines indicate the end of the process for the workspace given in brackets. Calling isOnline() as mentioned above will also return true.
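Because queries return no results while the index is OFFLINE, client code may want to wait until the index is back online before searching. The following is a minimal sketch of such a polling loop, not part of JCR: `IndexOnlineWaiter` and `awaitOnline` are hypothetical names, and the `BooleanSupplier` stands in for the isOnline() check shown above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; the supplier stands in for the isOnline() check. */
final class IndexOnlineWaiter {

    /**
     * Polls the supplier until it reports true or the timeout elapses.
     * Returns true if the index came online, false on timeout or interruption.
     */
    static boolean awaitOnline(BooleanSupplier isOnline, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isOnline.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated index that comes online roughly 50 ms after start.
        boolean online = awaitOnline(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println("index online: " + online);
    }
}
```

In a real deployment, the supplier would wrap the QueryManagerImpl call, and the timeout would be sized to the amount of data being indexed.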
Hot asynchronous workspace reindexing via JMX
Hard system faults, errors during upgrades, migration issues and other factors may corrupt the index. End customers would most likely want production systems to fix index issues at runtime, without delays and restarts. The current version of JCR supports the "Hot Asynchronous Workspace Reindexing" feature. It allows the service administrator to launch the process in the background, without stopping or blocking the whole application, by using any JMX-compatible console (see the "JConsole in action" screenshot below).

The server can continue working as expected while the index is re-created. This depends on the "allow queries" flag, passed via the JMX interface when invoking the re-index operation. If the flag is set, the application continues working. However, there is one critical limitation that end users must be aware of. The index is frozen while the background task is running: queries are performed on the index as it was at the moment the task started, and data written into the repository after that moment is not available through search until the process has finished. Data added during re-indexing is also indexed, but becomes available only when the task is done. In brief, JCR takes a snapshot of the indexes when the asynchronous task starts and uses it for searches. When the operation is finished, the stale indexes are replaced with the newly created ones, which include the newly added data. If the "allow queries" flag is set to "false", all queries throw an exception while the task is running. The current state can be acquired using the following JMX operation:
getHotReindexingState(): returns information about the latest invocation: the start time, and whether it is in progress or, if done, the finish time.
As mentioned above, JCR indexing is based on the Lucene indexing library as its underlying search engine. It uses Directories to store the index and manages access to the index through Lock Factories.
By default, the JCR implementation uses an optimal combination of Directory and Lock Factory implementations. On operating systems other than Windows, the NIOFSDirectory implementation is used; on Windows, SimpleFSDirectory is used.
NativeFSLockFactory is an optimal solution for wide variety of cases including clustered environment with NFS shared resources. However, those defaults can be overridden with the help of system properties. There are two properties that are responsible for changing default behavior:
Refer to the Lucene documentation for more information, but make sure that you know what you are changing. JCR allows end users to change the implementation classes of Lucene internals, but does not guarantee their stability and functionality.
Every node of cluster MUST have the same mounted Network File System with the read and write permissions on it.
"/mnt/tornado" - path to the mounted Network File System (all cluster nodes must use the same NFS).
Every node of cluster MUST use the same database.
The same clusters on different nodes MUST have the same names (for example, if Indexer cluster in workspace production on the first node has the name "production_indexer_cluster", then indexer clusters in workspace production on all other nodes MUST have the same name "production_indexer_cluster" ).
JBossTS Transaction Service and JBossCache Transaction Manager are used. This can be checked via exo-configuration.xml as below:
<component>
<key>org.jboss.cache.transaction.TransactionManagerLookup</key>
<type>org.jboss.cache.GenericTransactionManagerLookup</type>
</component>
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.jbosscache.JBossTransactionsService</type>
<init-params>
<value-param>
<name>timeout</name>
<value>300</value>
</value-param>
</init-params>
</component>
Configuration of every workspace in repository must contain the following parts:
<value-storages>
<value-storage id="system" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
<properties>
<property name="path" value="/mnt/tornado/temp/values/production" />
</properties>
<filters>
<filter property-type="Binary" />
</filters>
</value-storage>
</value-storages>
<cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache">
<properties>
<property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-data.xml" />
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />
<property name="jbosscache-cluster-name" value="JCR_Cluster_cache" />
<property name="jgroups-multiplexer-stack" value="false" />
<property name="jbosscache-shareable" value="true" />
</properties>
</cache>
In the <query-handler> block, replace or add the changesfilter-class parameter with the following value:
<property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter"/>
Then, add the JBossCache-oriented configuration. The configuration should look like:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
<properties>
<property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
<property name="index-dir" value="/mnt/tornado/temp/jcrlucenedb/production" />
<property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-indexer.xml" />
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />
<property name="jbosscache-cluster-name" value="JCR_Cluster_indexer" />
<property name="jgroups-multiplexer-stack" value="false" />
<property name="jbosscache-shareable" value="true" />
<property name="max-volatile-time" value="60" />
</properties>
</query-handler>
These properties have the same meaning and restrictions as in the previous code block, except for the last one, max-volatile-time.
This may be the hardest element to configure, because you have to define access to the database where locks will be stored. Replace the existing lock-manager with the configuration shown below:
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
<properties>
<property name="time-out" value="15m" />
<property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-lock.xml" />
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />
<property name="jbosscache-cluster-name" value="JCR_Cluster_locks" />
<property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks"/>
<property name="jbosscache-cl-cache.jdbc.table.create" value="true"/>
<property name="jbosscache-cl-cache.jdbc.table.drop" value="false"/>
<property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk"/>
<property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn"/>
<property name="jbosscache-cl-cache.jdbc.node.column" value="node"/>
<property name="jbosscache-cl-cache.jdbc.parent.column" value="parent"/>
<property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr"/>
<property name="jgroups-multiplexer-stack" value="false" />
<property name="jbosscache-shareable" value="true" />
</properties>
</lock-manager>
A few properties are the same as in the previous components, but here you can also see some unfamiliar "jbosscache-cl-cache.jdbc.*" properties. They define the access parameters for the database where the locks are persisted.
This section shows how to use and configure JBoss Cache in a clustered environment. It also explains how to use the template-based configuration offered by JCR for JBoss Cache instances.
For indexer, lock manager and data container
Each of these components uses instances of JBoss Cache for caching in a clustered environment. Every element therefore has its own transport and has to be configured properly. Workspaces usually have similar configurations, but with different cluster names and possibly some other parameters. The simplest way to configure them is to define a separate configuration file for each component in each workspace:
<property name="jbosscache-configuration" value="${gatein.jcr.index.cache.config}"/>
However, if there are several workspaces, configuring them in this way can be painful and hard to manage. JCR therefore offers a template-based configuration for JBoss Cache instances. You can have one template for the Lock Manager, one for the Indexer and one for the data container and use them in all the workspaces, defining the map of substitution parameters in the main configuration file. Simply define ${jbosscache-<parameter name>} inside the XML template and list the correct values in the JCR configuration file just below "jbosscache-configuration", as shown:
Template:
...
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
...
JCR configuration file:
...
<property name="jbosscache-configuration" value="${gatein.jcr.lock.cache.config}" />
<property name="jbosscache-cluster-name" value="${gatein.jcr.jgroups.config}" />
...
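The substitution step can be pictured with a small Java sketch. It is an assumption-laden illustration of the general technique, not the JCR implementation: `TemplateSubstitutionSketch` and `resolve` are hypothetical names, and the choice to leave unknown placeholders untouched is a design decision of this sketch only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ${jbosscache-<parameter name>} substitution. */
final class TemplateSubstitutionSketch {

    // Matches placeholders such as ${jbosscache-cluster-name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(jbosscache-[^}]+)\\}");

    /** Replaces each known placeholder with the value configured for the component. */
    static String resolve(String template, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = properties.get(matcher.group(1));
            // Sketch-only choice: unknown placeholders are left as they are.
            String replacement = value != null ? value : matcher.group(0);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("jbosscache-cluster-name", "JCR_Cluster_locks");
        String template = "<clustering mode=\"replication\" clusterName=\"${jbosscache-cluster-name}\">";
        System.out.println(resolve(template, properties));
    }
}
```

The property names in the map correspond to the `<property name="jbosscache-...">` entries listed in the component configuration.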
JGroups is used by JBoss Cache for network communications and transport in a clustered environment. If property "jgroups-configuration" is defined in component configuration, it will be injected into the JBoss Cache instance on startup.
<property name="jgroups-configuration" value="${gatein.jcr.jgroups.config}" />
As mentioned above, each component (lock manager, data container and query handler) of each workspace requires its own clustered environment. In other words, each has its own cluster with a unique name. By default, each cluster performs multicasts on a separate port. This configuration causes a lot of unnecessary overhead on the cluster. That is why JGroups offers the multiplexer feature, which makes it possible to use a single channel for a set of clusters. This feature reduces network overhead and increases the performance and stability of the application.
To enable the multiplexer stack, define the appropriate configuration file (udp-mux.xml is the one pre-shipped with JCR) and set "jgroups-multiplexer-stack" to "true".
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="true" />
It is now highly recommended to use the shared transport instead of the multiplexer. To do so, disable the multiplexer stack in the configuration of each component by setting the jgroups-multiplexer-stack property to "false". Then ensure that your JGroups configuration is no longer a JGroups stack definition but a normal configuration. Finally, set the singleton_name property of your JGroups configuration to a unique name (this name must not be the same from one portal container to another).
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" />
<property name="jgroups-multiplexer-stack" value="false" />
Allow sharing JBoss Cache instances
A JBoss Cache instance is quite resource-consuming, and by default there are three JBoss Cache instances for each workspace (one for the indexer, one for the lock manager and one for the data container). If you intend to have a lot of workspaces, it can make sense to share one JBoss Cache instance between several cache instances of the same type (for example: indexer, lock manager or data container). This feature is disabled by default and can be enabled at the component configuration level (for example: indexer configuration, lock manager configuration and/or data container configuration) by setting the "jbosscache-shareable" property to "true", as below:
<property name="jbosscache-shareable" value="true" />
Once enabled, this feature allows the JBoss Cache instance used by the component to be re-used by other components of the same type (for example: indexer, lock manager or data container) with the same JBoss Cache configuration (except the eviction configuration, which can be different). This means that all the parameters of type ${jbosscache-<parameter name>} must be identical between components of the same type in different workspaces. In other words, if you use the same values for the parameters of type ${jbosscache-<parameter name>} in each workspace, only 3 JBoss Cache instances (one for the indexer, one for the lock manager and one for the data container) are used, whatever the total number of workspaces defined.
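The counting argument above can be checked with a toy model. The sketch below is hypothetical and does not reflect how JCR stores its caches internally: `ShareableCacheRegistrySketch` is an invented name, and a plain `Object` stands in for a JBoss Cache instance. It only shows that keying instances by component type plus resolved template parameters yields three instances regardless of the number of workspaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of shareable caches; a plain Object stands in for a cache instance. */
final class ShareableCacheRegistrySketch {

    private final Map<String, Object> instances = new HashMap<>();

    /** Returns the shared instance for this component type and parameter set. */
    Object getOrCreate(String componentType, Map<String, String> templateParams) {
        // A TreeMap gives a stable ordering, so equal parameters produce equal keys.
        String key = componentType + "|" + new TreeMap<>(templateParams);
        return instances.computeIfAbsent(key, k -> new Object());
    }

    int instanceCount() {
        return instances.size();
    }

    public static void main(String[] args) {
        ShareableCacheRegistrySketch registry = new ShareableCacheRegistrySketch();
        Map<String, String> params = new HashMap<>();
        params.put("jbosscache-cluster-name", "JCR_Cluster");
        String[] types = {"indexer", "lock-manager", "data-container"};
        for (int workspace = 0; workspace < 10; workspace++) {
            for (String type : types) {
                registry.getOrCreate(type, params);
            }
        }
        System.out.println("cache instances: " + registry.instanceCount());
    }
}
```

If one workspace used a different value for any ${jbosscache-<parameter name>} parameter, its key would differ and an extra instance would be created for it.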
Shipped JBoss Cache configuration templates
The JCR implementation is shipped with ready-to-use JBoss Cache configuration templates for JCR's components. They are located in the application package, in the /conf/portal/ folder.
Data container template: The data container template is in jbosscache-data.xml.
<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
<locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
lockAcquisitionTimeout="20000" />
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
<sync />
</clustering>
<!-- Eviction configuration -->
<eviction wakeUpInterval="5000">
<default algorithmClass="org.jboss.cache.eviction.ExpirationAlgorithm"
actionPolicyClass="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.ParentNodeEvictionActionPolicy"
eventQueueSize="1000000">
<property name="maxNodes" value="1000000" />
<property name="warnNoExpirationKey" value="false" />
</default>
</eviction>
</jbosscache>
Lock manager template: The lock manager template is in jbosscache-lock.xml.
<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
<locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
lockAcquisitionTimeout="20000" />
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
<sync />
</clustering>
<loaders passivation="false" shared="true">
<preload>
<node fqn="/" />
</preload>
<loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false"
ignoreModifications="false" purgeOnStartup="false">
<properties>
cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name}
cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create}
cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop}
cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey}
cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column}
cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type}
cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column}
cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type}
cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column}
cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource}
</properties>
</loader>
</loaders>
</jbosscache>
To prevent any consistency issue regarding the lock data, ensure that your cache loader is org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader and your database engine is transactional.
Query handler (indexer) template: The query handler template is in jbosscache-indexer.xml.
<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
<locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
lockAcquisitionTimeout="20000" />
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
<sync />
</clustering>
</jbosscache>
To learn about the properties, see their corresponding descriptions in Configure JCR in cluster.
To be sure that all transactions are complete and that JCR is in a consistent state after a node is stopped, follow these steps:
Connect using JMX to one of the cluster's nodes that you do not need to stop.
Use RepositorySuspendController to suspend all repositories.
Stop the node.
Use RepositorySuspendController to resume all repositories.
RepositoryCreationService is the service used to create repositories at runtime. The service can be used in a standalone or cluster environment.
RepositoryCreationService depends on the following components:
DBCreator, which is used to create a new database for each unbound datasource.
BackupManager, which is used to create a repository from a backup.
RPCService, which is used for communication between cluster nodes.
RPCService may be left unconfigured. In this case, RepositoryCreationService works as a standalone service.
The user executes reserveRepositoryName(String repositoryName): the client node calls the coordinator node to reserve repositoryName. If this name is already reserved or a repository with this name already exists, the client node receives a RepositoryCreationException. Otherwise, the client gets a token string.
The user then executes createRepository(String backupId, RepositoryEntry rEntry, String token): the coordinator node checks the token and creates the repository.
When the repository has been created, the user node broadcasts a message with the RepositoryEntry to all cluster nodes, so that each cluster node starts the new repository.
There are two ways to create a repository. You can do it in a single step by calling createRepository(String backupId, RepositoryEntry). Alternatively, you can first reserve the repository name with reserveRepositoryName(String repositoryName), then create the reserved repository with createRepository(String backupId, RepositoryEntry rEntry, String token).
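The reserve-then-create protocol can be modelled in a few lines of plain Java. This is a simplified, single-process sketch under stated assumptions, not the eXo implementation: `RepositoryReservationSketch` is a hypothetical class, repositories are represented by their names only, `IllegalStateException` stands in for RepositoryCreationException, and random UUID tokens are an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Simplified single-process model of the reserve-then-create flow. */
final class RepositoryReservationSketch {

    private final Map<String, String> tokenToName = new HashMap<>();
    private final Set<String> existing = new HashSet<>();

    /** Reserves a name and returns a token; fails if the name is reserved or in use. */
    synchronized String reserveRepositoryName(String name) {
        if (existing.contains(name) || tokenToName.containsValue(name)) {
            // Stands in for RepositoryCreationException in this sketch.
            throw new IllegalStateException("Name already reserved or in use: " + name);
        }
        String token = UUID.randomUUID().toString();
        tokenToName.put(token, name);
        return token;
    }

    /** Creates the repository only if the token matches the reserved name. */
    synchronized void createRepository(String name, String token) {
        if (!name.equals(tokenToName.get(token))) {
            throw new IllegalStateException("Token does not match a reservation for: " + name);
        }
        tokenToName.remove(token);
        existing.add(name);
    }

    synchronized boolean exists(String name) {
        return existing.contains(name);
    }

    public static void main(String[] args) {
        RepositoryReservationSketch service = new RepositoryReservationSketch();
        String token = service.reserveRepositoryName("repository2");
        // Creation can be delayed or run in another thread; the token proves the reservation.
        service.createRepository("repository2", token);
        System.out.println("created: " + service.exists("repository2"));
    }
}
```

The two-step form is useful when creation is slow or asynchronous: the name is protected from concurrent requests as soon as the token is issued.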
Each datasource in the RepositoryEntry of a new repository must be unbound. This means that such a datasource must not have a database behind it. This restriction avoids corrupting the data of existing repositories.
RPCService is an optional component, but without it RepositoryCreationService cannot communicate with other cluster nodes.
The RepositoryCreationService configuration is as follows:
<component>
<key>org.exoplatform.services.jcr.ext.repository.creation.RepositoryCreationService</key>
<type>
org.exoplatform.services.jcr.ext.repository.creation.RepositoryCreationServiceImpl
</type>
<init-params>
<value-param>
<name>factory-class-name</name>
<value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
</value-param>
</init-params>
</component>
The following code shows the methods offered by RepositoryCreationService for creating a new repository:
public interface RepositoryCreationService
{
/**
* Reserves, validates and creates repository in a simplified form.
*
* @param rEntry - repository Entry - note that datasource must not exist.
* @param backupId - backup id
* @param creationProps - storage creation properties
* @throws RepositoryConfigurationException
* if some exception occurred during repository creation or repository name is absent in reserved list
* @throws RepositoryCreationException
* if some exception occurred during repository creation or repository name is absent in reserved list
*/
void createRepository(String backupId, RepositoryEntry rEntry, StorageCreationProperties creationProps)
throws RepositoryConfigurationException, RepositoryCreationException;
/**
* Reserves, validates and creates repository in a simplified form.
*
* @param rEntry - repository Entry - note that datasource must not exist.
* @param backupId - backup id
* @throws RepositoryConfigurationException
* if some exception occurred during repository creation or repository name is absent in reserved list
* @throws RepositoryCreationException
* if some exception occurred during repository creation or repository name is absent in reserved list
*/
void createRepository(String backupId, RepositoryEntry rEntry) throws RepositoryConfigurationException,
RepositoryCreationException;
/**
* Reserve repository name to prevent repository creation with same name from other place in same time
* via this service.
*
* @param repositoryName - repositoryName
* @return repository token. Anyone obtaining a token can later create a repository of reserved name.
* @throws RepositoryCreationException if the name cannot be reserved
*/
String reserveRepositoryName(String repositoryName) throws RepositoryCreationException;
/**
* Creates repository, using token of already reserved repository name.
* Good for cases, when repository creation should be delayed or made asynchronously in dedicated thread.
*
* @param rEntry - repository entry - note, that datasource must not exist
* @param backupId - backup id
* @param rToken - token
* @param creationProps - storage creation properties
* @throws RepositoryConfigurationException
* if some exception occurred during repository creation or repository name is absent in reserved list
* @throws RepositoryCreationException
* if some exception occurred during repository creation or repository name is absent in reserved list
*/
void createRepository(String backupId, RepositoryEntry rEntry, String rToken, StorageCreationProperties creationProps)
throws RepositoryConfigurationException, RepositoryCreationException;
/**
* Creates repository, using token of already reserved repository name. Good for cases, when repository creation should be delayed or
* made asynchronously in dedicated thread.
*
* @param rEntry - repository entry - note, that datasource must not exist
* @param backupId - backup id
* @param rToken - token
* @throws RepositoryConfigurationException
* if some exception occurred during repository creation or repository name is absent in reserved list
* @throws RepositoryCreationException
* if some exception occurred during repository creation or repository name is absent in reserved list
*/
void createRepository(String backupId, RepositoryEntry rEntry, String rToken)
throws RepositoryConfigurationException, RepositoryCreationException;
/**
* Remove previously created repository.
*
* @param repositoryName - the repository name to delete
* @param forceRemove - force close all opened sessions
* @throws RepositoryCreationException
* if some exception occurred during repository removal
*/
void removeRepository(String repositoryName, boolean forceRemove) throws RepositoryCreationException;
}
TransactionService provides access to the TransactionManager and the UserTransaction (see the JTA specification for details).
getTransactionManager() | Gets the TransactionManager in use. |
getUserTransaction() | Gets the UserTransaction on the TransactionManager. |
getDefaultTimeout() | Returns the default timeout. |
setTransactionTimeout(int seconds) | Sets the timeout in seconds. |
enlistResource(XAResource xares) | Enlists an XA resource in the TransactionManager. |
delistResource(XAResource xares) | Delists an XA resource from the TransactionManager. |
JCR provides several implementations out of the box. They all
extend the abstract class
org.exoplatform.services.transaction.impl.AbstractTransactionService,
which implements most of the methods defined by
the TransactionService. For each subclass of
AbstractTransactionService, you can set the
transaction timeout in the configuration using the value parameter
timeout, which is expressed in seconds.
To use JOTM as TransactionManager in standalone mode, simply add the following component configuration:
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.impl.jotm.TransactionServiceJotmImpl</type>
<!-- Uncomment the lines below if you want to set default transaction timeout that is expressed in seconds -->
<!--init-params>
<value-param>
<name>timeout</name>
<value>60</value>
</value-param>
</init-params-->
</component>
If you intend to use JBoss Cache, you can use a generic TransactionService based on its TransactionManagerLookup, which can automatically find the TransactionManager of several application servers through a set of JNDI lookups. This generic TransactionService mainly covers the TransactionManager lookups; the UserTransaction is simply the wrapped TransactionManager instance. See the configuration example below:
<!-- Configuration of the TransactionManagerLookup -->
<component>
<key>org.jboss.cache.transaction.TransactionManagerLookup</key>
<type>org.jboss.cache.transaction.GenericTransactionManagerLookup</type>
</component>
<!-- Configuration of the TransactionService -->
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.jbosscache.GenericTransactionService</type>
<!-- Uncomment the lines below if you want to set default transaction timeout that is expressed in seconds -->
<!--init-params>
<value-param>
<name>timeout</name>
<value>60</value>
</value-param>
</init-params-->
</component>
Specific GenericTransactionService for JBoss Cache and Arjuna
If you intend to use JBoss Cache with Arjuna, you can use a more
specific GenericTransactionService. It is mainly of interest if you
want to use the real UserTransaction. See the configuration example below:
<!-- Configuration of the TransactionManagerLookup -->
<component>
<key>org.jboss.cache.transaction.TransactionManagerLookup</key>
<type>org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup</type>
</component>
<!-- Configuration of the TransactionService -->
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.jbosscache.JBossTransactionsService</type>
<!-- Uncomment the lines below if you want to set default transaction timeout that is expressed in seconds -->
<!--init-params>
<value-param>
<name>timeout</name>
<value>60</value>
</value-param>
</init-params-->
</component>
A very specific TransactionService for JBoss AS
If you intend to use JBoss AS with JBoss Cache, you can use a TransactionService that is specific to JBoss AS. See the configuration example below:
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.impl.jboss.JBossTransactionService</type>
<!-- Uncomment the lines below if you want to set default transaction timeout that is expressed in seconds -->
<!--init-params>
<value-param>
<name>timeout</name>
<value>60</value>
</value-param>
</init-params-->
</component>
TransactionsEssentials in standalone mode.
To use TransactionsEssentials, simply add the following component configuration:
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.impl.atomikos.TransactionsEssentialsTransactionService</type>
<!-- Uncomment the lines below if you want to set default transaction timeout that is expressed in seconds -->
<!--init-params>
<value-param>
<name>timeout</name>
<value>60</value>
</value-param>
</init-params-->
</component>
JBossTransactionsService implements the eXo TransactionService and provides access to the JBoss Transaction Service (JBossTS) JTA implementation via an eXo container dependency.
TransactionService is used in the JCR cache implementation org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache. See Cluster configuration for an example.
Example configuration:
<component>
<key>org.exoplatform.services.transaction.TransactionService</key>
<type>org.exoplatform.services.transaction.jbosscache.JBossTransactionsService</type>
<init-params>
<value-param>
<name>timeout</name>
<value>3000</value>
</value-param>
</init-params>
</component>
timeout: XA transaction timeout in seconds.
By default, JCR Values are stored in the Workspace Data container along with the JCR structure (for example: Nodes and Properties). JCR also offers the option of storing JCR Values separately from the Workspace Data container, which can be very helpful for keeping Binary Large Objects (BLOBs), for instance.
Value storage configuration is a part of the Repository configuration. See the Repository configuration details for more information.
Tree-based storage is recommended in most cases. Simple 'flat' storage creates and deletes values quickly, so it may be a reasonable compromise for a small storage.
For internal use and testing purposes only, JCR allows disabling the value storage by adding the property below to the configuration:
<property name="enabled" value="false"/>
Be careful: all stored values will be inaccessible.
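As a sketch of where this property might sit, the fragment below places it among the properties of a tree file value storage. The placement inside the properties element is an assumption, and the storage id and path are illustrative values only; verify against your own repository configuration before relying on it.

```xml
<!-- Hypothetical example: a value storage disabled for testing purposes only -->
<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
  <properties>
    <property name="path" value="data/values"/>
    <!-- Assumed placement: disables this storage; its stored values become inaccessible -->
    <property name="enabled" value="false"/>
  </properties>
  <filters>
    <filter property-type="Binary" min-value-size="1M"/>
  </filters>
</value-storage>
```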
Holds Values in tree-like FileSystem files. The path property points to the root directory where the files are stored.
This is the recommended type of external storage. It can contain a large number of files, limited only by the free space on the disk or volume.
A disadvantage is that Value deletion takes longer, because unused tree nodes must be removed.
<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
<properties>
<property name="path" value="data/values"/>
</properties>
<filters>
<filter property-type="Binary" min-value-size="1M"/>
</filters>
</value-storage>
id: The unique identifier of the value
storage, used for linking with properties stored in the workspace
container.
path: The location where value files will
be stored.
Each file value storage can have one or more filters
for incoming values. A filter can match values by property type
(property-type), property name
(property-name), ancestor path
(ancestor-path) and/or size of the stored values
(min-value-size, in bytes). The code sample uses a
filter with property-type and min-value-size only, which defines a storage for binary
values larger than 1 MB. It is recommended to store only properties with
large values in a file value storage.
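The other filter criteria can be used in the same way. The fragment below is a minimal sketch that assumes all criteria placed on one filter element must match together. The attribute names are those listed in the paragraph above, while the storage id, the path, the property name jcr:data and the ancestor path /documents are hypothetical values chosen for illustration.

```xml
<!-- Hypothetical example: the id, path, property name and ancestor path are illustrative only -->
<value-storage id="Storage #3" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
  <properties>
    <property name="path" value="data/documents"/>
  </properties>
  <filters>
    <!-- Intended to match Binary jcr:data properties located under /documents -->
    <filter property-type="Binary" property-name="jcr:data" ancestor-path="/documents"/>
  </filters>
</value-storage>
```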
Another example shows value storages with a separate location for
large files (a filter with a min-value-size of 20 MB). A
value storage uses OR logic when selecting a filter: the
first filter in the list is checked first and, if it does not match,
the next one is tried. Here, a value that matches the 20 MB min-value-size filter
is stored in the
"data/20Mvalues" path, and all others in "data/values".
<value-storages>
<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
<properties>
<property name="path" value="data/20Mvalues"/>
</properties>
<filters>
<filter property-type="Binary" min-value-size="20M"/>
</filters>
</value-storage>
<value-storage id="Storage #2" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
<properties>
<property name="path" value="data/values"/>
</properties>
<filters>
<filter property-type="Binary" min-value-size="1M"/>
</filters>
</value-storage>
</value-storages>
Simple file value storage is not recommended for production use because of capacity limitations on most file systems.
However, if you are sure that your file system can cope or that the amount of data is small, it may be useful because it speeds up Value removal.
Holds Values in flat FileSystem files. The path property points to the root directory where the files are stored.
<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.SimpleFileValueStorage">
<properties>
<property name="path" value="data/values"/>
</properties>
<filters>
<filter property-type="Binary" min-value-size="1M"/>
</filters>
</value-storage>
JCR supports the content-addressable storage feature for storing Values.
Content-addressable storage, also referred to as associative storage and abbreviated CAS, is a mechanism for storing information that can be retrieved based on its content, not its storage location. It is typically used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations.
Content Addressable Value storage stores unique content only once. Different properties (values) with the same content are stored as one data file shared between those values. In other words, the Value content is shared across those Values and kept in one physical file.
Storage size decreases for applications that manage potentially identical content.
For example: if you have 100 different properties containing the same data (for example: a mail attachment), the storage keeps only one single file. The file is shared by all referencing properties.
If a property Value changes, it is stored in an additional file, or it shares the file of other values that point to the same content.
The storage calculates the Value content address each time the property is changed. CAS write operations are therefore much more expensive than those of non-CAS storages.
Content address calculation is based on the java.security.MessageDigest hash computation and has been tested with the MD5 and SHA1 algorithms.
CAS storage works most efficiently on data that does not change often. For data that changes frequently, CAS is not as efficient as location-based addressing.
CAS support can be enabled for Tree and Simple File Value Storage types.
To enable CAS support, configure it in the JCR Repositories configuration in the same way as for other Value Storages.
<workspaces>
<workspace name="ws">
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
<properties>
<property name="source-name" value="jdbcjcr"/>
<property name="dialect" value="oracle"/>
<property name="multi-db" value="false"/>
<property name="update-storage" value="false"/>
<property name="max-buffer-size" value="200k"/>
<property name="swap-directory" value="target/temp/swap/ws"/>
</properties>
<value-storages>
<!-- The CAS-enabled value storage is configured here -->
<value-storage id="ws" class="org.exoplatform.services.jcr.impl.storage.value.fs.CASableTreeFileValueStorage">
<properties>
<property name="path" value="target/temp/values/ws"/>
<property name="digest-algo" value="MD5"/>
<property name="vcas-type" value="org.exoplatform.services.jcr.impl.storage.value.cas.JDBCValueContentAddressStorageImpl"/>
<property name="jdbc-source-name" value="jdbcjcr"/>
<property name="jdbc-dialect" value="oracle"/>
</properties>
<filters>
<filter property-type="Binary"/>
</filters>
</value-storage>
</value-storages>
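Since CAS support is also available for the Simple File Value Storage type, a corresponding storage entry might look like the sketch below. It assumes that the class is named CASableSimpleFileValueStorage by analogy with the tree variant and that it accepts the same properties. The id, the path and the choice of SHA1 (written as in the text above) are illustrative.

```xml
<!-- Sketch only: the class name and accepted properties are assumed by analogy with the tree variant -->
<value-storage id="ws-simple" class="org.exoplatform.services.jcr.impl.storage.value.fs.CASableSimpleFileValueStorage">
  <properties>
    <property name="path" value="target/temp/values/ws-simple"/>
    <!-- Digest algorithm used to compute the content address -->
    <property name="digest-algo" value="SHA1"/>
    <property name="vcas-type" value="org.exoplatform.services.jcr.impl.storage.value.cas.JDBCValueContentAddressStorageImpl"/>
    <property name="jdbc-source-name" value="jdbcjcr"/>
    <property name="jdbc-dialect" value="oracle"/>
  </properties>
  <filters>
    <filter property-type="Binary"/>
  </filters>
</value-storage>
```

As with the tree variant, this entry would sit inside the value-storages element of the workspace container.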