JCR supports such features as Lucene Fuzzy Searches Apache Lucene - Query Parser Syntax.

To use it, you have to form a query like the one described below:

QueryManager qman = session.getWorkspace().getQueryManager();
Query q = qman.createQuery("select * from nt:base where contains(field, 'ccccc~')", Query.SQL);
QueryResult res = q.execute();

Searching with synonyms is integrated in the jcr:contains() function and uses the same syntax as synonym searches in Google. If a search term is prefixed by a tilde symbol ( ~ ), also synonyms of the search term are taken into consideration. For example:

SQL: select * from nt:resource where contains(., '~parameter')

XPath: //element(*, nt:resource)[jcr:contains(., '~parameter')

This feature is disabled by default and you need to add a configuration parameter to the query-handler element in your jcr configuration file to enable it.

<param  name="synonymprovider-config-path" value="..you path to configuration file....."/>
<param  name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider"/>
/**
 * <code>SynonymProvider</code> defines an interface for a component that
 * returns synonyms for a given term.
 */
public interface SynonymProvider {

   /**
    * Initializes the synonym provider and passes the file system resource to
    * the synonym provider configuration defined by the configuration value of
    * the <code>synonymProviderConfigPath</code> parameter. The resource may be
    * <code>null</code> if the configuration parameter is not set.
    *
    * @param fsr the file system resource to the synonym provider
    *            configuration.
    * @throws IOException if an error occurs while initializing the synonym
    *                     provider.
    */
   public void initialize(InputStream fsr) throws IOException;

   /**
    * Returns an array of terms that are considered synonyms for the given
    * <code>term</code>.
    *
    * @param term a search term.
    * @return an array of synonyms for the given <code>term</code> or an empty
    *         array if no synonyms are known.
    */
   public String[] getSynonyms(String term);
}

An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms.

By default highlighting words matched the query is disabled because this feature requires that additional information is written to the search index. To enable this feature, you need to add a configuration parameter to the query-handler element in your jcr configuration file to enable it.

<param name="support-highlighting" value="true"/>

Additionally, there is a parameter that controls the format of the excerpt created. In JCR 1.9, the default is set to org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt. The configuration parameter for this setting is:

<param name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.DefaultXMLExcerpt"/>

The lucene based query handler implementation supports a pluggable spell checker mechanism. By default, spell checking is not available and you have to configure it first. See parameter spellCheckerClass on page Search Configuration. JCR currently provides an implementation class , which uses the lucene-spellchecker to contribute . The dictionary is derived from the fulltext indexed content of the workspace and updated periodically. You can configure the refresh interval by picking one of the available inner classes of org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker:

  • OneMinuteRefreshInterval

  • FiveMinutesRefreshInterval

  • ThirtyMinutesRefreshInterval

  • OneHourRefreshInterval

  • SixHoursRefreshInterval

  • TwelveHoursRefreshInterval

  • OneDayRefreshInterval

For example, if you want a refresh interval of six hours, the class name is: org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$SixHoursRefreshInterval. If you use org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, the refresh interval will be one hour.

The spell checker dictionary is stored as a lucene index under "index-dir"/spellchecker. If it does not exist, a background thread will create it on startup. Similarly, the dictionary refresh is also done in a background thread to not block regular queries.