In this example, we create a new Analyzer, register it in the QueryHandler configuration, and run a query to check that it works.
The standard analyzer does not normalize accented characters such as é, è, and à, so a word like 'tréma' is stored in the index as 'tréma'. Suppose we want such characters normalized, so that 'tréma' is stored as 'trema'.
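To make the intended normalization concrete, here is a minimal sketch that folds accents using only the JDK. It is not how Lucene's accent filter is implemented; the class name AccentFoldingSketch and the method fold are hypothetical, and the approach (Unicode decomposition followed by removal of combining marks) is an assumption chosen only to reproduce the result described above for words like 'tréma'.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, for illustration only: approximates the effect of
// lower-casing plus accent folding on a single term.
public class AccentFoldingSketch {

    public static String fold(String text) {
        // Decompose accented characters into base letter + combining mark (NFD)
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove the combining marks, then lower-case the result
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Unicode escapes avoid any dependency on the source file encoding
        System.out.println(fold("tr\u00E9ma")); // é becomes e
        System.out.println(fold("Na\u00EFve")); // ï becomes i, N becomes n
    }
}
```

With this kind of folding applied at index time and at query time, 'tréma' and 'trema' end up as the same term.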
There are two ways to set up a new Analyzer (whether a standard one or our own):
The first way: Create a descendant class of SearchIndex that uses the new Analyzer (see Search Configuration);
Here the only option is to create a new Analyzer (unless a suitable one already exists) and set it in the search index.
The second way: Register the new Analyzer in the QueryHandler configuration (supported since version 1.12);
We will use the second way:
Create the new MyAnalyzer:
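package org.exoplatform.services.jcr.impl.core;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer
{
   @Override
   public TokenStream tokenStream(String fieldName, Reader reader)
   {
      StandardTokenizer tokenStream = new StandardTokenizer(reader);
      // process all text with the standard filter:
      // removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms
      TokenStream result = new StandardFilter(tokenStream);
      // this filter normalizes token text to lower case
      result = new LowerCaseFilter(result);
      // this one replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents
      result = new ISOLatin1AccentFilter(result);
      // finally, return the token stream
      return result;
   }
}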
Then register MyAnalyzer in the configuration:
<workspace name="ws">
   ...
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
         ...
      </properties>
   </query-handler>
   ...
</workspace>
After that, check it with a query:
Find nodes with mixin type 'mix:title' whose 'jcr:title' contains both "tréma" and "naïve".
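The query described above could be written in JCR SQL or XPath roughly as follows. This is a sketch, not taken from the original text: the class AccentQuerySketch and its methods are hypothetical, and the code only builds the statement strings so that it runs without a repository. It relies on the JCR 1.0 rule that whitespace-separated terms in a CONTAINS search expression are implicitly ANDed.

```java
// Hypothetical helper, for illustration only: builds full-text query
// statements for the 'mix:title' / 'jcr:title' check described above.
public class AccentQuerySketch {

    // Doubles apostrophes so the expression is a valid string literal
    private static String quote(String searchExpression) {
        return "'" + searchExpression.replace("'", "''") + "'";
    }

    // JCR-SQL form: CONTAINS(property, 'term1 term2') requires both terms
    public static String sqlStatement(String nodeType, String property, String searchExpression) {
        return "SELECT * FROM " + nodeType
            + " WHERE CONTAINS(" + property + ", " + quote(searchExpression) + ")";
    }

    // XPath form of the same constraint
    public static String xpathStatement(String nodeType, String property, String searchExpression) {
        return "//element(*, " + nodeType + ")"
            + "[jcr:contains(@" + property + ", " + quote(searchExpression) + ")]";
    }

    public static void main(String[] args) {
        // Unicode escapes stand for "tréma naïve"
        String terms = "tr\u00E9ma na\u00EFve";
        System.out.println(sqlStatement("mix:title", "jcr:title", terms));
        System.out.println(xpathStatement("mix:title", "jcr:title", terms));
    }
}
```

To run a statement against the repository, it would be passed to QueryManager.createQuery(statement, Query.SQL) or QueryManager.createQuery(statement, Query.XPATH), with the QueryManager obtained from session.getWorkspace().getQueryManager(), and then executed with Query.execute(). If MyAnalyzer is active for both indexing and searching, a node whose 'jcr:title' holds "tréma" and "naïve" should be returned, and searching for the unaccented "trema naive" should be expected to return it as well.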