Class WhitespaceTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
org.exoplatform.services.jcr.analyzer.WhitespaceTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
public class WhitespaceTokenizer
extends org.apache.lucene.analysis.CharTokenizer
Created by The eXo Platform SAS
Author : eXoPlatform
exo@exoplatform.com
Apr 9, 2013
A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
Adjacent sequences of non-Whitespace characters form tokens.
You must specify the required Version compatibility when creating
WhitespaceTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
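The splitting rule described above can be illustrated without Lucene. The following is a minimal sketch using only the JDK, not the actual CharTokenizer implementation: the class name WhitespaceSplitSketch and the method split are hypothetical, and offsets, normalization and attribute handling are left out. It walks the text by int code point, in the spirit of the int based API, and collects adjacent non-whitespace characters into tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of Lucene or eXo JCR.
public class WhitespaceSplitSketch {

    /** Returns the maximal runs of non-whitespace code points found in text. */
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Read a full code point so supplementary characters stay intact.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isWhitespace(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Whitespace ends the token that was being collected.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("quick  brown\tfox\n")); // prints [quick, brown, fox]
    }
}
```

Runs of several whitespace characters produce no empty tokens, because a token is only emitted once at least one non-whitespace character has been collected.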
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Constructor Summary
Constructors
- WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, Reader in)
Construct a new WhitespaceTokenizer.
- WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
- WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource source, Reader in)
Construct a new WhitespaceTokenizer using a given AttributeSource.
Method Summary
- protected boolean isTokenChar(int c)
Collects only characters which do not satisfy Character.isWhitespace(int).
Methods inherited from class org.apache.lucene.analysis.CharTokenizer
end, incrementToken, isTokenChar, normalize, normalize, reset
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset
Methods inherited from class org.apache.lucene.analysis.TokenStream
reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Constructor Details
WhitespaceTokenizer
public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, Reader in)
Construct a new WhitespaceTokenizer.
- Parameters:
matchVersion - Lucene version to match
in - the input to split up into tokens
WhitespaceTokenizer
public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource source, Reader in)
Construct a new WhitespaceTokenizer using a given AttributeSource.
- Parameters:
matchVersion - Lucene version to match
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens
WhitespaceTokenizer
public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
- Parameters:
matchVersion - Lucene version to match; see the Version compatibility note above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens
Method Details
isTokenChar
protected boolean isTokenChar(int c)
Collects only characters which do not satisfy Character.isWhitespace(int).
- Overrides:
isTokenChar in class org.apache.lucene.analysis.CharTokenizer
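To make the predicate concrete, here is a small JDK-only sketch of the rule stated above. The class IsTokenCharSketch is a hypothetical stand-in, not the real override, which is protected and lives inside the tokenizer. Note that Character.isWhitespace(int) follows Java's own definition of whitespace, which excludes the non-breaking spaces U+00A0, U+2007 and U+202F, so under this rule those characters would count as token characters.

```java
// Hypothetical stand-in for the documented rule; not the eXo JCR source.
public class IsTokenCharSketch {

    /** A code point belongs to a token exactly when Java does not treat it as whitespace. */
    public static boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        System.out.println(isTokenChar('a'));    // true: ordinary letter
        System.out.println(isTokenChar(' '));    // false: space separates tokens
        System.out.println(isTokenChar('\t'));   // false: tab separates tokens
        System.out.println(isTokenChar(0x00A0)); // true: no-break space is not Java whitespace
    }
}
```

If input may contain non-breaking spaces that should act as separators, this rule alone will not split on them; that would need a different predicate.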