Class WhitespaceTokenizer

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
org.exoplatform.services.jcr.analyzer.WhitespaceTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable

public class WhitespaceTokenizer extends org.apache.lucene.analysis.CharTokenizer
Created by The eXo Platform SAS. Author: eXoPlatform (exo@exoplatform.com), Apr 9, 2013.
A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens.
You must specify the required Version compatibility when creating WhitespaceTokenizer:
  • As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
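The splitting rule described above can be illustrated without Lucene on the classpath. The following is a minimal stand-alone sketch, not the actual Lucene or eXo implementation: the class name WhitespaceSplitSketch and the helper split are hypothetical, and the sketch ignores attribute handling, offsets, buffering, and any token-length limits the real tokenizer may apply. It iterates over int code points, in the spirit of the int based API mentioned above, so that supplementary characters are never cut in half.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration only; this is NOT the Lucene or eXo implementation.
final class WhitespaceSplitSketch {

    // Hypothetical helper: returns the maximal runs of non-whitespace code points.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // int-based, so surrogate pairs stay intact
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                // Whitespace closes the token collected so far, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        // Flush a trailing token that was not followed by whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("  quick\tbrown \n fox  ")); // prints [quick, brown, fox]
    }
}
```

Leading, trailing, and repeated whitespace produce no empty tokens, which matches the statement that only adjacent sequences of non-whitespace characters form tokens.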
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

    org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
  • Field Summary

    Fields inherited from class org.apache.lucene.analysis.Tokenizer

    input
  • Constructor Summary

    Constructors
    Constructor
    Description
    WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, Reader in)
    Construct a new WhitespaceTokenizer.
    WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
    Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
    WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource source, Reader in)
    Construct a new WhitespaceTokenizer using a given AttributeSource.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected boolean
    isTokenChar(int c)
    Collects only characters which do not satisfy Character.isWhitespace(int).

    Methods inherited from class org.apache.lucene.analysis.CharTokenizer

    end, incrementToken, isTokenChar, normalize, normalize, reset

    Methods inherited from class org.apache.lucene.analysis.Tokenizer

    close, correctOffset

    Methods inherited from class org.apache.lucene.analysis.TokenStream

    reset

    Methods inherited from class org.apache.lucene.util.AttributeSource

    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • WhitespaceTokenizer

      public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, Reader in)
      Construct a new WhitespaceTokenizer.
      Parameters:
      matchVersion - Lucene version to match
      in - the input to split up into tokens
    • WhitespaceTokenizer

      public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource source, Reader in)
      Construct a new WhitespaceTokenizer using a given AttributeSource.
      Parameters:
      matchVersion - Lucene version to match
      source - the attribute source to use for this Tokenizer
      in - the input to split up into tokens
    • WhitespaceTokenizer

      public WhitespaceTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
      Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
      Parameters:
      matchVersion - Lucene version to match (see the Version compatibility note in the class description)
      factory - the attribute factory to use for this Tokenizer
      in - the input to split up into tokens
  • Method Details

    • isTokenChar

      protected boolean isTokenChar(int c)
      Collects only characters which do not satisfy Character.isWhitespace(int).
      Overrides:
      isTokenChar in class org.apache.lucene.analysis.CharTokenizer
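
The documented rule for isTokenChar can be tried out in isolation. The sketch below assumes nothing beyond the description above: the class TokenCharSketch is hypothetical and does not extend CharTokenizer, and the method body is simply the negation of Character.isWhitespace(int). One consequence worth noting is that Java does not treat the no-break space U+00A0 as whitespace, so under this rule it counts as a token character and does not split tokens.

```java
// Stand-alone illustration; the class is hypothetical and does not extend CharTokenizer.
final class TokenCharSketch {

    // Mirrors the documented rule: a code point is a token character
    // exactly when Character.isWhitespace(int) is false for it.
    static boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        System.out.println(isTokenChar('a'));      // true
        System.out.println(isTokenChar(' '));      // false
        System.out.println(isTokenChar('\t'));     // false
        System.out.println(isTokenChar(0x00A0));   // true: no-break space is not Java whitespace
        System.out.println(isTokenChar(0x1F600));  // true: supplementary code point
    }
}
```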