3.11.3. Old-style DocumentReaders and Tika Parsers

The following topics are covered:

3.11.3.1. How to make and register your own DocumentReader

As the configuration above shows, both old-style DocumentReaders and new Tika parsers are registered.

An attentive reader will object that MSWordDocumentReader and org.apache.tika.parser.microsoft.OfficeParser both handle the same "application/msword" MIME type. That is correct, but only one DocumentReader is returned for a given MIME type.

Old-style DocumentReaders declared in the configuration are registered in DocumentReaderService. Each MIME type supported by such a DocumentReader is therefore bound to it, and the getDocumentReader(..) method always returns that DocumentReader for the MIME type. The Tika configuration is checked for a Parser only if no DocumentReader is already registered for the requested MIME type.
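
The lookup order described above can be illustrated with a minimal sketch. It is not the eXo implementation: the class ReaderLookupSketch, its two maps, and the methods registerReader and declareTikaParser are hypothetical stand-ins, and readers and parsers are represented by plain names rather than real objects. Only the precedence rule (a registered DocumentReader first, the Tika configuration as fallback) is taken from the text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the lookup logic; not the real DocumentReaderService.
class ReaderLookupSketch {

    // Old-style readers registered from the configuration, keyed by MIME type (assumed structure).
    private final Map<String, String> registeredReaders = new HashMap<>();

    // Parsers declared in the Tika configuration, keyed by MIME type (assumed structure).
    private final Map<String, String> tikaParsers = new HashMap<>();

    void registerReader(String mimeType, String readerName) {
        registeredReaders.put(mimeType, readerName);
    }

    void declareTikaParser(String mimeType, String parserName) {
        tikaParsers.put(mimeType, parserName);
    }

    // A registered reader always wins; Tika is consulted only when none is registered.
    Optional<String> getDocumentReader(String mimeType) {
        String reader = registeredReaders.get(mimeType);
        if (reader != null) {
            return Optional.of(reader);
        }
        return Optional.ofNullable(tikaParsers.get(mimeType));
    }

    public static void main(String[] args) {
        ReaderLookupSketch service = new ReaderLookupSketch();
        service.registerReader("application/msword", "MSWordDocumentReader");
        service.declareTikaParser("application/msword", "org.apache.tika.parser.microsoft.OfficeParser");
        service.declareTikaParser("application/pdf", "org.apache.tika.parser.pdf.PDFParser");

        // Both a reader and a parser claim this type; the registered reader is returned.
        System.out.println(service.getDocumentReader("application/msword").orElse("none"));
        // No reader is registered for this type, so the Tika parser is used.
        System.out.println(service.getDocumentReader("application/pdf").orElse("none"));
        // Neither source knows this type.
        System.out.println(service.getDocumentReader("text/x-unknown").orElse("none"));
    }
}
```

In this sketch the "application/msword" request resolves to MSWordDocumentReader even though OfficeParser is also declared for it, which mirrors the behaviour described for getDocumentReader(..).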

Copyright ©2012. All rights reserved. eXo Platform SAS