| 12345678910111213141516171819202122 |
- Apache Solr Language Identifier
- Introduction
- ------------
- This module is intended to be used while indexing documents.
- It is implemented as an UpdateProcessor to be placed in an UpdateChain.
- Its purpose is to identify language from documents and tag the document with language code.
- The module can optionally map field names to their language specific counterpart,
- e.g. if the input is "title" and language is detected as "en", map to "title_en".
- Language may be detected globally for the document, and/or individually per field.
- Language detector implementations are pluggable.
- Getting Started
- ---------------
- Please refer to the module documentation at http://wiki.apache.org/solr/LanguageDetection
- Dependencies
- ------------
- The Tika detector depends on Tika Core (which is part of extraction contrib)
- The Langdetect detector depends on LangDetect library
- The OpenNLP detector depends on OpenNLP tools and requires a previously trained user-supplied model
|