Package | Description
---|---
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.ar | Analyzer for Arabic.
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts.
org.apache.lucene.analysis.de | Analyzer for German.
org.apache.lucene.analysis.el | Analyzer for Greek.
org.apache.lucene.analysis.fa | Analyzer for Persian.
org.apache.lucene.analysis.fr | Analyzer for French.
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStream implementations.
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl | Analyzer for Dutch.
org.apache.lucene.analysis.payloads | Convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.position | Filter for assigning position increments.
org.apache.lucene.analysis.reverse | Filter to reverse token text.
org.apache.lucene.analysis.ru | Analyzer for Russian.
org.apache.lucene.analysis.shingle | Word n-gram (shingle) filters.
org.apache.lucene.analysis.sinks | Implementations of SinkTokenizer that may be useful.
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard | A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.analysis.th | Analyzer for Thai.
org.apache.lucene.collation | CollationKeyFilter and ICUCollationKeyFilter convert each token into its binary CollationKey using the provided Collator, then encode the CollationKey as a String using IndexableBinaryStringTools so that it can be stored as an index term.
org.apache.lucene.index | Code to maintain and access indices.
org.apache.lucene.index.memory | High-performance single-document in-memory Lucene full-text search index.
org.apache.lucene.queryParser.core.config | Base classes used to configure query processing.
org.apache.lucene.queryParser.standard.config | Standard Lucene query configuration.
org.apache.lucene.util | Utility classes.
org.apache.lucene.wikipedia.analysis | Tokenizer that is aware of Wikipedia syntax.
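As a rough illustration of the tokenizer-plus-filter pipeline that these analysis packages implement, here is a standalone sketch in plain Java: a whitespace tokenizer feeding a lower-case filter. The class and method names are hypothetical; this is not the Lucene API, only the shape of the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of an analysis chain: a tokenizer splits text into
// tokens, and filters transform the resulting token stream.
public class AnalysisPipelineSketch {

    // Tokenizer stage: split on whitespace, in the spirit of WhitespaceTokenizer.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Filter stage: normalize to lower case, in the spirit of LowerCaseFilter.
    static List<String> lowerCaseFilter(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase());
        return out;
    }

    // A chain: tokenizer output flows through the filter.
    static List<String> analyze(String text) {
        return lowerCaseFilter(whitespaceTokenize(text));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick  Brown Fox"));
    }
}
```

In real Lucene code the stages are TokenStream instances chained by construction rather than lists passed between methods, but the data flow is the same.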
Modifier and Type | Class and Description
---|---
class | ASCIIFoldingFilter: converts alphabetic, numeric, and symbolic Unicode characters that are not among the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where they exist.
class | CachingTokenFilter: can be used when the token attributes of a TokenStream are to be consumed more than once.
class | CharTokenizer: an abstract base class for simple, character-oriented tokenizers.
class | ISOLatin1AccentFilter: Deprecated. Use ASCIIFoldingFilter, which covers a superset of Latin-1. This class will be removed in Lucene 3.0.
class | KeywordTokenizer: emits the entire input as a single token.
class | LengthFilter: removes words that are too long or too short from the stream.
class | LetterTokenizer: a tokenizer that divides text at non-letters.
class | LowerCaseFilter: normalizes token text to lower case.
class | LowerCaseTokenizer: performs the function of LetterTokenizer and LowerCaseFilter together.
class | NumericTokenStream: Expert: provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter.
class | PorterStemFilter: transforms the token stream as per the Porter stemming algorithm.
class | SinkTokenizer: Deprecated. Use TeeSinkTokenFilter instead.
class | StopFilter: removes stop words from a token stream.
class | TeeSinkTokenFilter: a TokenFilter that can set aside attribute states that have already been analyzed.
static class | TeeSinkTokenFilter.SinkTokenStream
class | TeeTokenFilter: Deprecated. Use TeeSinkTokenFilter instead.
class | TokenFilter: a TokenStream whose input is another TokenStream.
class | Tokenizer: a TokenStream whose input is a Reader.
class | TokenStream
class | WhitespaceTokenizer: a tokenizer that divides text at whitespace.
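Two of the filters above, StopFilter and LengthFilter, are simple token-dropping transforms. The following sketch shows the logic each applies, using plain lists and hypothetical helper names rather than the Lucene classes themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of token-dropping filters: StopFilter removes stop words,
// LengthFilter removes tokens outside a length range. Not the Lucene
// implementations, just the same predicates.
public class FilterSketch {

    static List<String> stopFilter(List<String> in, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!stopWords.contains(t)) out.add(t);
        }
        return out;
    }

    static List<String> lengthFilter(List<String> in, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (t.length() >= min && t.length() <= max) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        System.out.println(stopFilter(tokens, Set.of("the")));
        System.out.println(lengthFilter(tokens, 4, 10));
    }
}
```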
Modifier and Type | Method and Description
---|---
abstract boolean | TeeSinkTokenFilter.SinkFilter.accept(AttributeSource source): returns true iff the current state of the passed-in AttributeSource shall be stored in the sink.
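The accept method above is the SinkFilter's whole job: decide, per token state, whether a copy goes into the sink while everything still flows downstream. A minimal sketch of that tee/sink pattern, with a plain predicate standing in for the SinkFilter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the tee/sink idea behind TeeSinkTokenFilter: every token
// passes through unchanged, but tokens accepted by a predicate (the
// SinkFilter role) are also copied into a side "sink" list.
public class TeeSinkSketch {

    static List<String> tee(List<String> in, Predicate<String> accept, List<String> sink) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (accept.test(t)) sink.add(t);  // copy into the sink
            out.add(t);                       // always pass through
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<String> out = tee(List.of("a1", "bb", "c2"), t -> t.matches(".*\\d"), sink);
        System.out.println(out);   // all tokens, unchanged
        System.out.println(sink);  // only tokens ending in a digit
    }
}
```

The real class stores captured AttributeSource states, not strings, so the sink can later be replayed as its own TokenStream.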
Constructor and Description
---
CharTokenizer(AttributeSource source, Reader input)
KeywordTokenizer(AttributeSource source, Reader input, int bufferSize)
LetterTokenizer(AttributeSource source, Reader in): construct a new LetterTokenizer using a given AttributeSource.
LowerCaseTokenizer(AttributeSource source, Reader in): construct a new LowerCaseTokenizer using a given AttributeSource.
NumericTokenStream(AttributeSource source, int precisionStep): Expert: creates a token stream for numeric values with the specified precisionStep using the given AttributeSource.
Tokenizer(AttributeSource source): construct a token stream using the given AttributeSource.
Tokenizer(AttributeSource source, Reader input): construct a token stream processing the given input using the given AttributeSource.
TokenStream(AttributeSource input): a TokenStream that uses the same attributes as the supplied one.
WhitespaceTokenizer(AttributeSource source, Reader in): construct a new WhitespaceTokenizer using a given AttributeSource.
Modifier and Type | Class and Description
---|---
class | ArabicLetterTokenizer: tokenizer that breaks text into runs of letters and diacritics.
class | ArabicNormalizationFilter: a TokenFilter that applies ArabicNormalizer to normalize the orthography.
class | ArabicStemFilter: a TokenFilter that applies ArabicStemmer to stem Arabic words.

Constructor and Description
---
ArabicLetterTokenizer(AttributeSource source, Reader in)
Modifier and Type | Class and Description
---|---
class | BrazilianStemFilter: a TokenFilter that applies BrazilianStemmer.

Modifier and Type | Class and Description
---|---
class | CJKTokenizer: designed for the Chinese, Japanese, and Korean languages.

Constructor and Description
---
CJKTokenizer(AttributeSource source, Reader in)
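The overlapping-bigram approach CJKTokenizer takes over runs of Han characters can be sketched in a few lines. This is a simplified illustration of the bigram logic only, not the Lucene tokenizer, which also handles script detection and buffering.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of overlapping bigram generation: adjacent character pairs,
// each shifted by one position.
public class BigramSketch {

    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        if (run.length() < 2) {
            // A single character has no bigram; emit it as-is.
            if (!run.isEmpty()) out.add(run);
            return out;
        }
        for (int i = 0; i + 2 <= run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("一二三四"));  // 一二, 二三, 三四
    }
}
```

Indexing bigrams lets queries match without word segmentation, at the cost of some false matches across word boundaries.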
Modifier and Type | Class and Description
---|---
class | ChineseFilter: a TokenFilter with a stop word table.
class | ChineseTokenizer: tokenizes Chinese text as individual Chinese characters.

Constructor and Description
---
ChineseTokenizer(AttributeSource source, Reader in)

Modifier and Type | Class and Description
---|---
class | SentenceTokenizer: tokenizes input text into sentences.
class | WordTokenFilter: a TokenFilter that breaks sentences into words.

Constructor and Description
---
SentenceTokenizer(AttributeSource source, Reader reader)
Modifier and Type | Class and Description
---|---
class | CompoundWordTokenFilterBase: base class for decomposition token filters.
class | DictionaryCompoundWordTokenFilter: a TokenFilter that decomposes compound words found in many Germanic languages.
class | HyphenationCompoundWordTokenFilter: a TokenFilter that decomposes compound words found in many Germanic languages.
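The dictionary-based decomposition these filters perform keeps the original compound and additionally emits any dictionary word found inside it. Here is a deliberately simplified scan illustrating that behavior; it is not the Lucene algorithm, which also supports minimum/maximum subword lengths and, in the hyphenation variant, hyphenation grammars.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of compound-word decomposition: the compound token is kept,
// and dictionary words found inside it are emitted as extra tokens.
public class CompoundSketch {

    static List<String> decompose(String compound, Set<String> dictionary, int minLength) {
        List<String> out = new ArrayList<>();
        out.add(compound);  // the original token is preserved
        for (int start = 0; start < compound.length(); start++) {
            for (int end = start + minLength; end <= compound.length(); end++) {
                String part = compound.substring(start, end);
                if (dictionary.contains(part)) out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // German "rindfleisch" (beef) = "rind" + "fleisch"
        System.out.println(decompose("rindfleisch", Set.of("rind", "fleisch"), 4));
    }
}
```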
Modifier and Type | Class and Description
---|---
class | GermanStemFilter: a TokenFilter that stems German words.

Modifier and Type | Class and Description
---|---
class | GreekLowerCaseFilter: normalizes token text to lower case using the given ("greek") charset.

Modifier and Type | Class and Description
---|---
class | PersianNormalizationFilter: a TokenFilter that applies PersianNormalizer to normalize the orthography.

Modifier and Type | Class and Description
---|---
class | ElisionFilter: removes elisions from a TokenStream.
class | FrenchStemFilter: a TokenFilter that stems French words.
Modifier and Type | Class and Description
---|---
class | EmptyTokenStream: an always-exhausted token stream.
class | PrefixAndSuffixAwareTokenFilter: links two PrefixAwareTokenFilter instances.
class | PrefixAwareTokenFilter: joins two token streams and keeps the last token of the first stream available, so it can be used when updating token values in the second stream.
class | SingleTokenTokenStream: a TokenStream containing a single token.
Modifier and Type | Class and Description
---|---
class | EdgeNGramTokenFilter: tokenizes the given token into n-grams of the given size(s).
class | EdgeNGramTokenizer: tokenizes the input from an edge into n-grams of the given size(s).
class | NGramTokenFilter: tokenizes the input into n-grams of the given size(s).
class | NGramTokenizer: tokenizes the input into n-grams of the given size(s).

Constructor and Description
---
EdgeNGramTokenizer(AttributeSource source, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram): creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range.
EdgeNGramTokenizer(AttributeSource source, Reader input, String sideLabel, int minGram, int maxGram): creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range.
NGramTokenizer(AttributeSource source, Reader input, int minGram, int maxGram): creates an NGramTokenizer with the given min and max n-gram sizes.
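The difference between the plain and edge variants comes down to which substrings are emitted. A simplified sketch of both, not the Lucene implementations (which stream characters and also handle the back edge):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of character n-grams: the plain variant emits every substring
// of each allowed length; the front-edge variant emits only prefixes.
public class NGramSketch {

    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= term.length(); i++) {
                out.add(term.substring(i, i + n));
            }
        }
        return out;
    }

    static List<String> frontEdgeNGrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, term.length()); n++) {
            out.add(term.substring(0, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("fox", 1, 2));           // f, o, x, fo, ox
        System.out.println(frontEdgeNGrams("fox", 1, 2));  // f, fo
    }
}
```

Edge n-grams are the usual building block for prefix/autocomplete matching; full n-grams support substring and fuzzy-ish matching at a larger index cost.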
Modifier and Type | Class and Description
---|---
class | DutchStemFilter: a TokenFilter that stems Dutch words.

Modifier and Type | Class and Description
---|---
class | DelimitedPayloadTokenFilter: characters before the delimiter are the "token"; those after it are the payload.
class | NumericPayloadTokenFilter: assigns a payload to a token based on the Token.type().
class | TokenOffsetPayloadTokenFilter: stores the token's start and end offsets (Token.setStartOffset(int) and Token.setEndOffset(int)) in its payload; the first 4 bytes are the start offset.
class | TypeAsPayloadTokenFilter: makes the Token.type() a payload.
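The split DelimitedPayloadTokenFilter performs is easy to picture: everything before the delimiter stays as the token, everything after it becomes the payload. A simplified standalone version of that split (the real filter also runs a payload encoder over the tail):

```java
// Sketch of the token/payload split on a delimiter, in the spirit of
// DelimitedPayloadTokenFilter. Not the Lucene class.
public class DelimitedPayloadSketch {

    // Returns {token, payload}; payload is empty when no delimiter is found.
    static String[] split(String raw, char delimiter) {
        int i = raw.lastIndexOf(delimiter);
        if (i < 0) return new String[] { raw, "" };
        return new String[] { raw.substring(0, i), raw.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("jumps|VERB", '|');
        System.out.println(parts[0] + " -> payload " + parts[1]);
    }
}
```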
Modifier and Type | Class and Description
---|---
class | PositionFilter: sets the position increment of all tokens to a configured "positionIncrement", except the first returned token, which retains its original position increment.

Modifier and Type | Class and Description
---|---
class | ReverseStringFilter: reverses the token text, for example "country" => "yrtnuoc".
Modifier and Type | Class and Description
---|---
class | RussianLetterTokenizer: a Tokenizer that extends LetterTokenizer by additionally looking up letters in a given "russian" charset.
class | RussianLowerCaseFilter: normalizes token text to lower case using the given ("russian") charset.
class | RussianStemFilter: a TokenFilter that stems Russian words.

Constructor and Description
---
RussianLetterTokenizer(AttributeSource source, Reader in)

Modifier and Type | Class and Description
---|---
class | ShingleFilter: constructs shingles (token n-grams) from a token stream.
class | ShingleMatrixFilter: constructs shingles (token n-grams) from a token stream.
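Shingles are word-level n-grams: with a shingle size of 2, "please divide this" yields "please divide" and "divide this". A minimal sketch of that construction follows; the real ShingleFilter also emits the unigrams by default and handles position increments, which this omits.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of word n-gram (shingle) construction over a token list.
public class ShingleSketch {

    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("please", "divide", "this"), 2));
    }
}
```

Indexing shingles makes exact-phrase-like matching cheap at query time, traded against a larger term dictionary.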
Modifier and Type | Class and Description
---|---
class | DateRecognizerSinkTokenizer: Deprecated. Use DateRecognizerSinkFilter and TeeSinkTokenFilter instead.
class | TokenRangeSinkTokenizer: Deprecated. Use TokenRangeSinkFilter and TeeSinkTokenFilter instead.
class | TokenTypeSinkTokenizer: Deprecated. Use TokenTypeSinkFilter and TeeSinkTokenFilter instead.

Modifier and Type | Method and Description
---|---
boolean | TokenRangeSinkFilter.accept(AttributeSource source)
boolean | DateRecognizerSinkFilter.accept(AttributeSource source)
boolean | TokenTypeSinkFilter.accept(AttributeSource source)
Modifier and Type | Class and Description
---|---
class | SnowballFilter: a filter that stems words using a Snowball-generated stemmer.

Modifier and Type | Class and Description
---|---
class | StandardFilter: normalizes tokens extracted with StandardTokenizer.
class | StandardTokenizer: a grammar-based tokenizer constructed with JFlex.

Constructor and Description
---
StandardTokenizer(AttributeSource source, Reader input, boolean replaceInvalidAcronym): Deprecated.
StandardTokenizer(Version matchVersion, AttributeSource source, Reader input): creates a new StandardTokenizer with a given AttributeSource.
Modifier and Type | Class and Description
---|---
class | ThaiWordFilter: a TokenFilter that uses a BreakIterator to break each Thai Token into separate Tokens, one per Thai word.

Modifier and Type | Class and Description
---|---
class | CollationKeyFilter: converts each token into its CollationKey, then encodes the CollationKey with IndexableBinaryStringTools so that it can be stored as an index term.
class | ICUCollationKeyFilter: converts each token into its CollationKey, then encodes the CollationKey with IndexableBinaryStringTools so that it can be stored as an index term.
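These filters build on the standard java.text.Collator API: a CollationKey's byte form compares the way the Collator sorts, so once the bytes are stored as a term, plain binary term order equals locale-correct sort order. The demo below shows only the JDK part; the IndexableBinaryStringTools encoding step is not reproduced here.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Demonstrates why collation keys are useful as index terms: key
// comparison follows the locale's sort rules, not raw char codes.
public class CollationKeyDemo {
    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        CollationKey a = collator.getCollationKey("apple");
        CollationKey b = collator.getCollationKey("Banana");
        // "apple" sorts before "Banana" under English collation, even
        // though 'B' < 'a' in raw char-code (String) order.
        System.out.println(a.compareTo(b) < 0);
        System.out.println("apple".compareTo("Banana") < 0);
    }
}
```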
Modifier and Type | Method and Description
---|---
AttributeSource | FieldInvertState.getAttributeSource()

Modifier and Type | Class and Description
---|---
class | SynonymTokenFilter: injects additional tokens for synonyms of token terms fetched from the underlying child stream; the child stream must deliver lowercase tokens for synonyms to be found.
Modifier and Type | Class and Description
---|---
class | FieldConfig: represents a field configuration.
class | QueryConfigHandler: can be used to hold any query configuration; it holds no field configuration.

Modifier and Type | Class and Description
---|---
class | StandardQueryConfigHandler: the query configuration handler used by almost every processor defined in the StandardQueryNodeProcessorPipeline.
Modifier and Type | Method and Description
---|---
AttributeSource | AttributeSource.cloneAttributes(): performs a clone of all AttributeImpl instances, returned in a new AttributeSource instance.

Constructor and Description
---
AttributeSource(AttributeSource input): an AttributeSource that uses the same attributes as the supplied one.
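The contract behind these two entries is the heart of this page: streams built from the same AttributeSource share one set of attribute instances, while cloneAttributes() takes an independent snapshot. The sketch below illustrates that sharing-versus-cloning distinction with a plain map standing in for the AttributeImpl registry; it is not the Lucene implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of AttributeSource semantics: the copy constructor shares
// storage, cloneAttributes() produces an independent snapshot.
public class AttributeSharingSketch {
    final Map<String, String> attributes;

    AttributeSharingSketch() {
        this.attributes = new HashMap<>();
    }

    // Like AttributeSource(AttributeSource input): same attribute storage.
    AttributeSharingSketch(AttributeSharingSketch input) {
        this.attributes = input.attributes;
    }

    // Like cloneAttributes(): an independent copy of the current state.
    AttributeSharingSketch cloneAttributes() {
        AttributeSharingSketch copy = new AttributeSharingSketch();
        copy.attributes.putAll(this.attributes);
        return copy;
    }

    public static void main(String[] args) {
        AttributeSharingSketch source = new AttributeSharingSketch();
        AttributeSharingSketch filter = new AttributeSharingSketch(source);
        source.attributes.put("term", "lucene");
        System.out.println(filter.attributes.get("term"));   // shared: "lucene"

        AttributeSharingSketch snapshot = source.cloneAttributes();
        source.attributes.put("term", "changed");
        System.out.println(snapshot.attributes.get("term")); // independent: "lucene"
    }
}
```

Sharing is what lets a whole tokenizer/filter chain read and write the same term, offset, and position attributes without copying per token.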
Modifier and Type | Class and Description
---|---
class | WikipediaTokenizer: extension of StandardTokenizer that is aware of Wikipedia syntax.

Constructor and Description
---
WikipediaTokenizer(AttributeSource source, Reader input, int tokenOutput, Set untokenizedTypes): creates a new instance of the WikipediaTokenizer.
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.