Package | Description |
---|---|
org.apache.lucene.analysis |
Support for testing analysis components.
|
org.apache.lucene.analysis.ar |
Analyzer for Arabic.
|
org.apache.lucene.analysis.cjk |
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.
|
org.apache.lucene.analysis.cn |
Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
|
org.apache.lucene.analysis.cn.smart |
Analyzer for Simplified Chinese, which indexes words.
|
org.apache.lucene.analysis.core |
Basic, general-purpose analysis components.
|
org.apache.lucene.analysis.icu.segmentation |
Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
|
org.apache.lucene.analysis.in |
Analysis components for Indian languages.
|
org.apache.lucene.analysis.ja |
Analyzer for Japanese.
|
org.apache.lucene.analysis.ngram |
Character n-gram tokenizers and filters.
|
org.apache.lucene.analysis.path |
Analysis components for path-like strings such as filenames.
|
org.apache.lucene.analysis.pattern |
Set of components for pattern-based (regex) analysis.
|
org.apache.lucene.analysis.ru |
Analyzer for Russian.
|
org.apache.lucene.analysis.standard |
Fast, general-purpose grammar-based tokenizers.
|
org.apache.lucene.analysis.th |
Analyzer for Thai.
|
org.apache.lucene.analysis.uima |
Classes that integrate UIMA with Lucene's analysis API.
|
org.apache.lucene.analysis.util |
Utility functions for text analysis.
|
org.apache.lucene.analysis.wikipedia |
Tokenizer that is aware of Wikipedia syntax.
|
org.apache.lucene.collation |
Unicode Collation support.
|
org.apache.lucene.util |
General test support.
|
Modifier and Type | Field and Description |
---|---|
static AttributeFactory |
TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Default
AttributeFactory instance that should be used for TokenStreams. |
static AttributeFactory |
Token.TOKEN_ATTRIBUTE_FACTORY
Deprecated.
Convenience factory that returns
Token as the implementation for the basic
attributes and returns the default implementation (with "Impl" appended) for all other
attributes. |
static AttributeFactory |
MockUTF16TermAttributeImpl.UTF16_TERM_ATTRIBUTE_FACTORY
Factory that returns an instance of this class for CharTermAttribute
|
Modifier and Type | Method and Description |
---|---|
static AttributeFactory |
BaseTokenStreamTestCase.newAttributeFactory()
Returns a random AttributeFactory impl
|
static AttributeFactory |
BaseTokenStreamTestCase.newAttributeFactory(Random random)
Returns a random AttributeFactory impl
|
Constructor and Description |
---|
MockTokenizer(AttributeFactory factory,
Reader input)
|
MockTokenizer(AttributeFactory factory,
Reader input,
CharacterRunAutomaton runAutomaton,
boolean lowerCase) |
MockTokenizer(AttributeFactory factory,
Reader input,
CharacterRunAutomaton runAutomaton,
boolean lowerCase,
int maxTokenLength) |
NumericTokenStream(AttributeFactory factory,
int precisionStep)
Expert: Creates a token stream for numeric values with the specified
precisionStep using the given AttributeFactory. |
Tokenizer(AttributeFactory factory,
Reader input)
Construct a token stream processing the given input using the given AttributeFactory.
|
TokenStream(AttributeFactory factory)
A TokenStream using the supplied AttributeFactory for creating new
Attribute instances. |
Modifier and Type | Method and Description |
---|---|
ArabicLetterTokenizer |
ArabicLetterTokenizerFactory.create(AttributeFactory factory,
Reader input)
Deprecated.
|
Constructor and Description |
---|
ArabicLetterTokenizer(Version matchVersion,
AttributeFactory factory,
Reader in)
Deprecated.
Construct a new ArabicLetterTokenizer using a given AttributeFactory. |
Modifier and Type | Method and Description |
---|---|
CJKTokenizer |
CJKTokenizerFactory.create(AttributeFactory factory,
Reader in)
Deprecated.
|
Constructor and Description |
---|
CJKTokenizer(AttributeFactory factory,
Reader in)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
ChineseTokenizer |
ChineseTokenizerFactory.create(AttributeFactory factory,
Reader in)
Deprecated.
|
Constructor and Description |
---|
ChineseTokenizer(AttributeFactory factory,
Reader in)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
SentenceTokenizer |
SmartChineseSentenceTokenizerFactory.create(AttributeFactory factory,
Reader input)
Deprecated.
|
Tokenizer |
HMMChineseTokenizerFactory.create(AttributeFactory factory,
Reader reader) |
Constructor and Description |
---|
HMMChineseTokenizer(AttributeFactory factory,
Reader reader)
Creates a new HMMChineseTokenizer, supplying the AttributeFactory
|
SentenceTokenizer(AttributeFactory factory,
Reader reader)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
KeywordTokenizer |
KeywordTokenizerFactory.create(AttributeFactory factory,
Reader input) |
LowerCaseTokenizer |
LowerCaseTokenizerFactory.create(AttributeFactory factory,
Reader input) |
LetterTokenizer |
LetterTokenizerFactory.create(AttributeFactory factory,
Reader input) |
WhitespaceTokenizer |
WhitespaceTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
KeywordTokenizer(AttributeFactory factory,
Reader input,
int bufferSize) |
LetterTokenizer(AttributeFactory factory,
Reader in)
Construct a new LetterTokenizer using a given AttributeFactory. |
LetterTokenizer(Version matchVersion,
AttributeFactory factory,
Reader in)
Deprecated.
|
LowerCaseTokenizer(AttributeFactory factory,
Reader in)
Construct a new LowerCaseTokenizer using a given AttributeFactory. |
LowerCaseTokenizer(Version matchVersion,
AttributeFactory factory,
Reader in)
Deprecated.
|
WhitespaceTokenizer(AttributeFactory factory,
Reader in)
Construct a new WhitespaceTokenizer using a given AttributeFactory. |
WhitespaceTokenizer(Version matchVersion,
AttributeFactory factory,
Reader in)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
ICUTokenizer |
ICUTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
ICUTokenizer(AttributeFactory factory,
Reader input,
ICUTokenizerConfig config)
Construct a new ICUTokenizer that breaks text into words from the given
Reader, using a tailored BreakIterator configuration.
|
Constructor and Description |
---|
IndicTokenizer(Version matchVersion,
AttributeFactory factory,
Reader input)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
JapaneseTokenizer |
JapaneseTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
JapaneseTokenizer(AttributeFactory factory,
Reader input,
UserDictionary userDictionary,
boolean discardPunctuation,
JapaneseTokenizer.Mode mode)
Create a new JapaneseTokenizer.
|
Modifier and Type | Method and Description |
---|---|
Tokenizer |
EdgeNGramTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Tokenizer |
NGramTokenizerFactory.create(AttributeFactory factory,
Reader input)
|
Constructor and Description |
---|
EdgeNGramTokenizer(AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range
|
EdgeNGramTokenizer(Version version,
AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Deprecated.
|
Lucene43EdgeNGramTokenizer(AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Deprecated.
Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range
|
Lucene43EdgeNGramTokenizer(Version version,
AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Deprecated.
|
Lucene43EdgeNGramTokenizer(Version version,
AttributeFactory factory,
Reader input,
Lucene43EdgeNGramTokenizer.Side side,
int minGram,
int maxGram)
Deprecated.
|
Lucene43EdgeNGramTokenizer(Version version,
AttributeFactory factory,
Reader input,
String sideLabel,
int minGram,
int maxGram)
Deprecated.
|
Lucene43NGramTokenizer(AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Deprecated.
Creates NGramTokenizer with given min and max n-grams.
|
NGramTokenizer(AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Creates NGramTokenizer with given min and max n-grams.
|
NGramTokenizer(Version version,
AttributeFactory factory,
Reader input,
int minGram,
int maxGram)
Deprecated.
For Version.LUCENE_4_3_0 and before, use Lucene43NGramTokenizer; otherwise use NGramTokenizer(AttributeFactory, Reader, int, int). |
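
The n-gram constructors above all take a `minGram` and `maxGram` size range. As a rough illustration of what such a range means, here is a minimal stand-alone sketch in plain Java. It does not use Lucene; the class `NGramSketch` and its methods are hypothetical names, the sketch works on UTF-16 `char` units, and the emission order (by start offset, then by length) is a choice made for this example rather than a statement about either tokenizer's actual ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
final class NGramSketch {

    // All substrings whose length lies in [minGram, maxGram], ordered by start offset, then length.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        checkRange(minGram, maxGram);
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }

    // Edge n-grams: only the prefixes of the input whose length lies in [minGram, maxGram].
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        checkRange(minGram, maxGram);
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= text.length(); len++) {
            out.add(text.substring(0, len));
        }
        return out;
    }

    private static void checkRange(int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
    }
}
```

With this sketch, `NGramSketch.ngrams("abc", 1, 2)` yields `a, ab, b, bc, c`, and `NGramSketch.edgeNGrams("abc", 1, 2)` yields `a, ab`.
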
Modifier and Type | Method and Description |
---|---|
Tokenizer |
PathHierarchyTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
PathHierarchyTokenizer(AttributeFactory factory,
Reader input,
char delimiter,
char replacement,
int skip) |
PathHierarchyTokenizer(AttributeFactory factory,
Reader input,
int bufferSize,
char delimiter,
char replacement,
int skip) |
ReversePathHierarchyTokenizer(AttributeFactory factory,
Reader input,
char delimiter,
char replacement,
int skip) |
ReversePathHierarchyTokenizer(AttributeFactory factory,
Reader input,
int bufferSize,
char delimiter,
char replacement,
int skip) |
Modifier and Type | Method and Description |
---|---|
PatternTokenizer |
PatternTokenizerFactory.create(AttributeFactory factory,
Reader in)
Split the input using configured pattern
|
Constructor and Description |
---|
PatternTokenizer(AttributeFactory factory,
Reader input,
Pattern pattern,
int group)
Creates a new PatternTokenizer returning tokens from the given group (-1 for split functionality).
|
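
The `group` argument of the PatternTokenizer constructor above distinguishes two modes: `-1` splits the input around matches, and any other value returns that capture group of each match. The following is a minimal sketch of that distinction using only `java.util.regex`. It is not Lucene's implementation; `PatternGroupSketch` and `tokens` are hypothetical names, the sketch takes a `String` rather than a `Reader`, and skipping empty pieces is a simplification chosen here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; it does not use or reproduce Lucene's PatternTokenizer.
final class PatternGroupSketch {

    // group == -1: treat the pattern as a delimiter and return the text between matches.
    // group >= 0: return the requested capture group of every match (0 is the whole match).
    static List<String> tokens(String text, Pattern pattern, int group) {
        List<String> out = new ArrayList<>();
        if (group < 0) {
            for (String piece : pattern.split(text)) {
                if (!piece.isEmpty()) { // simplification: drop empty pieces
                    out.add(piece);
                }
            }
        } else {
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                String value = matcher.group(group);
                if (value != null && !value.isEmpty()) {
                    out.add(value);
                }
            }
        }
        return out;
    }
}
```

For example, `tokens("a,b,,c", Pattern.compile(","), -1)` returns `a, b, c`, while `tokens("'x' and 'yz'", Pattern.compile("'([^']+)'"), 1)` returns `x, yz`.
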
Modifier and Type | Method and Description |
---|---|
RussianLetterTokenizer |
RussianLetterTokenizerFactory.create(AttributeFactory factory,
Reader in)
Deprecated.
|
Constructor and Description |
---|
RussianLetterTokenizer(Version matchVersion,
AttributeFactory factory,
Reader in)
Deprecated.
Construct a new RussianLetterTokenizer using a given AttributeFactory. |
Modifier and Type | Method and Description |
---|---|
UAX29URLEmailTokenizer |
UAX29URLEmailTokenizerFactory.create(AttributeFactory factory,
Reader input) |
ClassicTokenizer |
ClassicTokenizerFactory.create(AttributeFactory factory,
Reader input) |
StandardTokenizer |
StandardTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
ClassicTokenizer(AttributeFactory factory,
Reader input)
Creates a new ClassicTokenizer with a given
AttributeFactory |
ClassicTokenizer(Version matchVersion,
AttributeFactory factory,
Reader input)
Deprecated.
|
StandardTokenizer(AttributeFactory factory,
Reader input)
Creates a new StandardTokenizer with a given
AttributeFactory |
StandardTokenizer(Version matchVersion,
AttributeFactory factory,
Reader input)
Deprecated.
|
UAX29URLEmailTokenizer(AttributeFactory factory,
Reader input)
Creates a new UAX29URLEmailTokenizer with a given
AttributeFactory |
UAX29URLEmailTokenizer(Version matchVersion,
AttributeFactory factory,
Reader input)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
Tokenizer |
ThaiTokenizerFactory.create(AttributeFactory factory,
Reader reader) |
Constructor and Description |
---|
ThaiTokenizer(AttributeFactory factory,
Reader reader)
Creates a new ThaiTokenizer, supplying the AttributeFactory
|
Modifier and Type | Method and Description |
---|---|
UIMATypeAwareAnnotationsTokenizer |
UIMATypeAwareAnnotationsTokenizerFactory.create(AttributeFactory factory,
Reader input) |
UIMAAnnotationsTokenizer |
UIMAAnnotationsTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
BaseUIMATokenizer(AttributeFactory factory,
Reader reader,
String descriptorPath,
Map<String,Object> configurationParameters) |
UIMAAnnotationsTokenizer(String descriptorPath,
String tokenType,
Map<String,Object> configurationParameters,
AttributeFactory factory,
Reader input) |
UIMATypeAwareAnnotationsTokenizer(String descriptorPath,
String tokenType,
String typeAttributeFeaturePath,
Map<String,Object> configurationParameters,
AttributeFactory factory,
Reader input) |
Modifier and Type | Method and Description |
---|---|
abstract Tokenizer |
TokenizerFactory.create(AttributeFactory factory,
Reader input)
Creates a TokenStream of the specified input using the given AttributeFactory
|
Constructor and Description |
---|
CharTokenizer(AttributeFactory factory,
Reader input)
Creates a new
CharTokenizer instance |
CharTokenizer(Version matchVersion,
AttributeFactory factory,
Reader input)
Deprecated.
|
SegmentingTokenizerBase(AttributeFactory factory,
Reader reader,
BreakIterator iterator)
Construct a new SegmentingTokenizerBase, also supplying the AttributeFactory
|
Modifier and Type | Method and Description |
---|---|
WikipediaTokenizer |
WikipediaTokenizerFactory.create(AttributeFactory factory,
Reader input) |
Constructor and Description |
---|
WikipediaTokenizer(AttributeFactory factory,
Reader input,
int tokenOutput,
Set<String> untokenizedTypes)
Creates a new instance of the WikipediaTokenizer. |
Modifier and Type | Class and Description |
---|---|
class |
CollationAttributeFactory
Converts each token into its CollationKey, and then
encodes the bytes as an index term. |
class |
ICUCollationAttributeFactory
Converts each token into its CollationKey, and
then encodes the bytes as an index term. |
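
Both classes above are described as turning a token into a CollationKey and using its bytes as the index term. The sketch below shows that idea with the JDK's `java.text.Collator` only. It is a hedged illustration, not the Lucene code path: `CollationKeySketch` and its methods are hypothetical names, and the choice of an English collator at PRIMARY strength is an assumption made for the example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration class; it uses only the JDK, not Lucene or ICU.
final class CollationKeySketch {

    // Assumed example configuration: English rules, PRIMARY strength (ignores case and accents).
    static Collator primaryEnglish() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // The collation key's byte form; these bytes are what would stand in for the term text.
    static byte[] sortKeyBytes(Collator collator, String token) {
        return collator.getCollationKey(token).toByteArray();
    }

    // Unsigned lexicographic byte comparison, the way binary terms are typically ordered.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

Because the keys are compared as plain bytes, sorting or range-filtering on them follows the collator's ordering without consulting the collator again. With the PRIMARY-strength collator assumed here, "Apple" and "apple" produce identical bytes.
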
Constructor and Description |
---|
CollationAttributeFactory(AttributeFactory delegate,
Collator collator)
Create a CollationAttributeFactory, using the supplied Attribute Factory
as the factory for all other attributes.
|
ICUCollationAttributeFactory(AttributeFactory delegate,
com.ibm.icu.text.Collator collator)
Create an ICUCollationAttributeFactory, using the supplied Attribute
Factory as the factory for all other attributes.
|
Modifier and Type | Class and Description |
---|---|
static class |
AttributeFactory.StaticImplementationAttributeFactory<A extends AttributeImpl>
Expert: AttributeFactory returning an instance of the given
clazz for the
attributes it implements. |
Modifier and Type | Field and Description |
---|---|
static AttributeFactory |
AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY
This is the default factory that creates AttributeImpls using the
class name of the supplied Attribute interface class by appending Impl to it. |
static AttributeFactory |
AttributeSource.DEFAULT_ATTRIBUTE_FACTORY
Deprecated.
|
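
The default factory above is described as locating an implementation by appending `Impl` to the attribute interface's class name. The following stand-alone sketch shows that naming convention with plain reflection. It is not Lucene's code: `ImplNamingSketch`, `ColorAttribute`, `ColorAttributeImpl`, and `createByConvention` are all hypothetical names invented for this example, and the real factory may resolve and instantiate classes differently.

```java
// Hypothetical illustration class; it models only the "interface name + Impl" convention.
final class ImplNamingSketch {

    // A made-up attribute interface and its conventionally named implementation.
    public interface ColorAttribute {
        String color();
    }

    public static final class ColorAttributeImpl implements ColorAttribute {
        public ColorAttributeImpl() {}

        @Override
        public String color() {
            return "red";
        }
    }

    // Looks up "<interface name>Impl" in the interface's class loader and instantiates it.
    public static <T> T createByConvention(Class<T> attributeInterface) {
        String implName = attributeInterface.getName() + "Impl";
        try {
            Class<?> implClass = Class.forName(implName, true, attributeInterface.getClassLoader());
            return attributeInterface.cast(implClass.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No usable implementation named " + implName, e);
        }
    }
}
```

Calling `ImplNamingSketch.createByConvention(ImplNamingSketch.ColorAttribute.class)` returns a `ColorAttributeImpl`. An interface without a matching `...Impl` class fails with an `IllegalArgumentException` in this sketch.
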
Modifier and Type | Method and Description |
---|---|
AttributeFactory |
AttributeSource.getAttributeFactory()
Returns the AttributeFactory in use.
|
static <A extends AttributeImpl> AttributeFactory |
AttributeFactory.getStaticImplementation(AttributeFactory delegate,
Class<A> clazz)
Returns an AttributeFactory returning an instance of the given
clazz for the
attributes it implements. |
Modifier and Type | Method and Description |
---|---|
static <A extends AttributeImpl> AttributeFactory |
AttributeFactory.getStaticImplementation(AttributeFactory delegate,
Class<A> clazz)
Returns an AttributeFactory returning an instance of the given
clazz for the
attributes it implements. |
Constructor and Description |
---|
AttributeSource(AttributeFactory factory)
An AttributeSource using the supplied
AttributeFactory for creating new Attribute instances. |
StaticImplementationAttributeFactory(AttributeFactory delegate,
Class<A> clazz)
Expert: Creates an AttributeFactory that returns an instance of
clazz for the
attributes it implements and calls the given delegate factory for all other attributes. |
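
The static-implementation factory described above follows a simple delegation rule: if the requested attribute interface is implemented by `clazz`, return a new `clazz` instance; otherwise ask the delegate. The sketch below models only that rule in self-contained Java. Every name in it (`StaticImplSketch`, `Factory`, `TermAtt`, `OffsetAtt`, `MyTermImpl`, `namingFallback`) is hypothetical, and the `Supplier` parameter stands in for however the real class instantiates `clazz`.

```java
import java.util.function.Supplier;

// Hypothetical illustration class; a simplified model, not Lucene's AttributeFactory.
final class StaticImplSketch {

    // Minimal stand-in for a factory that creates an implementation for an attribute interface.
    public interface Factory {
        Object create(Class<?> attributeInterface);
    }

    // Made-up attribute interfaces and one implementation class.
    public interface TermAtt {}
    public interface OffsetAtt {}
    public static final class MyTermImpl implements TermAtt {
        public MyTermImpl() {}
    }

    // Returns a factory that serves clazz for the interfaces clazz implements, else delegates.
    public static <A> Factory staticImplementation(Factory delegate, Class<A> clazz, Supplier<A> constructor) {
        return attributeInterface ->
            attributeInterface.isAssignableFrom(clazz)
                ? constructor.get()
                : delegate.create(attributeInterface);
    }

    // A trivial delegate for the example: it just reports which interface was requested.
    public static Factory namingFallback() {
        return attributeInterface -> "delegate:" + attributeInterface.getSimpleName();
    }
}
```

Given `Factory f = staticImplementation(namingFallback(), MyTermImpl.class, MyTermImpl::new)`, the call `f.create(TermAtt.class)` returns a `MyTermImpl`, while `f.create(OffsetAtt.class)` falls through to the delegate and returns the string `delegate:OffsetAtt`.
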
Copyright © 2000–2022 The Apache Software Foundation. All rights reserved.