How to get a Token from a Lucene TokenStream?
Question
I'm trying to use Apache Lucene for tokenizing, and I am baffled by the process for obtaining Tokens from a TokenStream.
The worst part is that I'm looking at the comments in the JavaDocs that address my question.
http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29
Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss.
Can anyone explain how to get token-like information from a TokenStream?
Recommended answer
Yeah, it's a little convoluted (compared to the good ol' way), but this should do it:
    TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
    // Look up the attribute objects once; the stream updates them in place.
    OffsetAttribute offsetAttribute = tokenStream.getAttribute(OffsetAttribute.class);
    TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);

    // Each successful incrementToken() call loads the next token into the attributes.
    while (tokenStream.incrementToken()) {
        int startOffset = offsetAttribute.startOffset();
        int endOffset = offsetAttribute.endOffset();
        String term = termAttribute.term();
    }
The new way
According to Donotello, TermAttribute has been deprecated in favor of CharTermAttribute. According to jpountz (and Lucene's documentation), addAttribute is preferable to getAttribute.
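    TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
    // addAttribute returns the stream's existing instance, or registers one if absent.
    OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

    tokenStream.reset(); // must be called before the first incrementToken()
    while (tokenStream.incrementToken()) {
        int startOffset = offsetAttribute.startOffset();
        int endOffset = offsetAttribute.endOffset();
        // Copy the term out now; the attribute is overwritten on the next iteration.
        String term = charTermAttribute.toString();
    }
    tokenStream.end();   // finish end-of-stream bookkeeping
    tokenStream.close(); // release the underlying reader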
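If it helps to see why the attributes are fetched once, before the loop, here is a rough toy model of the idea in plain Java. It is only a sketch and not Lucene's code: ToyTokenStream, TermAttr, OffsetAttr and collect are hypothetical names made up for illustration, and the whitespace splitting is just a stand-in for a real analyzer. The point it tries to show is that the stream owns one mutable object per attribute and overwrites it on every incrementToken() call, so anything you want to keep has to be copied out inside the loop.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every class here is hypothetical and is NOT part of Lucene's API.
class AttributeDemo {

    /** Mutable holder for the current term, playing the role of CharTermAttribute. */
    static final class TermAttr {
        private String term = "";
        void set(String t) { term = t; }
        @Override public String toString() { return term; }
    }

    /** Mutable holder for the current offsets, playing the role of OffsetAttribute. */
    static final class OffsetAttr {
        private int start;
        private int end;
        void set(int s, int e) { start = s; end = e; }
        int startOffset() { return start; }
        int endOffset() { return end; }
    }

    /** Whitespace splitter that owns exactly one instance of each attribute. */
    static final class ToyTokenStream {
        private final String text;
        private final TermAttr termAttr = new TermAttr();
        private final OffsetAttr offsetAttr = new OffsetAttr();
        private int pos = 0;

        ToyTokenStream(String text) { this.text = text; }

        TermAttr termAttribute() { return termAttr; }
        OffsetAttr offsetAttribute() { return offsetAttr; }

        /** Advances to the next token, overwriting the shared attribute objects. */
        boolean incrementToken() {
            // Skip leading whitespace.
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
            if (pos >= text.length()) return false;
            int start = pos;
            // Consume characters up to the next whitespace.
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
            termAttr.set(text.substring(start, pos));
            offsetAttr.set(start, pos);
            return true;
        }
    }

    /** Copies each token's state out of the attributes as "term[start,end)". */
    static List<String> collect(String text) {
        ToyTokenStream stream = new ToyTokenStream(text);
        TermAttr term = stream.termAttribute();        // fetched once, before the loop
        OffsetAttr offset = stream.offsetAttribute();
        List<String> tokens = new ArrayList<>();
        while (stream.incrementToken()) {
            // String concatenation calls term.toString(), snapshotting the current value.
            tokens.add(term + "[" + offset.startOffset() + "," + offset.endOffset() + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(collect("quick brown fox"));
    }
}
```

The loop in collect has the same shape as the Lucene snippet above: grab the attribute references up front, then read them after each successful incrementToken(). Holding on to the attribute object itself instead of copying its value would leave you with only the last token's state.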