Question

I'm trying to write a simple program using Lucene 2.9.4 that runs a phrase query, but I get 0 hits:

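import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {

    public static void main(String[] args) throws IOException, ParseException {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        Directory index = new RAMDirectory();

        // Build an in-memory index of four book titles.
        IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(w, "Lucene in Action");
        addDoc(w, "Lucene for Dummies");
        addDoc(w, "Managing Gigabytes");
        addDoc(w, "The Art of Computer Science");
        w.close();

        // Exact phrase "lucene in": two adjacent terms, no slop.
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("content", "lucene"), 0);
        pq.add(new Term("content", "in"), 1);
        pq.setSlop(0);

        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index, true);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(pq, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; i++) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + "." + d.get("content"));
        }

        searcher.close();
    }

    public static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
        w.addDocument(doc);
    }

}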

Please tell me what is wrong. I also tried using QueryParser as follows:

    String querystr = "\"Lucene in Action\"";
    Query q = new QueryParser(Version.LUCENE_29, "content", analyzer).parse(querystr);

But that doesn't work either.

David C · Answer 1 · May 15, 2012

There are two problems with the code (and neither has anything to do with your version of Lucene):

1) StandardAnalyzer does not index stop words (such as "in"), so the PhraseQuery can never find the phrase "Lucene in".

2) As Xodarap and Shashikant Kore already mentioned, your call that creates the document field needs to use Index.ANALYZED; otherwise Lucene does not run the Analyzer on that field of the document. There is probably a nifty way to do this with Index.NOT_ANALYZED, but I'm not familiar with it.
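
To make the two points above concrete, here is a minimal sketch in plain Java. It does not call Lucene at all; it only imitates the effect of the two indexing modes on the title "Lucene in Action". The class name `ToyIndexing`, both helper methods, and the short stop list are made up for this illustration (the real English stop set in Lucene is longer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: it imitates the effect of the two indexing modes and does not call Lucene.
class ToyIndexing {
    // Hypothetical stop list for the demo; the real English stop set is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("in", "for", "the", "of"));

    // NOT_ANALYZED: the whole field value becomes a single term, unchanged.
    static List<String> notAnalyzed(String value) {
        return Arrays.asList(value);
    }

    // ANALYZED with a stop-word-removing analyzer: split, lowercase, drop stop words.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<String>();
        for (String token : value.split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String title = "Lucene in Action";
        System.out.println(notAnalyzed(title)); // [Lucene in Action]
        System.out.println(analyzed(title));    // [lucene, action]
    }
}
```

In the first case the only indexed term is the full string "Lucene in Action", so a query for the term "lucene" has nothing to match. In the second case "lucene" is indexed but "in" is not, so the phrase "lucene in" still cannot match.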

For an easy fix, change your addDoc method to:

    public static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
        w.addDocument(doc);
    }

and change your PhraseQuery creation to:

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "computer"),0);
    pq.add(new Term("content", "science"),1);
    pq.setSlop(0);

This gives you the result below, since "computer" and "science" are not stop words:

    Found 1 hits.
    1.The Art of Computer Science

If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (allowing a "gap" between the two words):

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "lucene"),0);
    pq.add(new Term("content", "action"),1);
    pq.setSlop(1);
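
Why a slop of 1 is enough can be sketched in plain Java without Lucene. The sketch assumes that the removed stop word still occupies a position, so "lucene" sits at position 0 and "action" at position 2, which is what the slop of 1 above implies. The class `ToySlop`, its methods, and the stop list are hypothetical, and the check only covers two terms in order, each occurring once per title; Lucene's real sloppy matching is more general.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: plain Java that imitates term positions and slop; it does not call Lucene.
class ToySlop {
    // Hypothetical stop list for the demo.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("in", "for", "the", "of"));

    // Maps each kept term to its token position. A removed stop word still
    // consumes a position, so "Lucene in Action" gives lucene=0, action=2.
    static Map<String, Integer> positions(String value) {
        Map<String, Integer> result = new HashMap<String, Integer>();
        String[] tokens = value.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!STOP_WORDS.contains(tokens[i])) {
                result.put(tokens[i], i);
            }
        }
        return result;
    }

    // Simplified two-term, in-order check: the phrase matches when the number
    // of positions between the two terms is at most the slop.
    static boolean matches(String value, String first, String second, int slop) {
        Map<String, Integer> pos = positions(value);
        Integer p1 = pos.get(first);
        Integer p2 = pos.get(second);
        if (p1 == null || p2 == null) {
            return false; // one of the terms was never indexed
        }
        int gap = p2 - p1 - 1;
        return gap >= 0 && gap <= slop;
    }

    public static void main(String[] args) {
        System.out.println(matches("Lucene in Action", "lucene", "action", 0)); // false
        System.out.println(matches("Lucene in Action", "lucene", "action", 1)); // true
    }
}
```

With slop 0 the two terms would have to be adjacent, which they are not because of the hole left by "in"; with slop 1 the one-position gap is tolerated.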

If you really want to search for the phrase "lucene in", you will need to choose a different analyzer (such as SimpleAnalyzer). In Lucene 2.9, just replace your call to StandardAnalyzer with:

    SimpleAnalyzer analyzer = new SimpleAnalyzer();

Or, if you're using version 3.1 or higher, you need to add the version information:

    SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
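
The difference this makes can again be sketched in plain Java, without Lucene. SimpleAnalyzer splits text at non-letter characters and lowercases it, without removing stop words; the toy code below imitates only that behavior. The class `ToySimpleAnalysis` and its methods are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: imitates "split at non-letters, lowercase, keep every word"; it does not call Lucene.
class ToySimpleAnalysis {
    // Splits at runs of non-letter characters and lowercases each token.
    static List<String> simpleTokens(String value) {
        List<String> terms = new ArrayList<String>();
        for (String token : value.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact phrase check: the phrase terms must appear consecutively and in order.
    static boolean containsPhrase(List<String> terms, String... phrase) {
        return Collections.indexOfSubList(terms, Arrays.asList(phrase)) >= 0;
    }

    public static void main(String[] args) {
        List<String> terms = simpleTokens("Lucene in Action");
        System.out.println(terms);                                 // [lucene, in, action]
        System.out.println(containsPhrase(terms, "lucene", "in")); // true
    }
}
```

Because "in" is kept as a term right after "lucene", an exact phrase for "lucene in" can match. Note that the same analyzer should be used for indexing and for parsing queries, otherwise the terms on the two sides will not line up.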

Here's a helpful post on a similar issue (it will help you get started with PhraseQuery): Exact Phrase search using Lucene? - see WhiteFang34's answer.

Shashikant Kore · Answer 2 · September 15, 2011

You need to analyze the field as well as enable term vectors.

    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));

You can disable storing if you don't plan to retrieve this field from the index.

alvas · Answer 3 · January 4, 2012

This is my solution with Lucene Version.LUCENE_35, also known as Lucene 3.5.0, from http://lucene.apache.org/java/docs/releases.html. If you are using an IDE such as Eclipse, you can add the .jar file to the build path; this is the direct link to the 3.5.0 .jar file: http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/3.5.0/lucene-core-3.5.0.jar.

When a new version of Lucene comes out, this solution will remain valid ONLY if you keep using the 3.5.0 .jar.

Now for the code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class Index {
public static void main(String[] args) throws IOException, ParseException {
    // To store the Lucene index in RAM
    Directory directory = new RAMDirectory();
    // To store the Lucene index on your hard disk, you can use
    // (needs java.io.File and org.apache.lucene.store.FSDirectory):
    //Directory directory = FSDirectory.open(new File("/foo/bar/testindex"));

    // Set the analyzer that you want to use for the task.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
    // Creating Lucene Index; note, the new version demands configurations.
    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_35, analyzer);  
    IndexWriter writer = new IndexWriter(directory, config);
    // Note: There are other ways of initializing the IndexWriter.
    // (see http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/index/IndexWriter.html)

    // In the new version of Lucene, Document.add requires a Field argument,
    //  and there are a few ways of calling the Field constructor.
    //  (see http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/document/Field.html)
    // Here I just use one of the Field constructors that takes a String value.
    List<Document> docs = new ArrayList<Document>();
    Document doc1 = new Document();
    doc1.add(new Field("content", "Lucene in Action", 
        Field.Store.YES, Field.Index.ANALYZED));
    Document doc2 = new Document();
    doc2.add(new Field("content", "Lucene for Dummies", 
        Field.Store.YES, Field.Index.ANALYZED));
    Document doc3 = new Document();
    doc3.add(new Field("content", "Managing Gigabytes", 
        Field.Store.YES, Field.Index.ANALYZED));
    Document doc4 = new Document();
    doc4.add(new Field("content", "The Art of Lucene", 
        Field.Store.YES, Field.Index.ANALYZED));

    docs.add(doc1); docs.add(doc2); docs.add(doc3); docs.add(doc4);

    writer.addDocuments(docs);
    writer.close();

    // To enable query/search, we need to initialize 
    //  the IndexReader and IndexSearcher.
    // Note: The IndexSearcher in Lucene 3.5.0 takes an IndexReader parameter
    //  instead of a Directory parameter.
    IndexReader iRead = IndexReader.open(directory);
    IndexSearcher iSearch = new IndexSearcher(iRead);

    // Parse a simple query that searches for the word "lucene".
    // Note: you need to specify the fieldname for the query 
    // (in our case it is "content").
    QueryParser parser = new QueryParser(Version.LUCENE_35, "content", analyzer);
    Query query = parser.parse("lucene in");

    // Search the Index with the Query, with max 1000 results
    ScoreDoc[] hits = iSearch.search(query, 1000).scoreDocs;

    // Iterate through the search results
    for (int i=0; i<hits.length;i++) {
        // From the indexSearch, we retrieve the search result individually
        Document hitDoc = iSearch.doc(hits[i].doc);
        // Specify the Field type of the retrieved document that you want to print.
        // In our case we only have 1 Field i.e. "content".
        System.out.println(hitDoc.get("content"));
    }
    iSearch.close(); iRead.close(); directory.close();
}   
}

Lucene phrase query not working
