Lucene IndexWriter example

In this example, we look at the Lucene IndexWriter class and walk through its simple, fundamental use.

The demonstration covers four steps: indexing, writing, searching and displaying the results.

The code in this example is developed in the NetBeans IDE 8.0.2.

This example uses Lucene 4.2.1. You may also want to try it with a more recent version, although some constructors used here, such as those taking a Version argument, changed in later major releases.

Figure 1. Lucene Library Jars

1. IndexWriter Class

IndexWriter is the core class in Lucene Core for creating and maintaining an index. It provides a range of methods that make indexing tasks straightforward.

Apache Lucene is an open-source search software project whose sub-projects include Lucene Core, Solr, PyLucene and Open Relevance. Lucene Core provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.

The key to indexing and searching in Lucene is the index directory: documents are written to an index stored in a Directory, and searches run against that index.
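
To make the idea of an index more concrete, here is a minimal, self-contained sketch of a toy inverted index in plain Java. It is not how Lucene is implemented: ToyInvertedIndex and its methods are hypothetical names invented for this illustration, and a real Lucene index is far more elaborate. The sketch only shows why looking up a term in an index avoids scanning every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy class for illustration only; this is not part of Lucene.
class ToyInvertedIndex {
    // Maps each lowercased term to the sorted IDs of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stores the text, records each of its terms, and returns the new document ID.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Split on anything that is not a letter or digit (a crude stand-in for an analyzer).
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the IDs of documents containing the term, without scanning their text.
    List<Integer> search(String term) {
        SortedSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        List<Integer> result = new ArrayList<>();
        if (ids != null) {
            result.addAll(ids);
        }
        return result;
    }

    // Returns the stored text of a document.
    String get(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Day first : Lucence Introduction.");
        index.add("Day second , part one : Lucence Projects.");
        index.add("Day second , part two: Lucence Uses.");
        for (int id : index.search("Second")) {
            System.out.println(id + ". " + index.get(id));
        }
    }
}
```

The lookup in search touches only the map entry for the term, which is the property that makes index-based search scale better than scanning each document in turn.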

2. Here we go

Initially, we start with a StandardAnalyzer instance in our Lucene demo. Note: you need to add “lucene-analyzers-common-4.2.1.jar” to the classpath to use StandardAnalyzer.

Initializing StandardAnalyzer

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
// Creates a StandardAnalyzer object

2.1. Indexing

First, create an index Directory; the IndexWriter that writes to it is configured with the analyzer instance. You can also pass a file path to use as the index directory, which is necessary for larger data sets, since a RAMDirectory keeps the whole index in memory.

Indexing

Directory index = new RAMDirectory();
//Directory index = FSDirectory.open(new File("index-dir"));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_42, analyzer);
IndexWriter writer = new IndexWriter(index, config);

Then create a writer object from the index directory and the IndexWriterConfig object. As good programming practice, never forget to close the writer once its work is done; closing it completes the indexing process. The definition of the addDoc function used here appears later in this example.

Writing to index

addDoc(writer, "Day first : Lucence Introduction.", "3436NRX");
addDoc(writer, "Day second , part one : Lucence Projects.", "3437RJ1");
addDoc(writer, "Day second , part two: Lucence Uses.", "3437RJ2");
addDoc(writer, "Day third : Lucence Demos.", "34338KRX");
writer.close();

2.2. Querying

The second task in the example is to build a query from a query string for our search. To do so, we pass the query string to a QueryParser that uses the same analyzer. Next, we create an IndexReader for the index directory and an IndexSearcher on top of that reader. Finally, we collect the search results with a TopScoreDocCollector and retrieve them as an array of ScoreDoc. The same array can be used to display the results to the user in whatever user interface is needed.

Creating QueryString

String querystr = "Second";
Query q = new QueryParser(Version.LUCENE_42, "title", analyzer).parse(querystr);

2.3. Searching

int hitsPerPage = 10;
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

2.4. Displaying results

Displaying results

System.out.println("Query string: " + querystr );
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("course_code") + "\t" + d.get("title"));
}
// Finally, close the reader.
reader.close();

Instead of repeating the lengthy steps for each new entry, we can create a generic function that adds a new document. Each field is added with its field name and the corresponding value.

addDoc Function

private static void addDoc(IndexWriter w, String title, String courseCode) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));
    // Here, we use a string field for course_code to avoid tokenizing.
    doc.add(new StringField("course_code", courseCode, Field.Store.YES));
    w.addDocument(doc);
}

This completes the simple demonstration.

3. Some other important methods

  • void commit(): Commits all pending changes and syncs the index files to storage
  • void deleteAll(): Deletes all documents in the index
  • Analyzer getAnalyzer(): Returns the analyzer used by this writer
  • Directory getDirectory(): Returns the index Directory
  • int numDocs(): Returns the number of documents in the index, including pending ones
  • void rollback(): Closes the IndexWriter without committing the pending changes
  • void waitForMerges(): Waits until any outstanding merges are done
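
The relationship between commit(), rollback() and numDocs() listed above can be pictured with a small stand-in model in plain Java. This is only a sketch under simplifying assumptions: ToyPendingWriter is a hypothetical class invented here, it keeps documents as strings in memory, and it does not reproduce Lucene's actual buffering, flushing or file handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not the Lucene IndexWriter.
class ToyPendingWriter {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean open = true;

    // Buffers a document; it is not part of the committed state yet.
    void addDocument(String doc) {
        ensureOpen();
        pending.add(doc);
    }

    // Counts committed documents plus the pending ones.
    int numDocs() {
        return committed.size() + pending.size();
    }

    // Moves every pending document into the committed state.
    void commit() {
        ensureOpen();
        committed.addAll(pending);
        pending.clear();
    }

    // Discards the pending documents and closes the writer.
    void rollback() {
        pending.clear();
        open = false;
    }

    // Read-only view of what a reader opened on the committed state would see.
    List<String> committedDocs() {
        return Collections.unmodifiableList(committed);
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("writer is closed");
        }
    }

    public static void main(String[] args) {
        ToyPendingWriter writer = new ToyPendingWriter();
        writer.addDocument("Day first");
        writer.addDocument("Day second");
        writer.commit();
        writer.addDocument("Day third"); // pending only
        System.out.println("numDocs before rollback: " + writer.numDocs());
        writer.rollback();
        System.out.println("committed after rollback: " + writer.committedDocs());
    }
}
```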

You can explore the remaining methods in the API documentation.

4. Things to consider

  1. Always remember to close the IndexWriter. Reason: if the IndexWriter is left open, recently added documents may not be committed to the index folder.
  2. Not analyzed: a field that is not analyzed is not broken down into individual tokens, so it must match the query string exactly.
  3. You need to include the lucene-analyzers-common-x.x.x and lucene-queryparser-x.x.x jar files along with the lucene-core jar file to run the examples above.
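
The difference between an analyzed and a not-analyzed field (point 2 above) can be sketched in plain Java. The class and method names below are hypothetical, and the tokenizer is a crude approximation rather than what StandardAnalyzer really does; the sketch only contrasts token-level matching with exact matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; this is not part of Lucene.
class FieldMatchSketch {
    // Crude stand-in for analysis: lowercase, then split on non-alphanumeric characters.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Analyzed field (like "title"): a single term matches if it equals any token.
    static boolean matchesAnalyzed(String fieldValue, String term) {
        return tokenize(fieldValue).contains(term.toLowerCase(Locale.ROOT));
    }

    // Not-analyzed field (like "course_code"): only the exact, whole value matches.
    static boolean matchesNotAnalyzed(String fieldValue, String term) {
        return fieldValue.equals(term);
    }

    public static void main(String[] args) {
        String title = "Day second , part one : Lucence Projects.";
        String courseCode = "3437RJ1";
        System.out.println(matchesAnalyzed(title, "Second"));          // token match
        System.out.println(matchesNotAnalyzed(courseCode, "3437RJ1")); // exact match
        System.out.println(matchesNotAnalyzed(courseCode, "3437"));    // partial value
    }
}
```

In this model a partial or differently cased course code does not match, which is why an exact, case-sensitive query is the usual fit for fields stored without tokenizing.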

5. Download the NetBeans project

This was an example of the Lucene IndexWriter class.

Download
You can download the full source code of this example here: Lucene IndexWriter Example

Niranjan Acharya

I am a Software Engineering graduate from Gandaki College of Engineering and Science, Nepal. I was involved in various software activities and projects during the four-year program. I started with programming in C and C++, and gave presentations and exhibitions on C games and Allegro game programming at GCES IT Mohatsav. I took part in various academic activities involving Java, web technologies, enterprise applications and big data technologies. Since completing my Software Engineering degree, I have been working as Chief Technical Officer at IT Sahayatri Private Limited.