Lucene IndexWriter Example
In this example, we look at Lucene's IndexWriter class and walk through its simple, fundamental use.
The demonstration covers the indexing, writing, searching and displaying steps, so that you can see how the IndexWriter class from Lucene fits into a small end-to-end example.
The code in this example was developed in NetBeans IDE 8.0.2.
This example uses Lucene version 4.2.1. It is worth trying it with the latest version as well.
1. IndexWriter Class
IndexWriter is the basic class defined in Lucene Core for creating and maintaining an index. It provides a range of methods that make indexing tasks easy.
Apache Lucene is an open-source search software project whose sub-projects include Lucene Core, Solr, PyLucene and the Open Relevance Project. Lucene Core provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
The key to indexing and search in Lucene is building an index inside an index directory.
2. Here we go
Initially, we start with a StandardAnalyzer instance in our Lucene demo. Note: you need to add “lucene-analyzers-common-4.2.1.jar” to the classpath to use StandardAnalyzer.
Initializing StandardAnalyzer
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42); // creates a StandardAnalyzer object
2.1. Indexing
You can create an index Directory and an IndexWriterConfig that uses the analyzer instance. You can also supply a file path to use as the index directory (a must when the data set is large).
Indexing
Directory index = new RAMDirectory();
//Directory index = FSDirectory.open(new File("index-dir"));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_42, analyzer);
IndexWriter writer = new IndexWriter(index, config);
Then you can create a writer object using the index directory and the IndexWriterConfig object. As good programming practice, never forget to close the writer once the writing task is complete. This completes the indexing process. The definition of the addDoc function is given at the end of this section.
Writing to index
addDoc(writer, "Day first : Lucence Introduction.", "3436NRX");
addDoc(writer, "Day second , part one : Lucence Projects.", "3437RJ1");
addDoc(writer, "Day second , part two: Lucence Uses.", "3437RJ2");
addDoc(writer, "Day third : Lucence Demos.", "34338KRX");
writer.close();
2.2. Querying
The second task in the example is to turn a query string into a query for our search. For querying, we parse the query string with a QueryParser that uses the same analyzer. Next, we create an IndexReader and an IndexSearcher for our index directory. Finally, we collect the search results using TopScoreDocCollector into an array of ScoreDoc. The same array can be used to display the results to the user with a suitable user interface as needed.
Creating QueryString
String querystr = "Second";
Query q = new QueryParser(Version.LUCENE_42, "title", analyzer).parse(querystr);
2.3. Searching
int hitsPerPage = 10;
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
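To see why the query string "Second" is expected to match the two "Day second" titles, it can help to picture the index as a map from terms to document ids. The following is a toy sketch in plain Java, not Lucene's implementation: the class name ToyInvertedIndex and its methods are hypothetical, and its tokenizer only lowercases and splits on non-alphanumeric characters, which is a rough assumption about what an analyzer does to these particular titles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy illustration only: a hypothetical in-memory inverted index, not Lucene code.
final class ToyInvertedIndex {
    // term -> ids of the documents whose title contains that term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Assumed analysis step: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Adds a title and returns the id assigned to it (0, 1, 2, ...).
    int add(String title) {
        int docId = nextDocId++;
        for (String term : tokenize(title)) {
            List<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new ArrayList<>();
                postings.put(term, ids);
            }
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Looks up a single-word query after running it through the same tokenizer.
    List<Integer> search(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) {
            return Collections.<Integer>emptyList();
        }
        List<Integer> ids = postings.get(terms.get(0));
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<Integer>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Day first : Lucence Introduction.");
        index.add("Day second , part one : Lucence Projects.");
        index.add("Day second , part two: Lucence Uses.");
        index.add("Day third : Lucence Demos.");
        System.out.println(index.search("Second")); // prints [1, 2]
    }
}
```

Because the query goes through the same tokenizer as the titles, the capitalised "Second" still finds the lowercase term. This is the reason the example reuses one analyzer for both indexing and query parsing.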
2.4. Displaying results
Displaying results
System.out.println("Query string: " + querystr);
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("course_code") + "\t" + d.get("title"));
}
// Finally, close the reader
reader.close();
Instead of repeating the lengthy process of building each new entry by hand, we can create a generic function that adds a new document. Each field is added with its field name and its value.
addDoc Function
private static void addDoc(IndexWriter w, String title, String courseCode) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));
    // Here, we use a string field for course_code to avoid tokenizing.
    doc.add(new StringField("course_code", courseCode, Field.Store.YES));
    w.addDocument(doc);
}
This completes our simple demonstration.
3. Some other important Methods
void commit() :
Commits all pending changes and syncs the index.
void deleteAll() :
Deletes all the documents in the index.
Analyzer getAnalyzer() :
Returns the current Analyzer.
Directory getDirectory() :
Returns the index Directory.
int numDocs() :
Returns the number of documents in the index, including the pending ones.
void rollback() :
Closes the IndexWriter without committing the pending changes.
void waitForMerges() :
Waits until the outstanding merges are done.
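The relationship between pending changes, commit() and rollback() can be pictured with a toy model. The sketch below is plain Java and is not Lucene code: the class ToyWriter, its fields and the committedDocs() helper are hypothetical, and it only mirrors the behaviour described in the list above (numDocs() counts pending documents, commit() keeps them, rollback() discards them and closes the writer).

```java
import java.util.ArrayList;
import java.util.List;

// Toy analogy only: a hypothetical stand-in that mimics the commit/rollback behaviour listed above.
final class ToyWriter {
    private final List<String> committed = new ArrayList<>(); // changes that have been committed
    private final List<String> pending = new ArrayList<>();   // added but not yet committed
    private boolean closed = false;

    void addDocument(String doc) {
        ensureOpen();
        pending.add(doc);
    }

    // Counts committed documents plus the pending ones.
    int numDocs() {
        return committed.size() + pending.size();
    }

    // Hypothetical helper: number of documents that have been committed.
    int committedDocs() {
        return committed.size();
    }

    // Makes every pending change part of the committed state.
    void commit() {
        ensureOpen();
        committed.addAll(pending);
        pending.clear();
    }

    // Discards the pending changes and closes the writer.
    void rollback() {
        ensureOpen();
        pending.clear();
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        w.addDocument("a");
        w.commit();
        w.addDocument("b");
        System.out.println(w.numDocs());       // prints 2 (one committed, one pending)
        w.rollback();
        System.out.println(w.committedDocs()); // prints 1 (the pending document was discarded)
    }
}
```

In this model, a document added after the last commit() is lost on rollback(), which is the situation the real methods are meant to let you control.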
You can try out the rest of the methods from the API documentation itself.
4. Things to consider
- Always remember to close the IndexWriter. Reason: while the IndexWriter is left open, recently added documents may not yet be committed to the index folder.
- Not analyzed: a field that is not analyzed is not broken down into individual tokens. Its value must match the query string exactly.
- You need to include both the lucene-analyzers-common-x.x.x and lucene-queryparser-x.x.x jar files along with the lucene-core jar file to run the above examples.
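The "not analyzed" point above can be illustrated without Lucene. The sketch below is a simplified model in plain Java: the class FieldMatchDemo and both of its methods are hypothetical names, and the tokenizer (lowercase, split on non-alphanumeric characters) is only an assumption standing in for a real analyzer. It contrasts a tokenized field such as title, where a single word of the text can match, with an untokenized field such as course_code, where only the whole value matches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified model only: hypothetical helpers, not part of the Lucene API.
final class FieldMatchDemo {
    // Analyzed field: the value is split into lowercase tokens and any single token can match.
    static boolean matchesAnalyzed(String fieldValue, String term) {
        List<String> tokens =
                Arrays.asList(fieldValue.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
        return tokens.contains(term.toLowerCase(Locale.ROOT));
    }

    // Not-analyzed field: the value is kept as one term, so only an exact match succeeds.
    static boolean matchesNotAnalyzed(String fieldValue, String term) {
        return fieldValue.equals(term);
    }

    public static void main(String[] args) {
        String title = "Day second , part one : Lucence Projects.";
        System.out.println(matchesAnalyzed(title, "Second"));         // prints true
        System.out.println(matchesNotAnalyzed("3437RJ1", "3437RJ1")); // prints true
        System.out.println(matchesNotAnalyzed("3437RJ1", "3437rj1")); // prints false
        System.out.println(matchesNotAnalyzed("3437RJ1", "3437"));    // prints false
    }
}
```

This is why the example stores course_code in a StringField: a course code is an identifier that should be looked up as a whole, not word by word.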
5. Download the NetBeans project
This was an example of the Lucene IndexWriter class.
You can download the full source code of this example here: Lucene IndexWriter Example