Apache Lucene Hello World Example

1. Introduction

In this example, I show how to get started with Apache Lucene by writing a simple Hello World program. Apache Lucene is an open source library that provides full-text indexing and search. The Apache Lucene site documents its features thoroughly, but the examples there are very detailed. Here, I present a simple example to get you started with this cool technology.

This example uses the below technologies, frameworks and IDE:

a. JDK 1.8
b. Apache Lucene 6.5.1
c. Eclipse Neon (You can use any IDE of your choice, or run it via command line)

2. Getting started

Let’s get going by first getting the relevant jars. At the time of writing, the latest Apache Lucene version is 6.5.1, which can be downloaded from the Apache site. Extract the downloaded archive and locate the main jars (lucene-core-6.5.1.jar, lucene-queryparser-6.5.1.jar, lucene-analyzers-common-6.5.1.jar).

Tip
You may use any IDE of your choice or run code via command line.

Next, create a new Eclipse project (I named it JCG).

Figure: New Eclipse Project

Choose a name for the project and save.

Figure: Save Project

Add the downloaded jars to the project build path. Although this example only needs the lucene-core and lucene-queryparser jars, adding all three jars is recommended for Lucene projects.

Figure: Adding Lucene jars in classpath

3. What the code needs to accomplish

We start by building a simple index with the IndexWriter class, which builds and maintains an index: we create a couple of Document objects and add them to the IndexWriter instance. To illustrate the functionality, we use RAMDirectory to create the IndexWriter. Note that RAMDirectory is a memory-resident Directory implementation that may not work well with big indexes. However, it is sufficient to illustrate the Directory functionality our program needs.
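To make the idea of "building an index" more concrete, here is a minimal sketch of a toy inverted index in plain Java. It is an illustration only: the class ToyInvertedIndex and its methods are hypothetical names invented for this sketch, and Lucene's real index format and IndexWriter internals are far more sophisticated. The point is simply that indexing maps each term to the documents that contain it, so that a search becomes a lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index; illustrative only, not how Lucene stores data. */
final class ToyInvertedIndex {

    // Stored documents; a document's id is its position in this list.
    private final List<String> documents = new ArrayList<>();
    // Maps each lowercased term to the sorted ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Adds a document, records its terms, and returns the assigned id. */
    int addDocument(String content) {
        int docId = documents.size();
        documents.add(content);
        for (String term : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Returns the ids of the documents that contain the given single term. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Hello World");
        index.addDocument("Hello people");
        System.out.println("Docs containing hello: " + index.search("hello"));
        System.out.println("Docs containing world: " + index.search("world"));
        System.out.println("Docs containing hi: " + index.search("hi"));
    }
}
```

In this toy model, the number of ids returned by search plays the role of the hit count that the Lucene program reports.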

Once the documents have been added and indexed, we use IndexReader to access the index and IndexSearcher to search it. A QueryParser instance is created with the name of the field to search ("content") and the analyzer, and it parses the search text into a Query. Running the Query through the IndexSearcher returns a TopDocs value, which in turn gives the number of hits.

The Java code listed below performs the search and prints the number of hits. Searching for a value in the index should return the number of matching documents, while searching for text that was not indexed should return 0.

3.1 Java Code

Let’s look at the code now.

LuceneHelloWorld.java

package com.javacodegeeks.lucene;

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class LuceneHelloWorld {

 public static void main(String[] args) throws IOException, ParseException {
 //New index
 StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
 Directory directory = new RAMDirectory();
 IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer); 
 //Create a writer
 IndexWriter writer = new IndexWriter(directory, config);
 Document document = new Document ();
 //In a real world example, content would be the actual content that needs to be indexed.
 //Setting content to Hello World as an example.
 document.add(new TextField("content", "Hello World", Field.Store.YES));
 writer.addDocument(document);
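 //Use a fresh Document for the second entry; reusing the first one would index both values in a single document.
 document = new Document();
 document.add(new TextField("content", "Hello people", Field.Store.YES));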
 writer.addDocument(document); 
 writer.close();
 
 //Now let's try to search for Hello
 IndexReader reader = DirectoryReader.open(directory);
 IndexSearcher searcher = new IndexSearcher (reader);
 QueryParser parser = new QueryParser ("content", standardAnalyzer);
 Query query = parser.parse("Hello");
 TopDocs results = searcher.search(query, 5);
 System.out.println("Hits for Hello -->" + results.totalHits);

 //case insensitive search
 query = parser.parse("hello");
 results = searcher.search(query, 5);
 System.out.println("Hits for hello -->" + results.totalHits);

 //search for a value not indexed
 query = parser.parse("Hi there");
 results = searcher.search(query, 5);
 System.out.println("Hits for Hi there -->" + results.totalHits);
 }
}

3.2 Code output

The above code queries the index using “Hello” and “hello” as search terms; both searches return 2 hits, because each of the two indexed documents contains that term. Searching for a value not present in the index, e.g. "Hi there", returns 0 hits as expected.

Hits for Hello -->2
Hits for hello -->2
Hits for Hi there -->0
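The identical hit counts for "Hello" and "hello" follow from analysis: the program passes the same StandardAnalyzer to both the IndexWriterConfig and the QueryParser, so indexed text and query text are normalized in the same way, including lowercasing. The sketch below imitates that idea in plain Java. It is a rough stand-in and not Lucene's tokenizer: ToyAnalyzer, analyze and the short stop-word list are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for an analyzer: lowercases, splits on non-alphanumerics, drops stop words. */
final class ToyAnalyzer {

    // A small illustrative stop-word list, not Lucene's actual set.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "there", "is"));

    /** Returns the normalized terms that would be indexed or searched for. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hello World")); // [hello, world]
        System.out.println(analyze("hello"));       // [hello]
        System.out.println(analyze("Hi there"));    // [hi]
    }
}
```

Because both the stored text and the query pass through the same normalization, "Hello" and "hello" end up as the same term, while "hi" matches nothing that was indexed.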

3.3 Java code that reads file contents and writes the index to a folder

We will now modify the code listed in section 3.1 to read the content from a file and write the index to a folder. Let’s look at the code:

LuceneHelloWorldReadFromFile.java

package com.javacodegeeks.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneHelloWorldReadFromFile {

	public static void main(String[] args) throws IOException, ParseException {
		// New index
		StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
		// Example paths from the author's machine; adjust them to your environment.
		String inputFilePath = "C:\\priya\\workspace\\JCG\\src\\com\\javacodegeeks\\lucene\\input.txt";
		String outputDir = "C:\\priya\\workspace\\JCG\\src\\com\\javacodegeeks\\lucene\\output";

		Directory directory = FSDirectory.open(Paths.get(outputDir));
		IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer);
		config.setOpenMode(OpenMode.CREATE);
		// Create a writer
		IndexWriter writer = new IndexWriter(directory, config);

		Document document = new Document();
		try (BufferedReader br = new BufferedReader(new FileReader(inputFilePath))) {

			document.add(new TextField("content", br));
			writer.addDocument(document);
			writer.close();

		} catch (IOException e) {
			e.printStackTrace();
		}

		// Now let's try to search for Hello
		IndexReader reader = DirectoryReader.open(directory);
		IndexSearcher searcher = new IndexSearcher(reader);
		QueryParser parser = new QueryParser("content", standardAnalyzer);
		Query query = parser.parse("Hello");
		TopDocs results = searcher.search(query, 5);
		System.out.println("Hits for Hello -->" + results.totalHits);

		// case insensitive search
		query = parser.parse("hello");
		results = searcher.search(query, 5);
		System.out.println("Hits for hello -->" + results.totalHits);

		// search for a value not indexed
		query = parser.parse("Hi there");
		results = searcher.search(query, 5);
		System.out.println("Hits for Hi there -->" + results.totalHits);
	}
}

3.4 Code Output

In the code presented in section 3.3, the change is that the content to be indexed is read from a file, input.txt, and the index is written to the outputDir directory:

Directory directory = FSDirectory.open(Paths.get(outputDir));
IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer);
config.setOpenMode(OpenMode.CREATE);
// Create a writer
IndexWriter writer = new IndexWriter(directory, config);

Document document = new Document();
try (BufferedReader br = new BufferedReader(new FileReader(inputFilePath))) {

	document.add(new TextField("content", br));
	writer.addDocument(document);
	writer.close();

} catch (IOException e) {
	e.printStackTrace();
}

Also, the IndexWriter in this code creates the index in the directory given by outputDir. You can inspect the indexing output by viewing the output folder. See a sample output below:

Figure: Indexed files

A sample input.txt and the corresponding output of the Java code in section 3.3 are listed below:

input.txt:
Hello world

Output:

Hits for Hello -->1
Hits for hello -->1
Hits for Hi there -->0
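One detail worth knowing about the listing in section 3.3: FileReader decodes the file using the platform default charset, so the same input.txt may be read differently on different machines. If the input is known to be UTF-8 (an assumption made here, not something the original example states), the reader can be opened with an explicit charset instead. The following is a minimal sketch in plain Java; Utf8InputReader and open are hypothetical names used only for this illustration, and a temporary file stands in for input.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Opens text files as UTF-8 regardless of the platform default charset. */
final class Utf8InputReader {

    /** Returns a reader that decodes the given file as UTF-8. */
    static BufferedReader open(Path inputFile) throws IOException {
        return Files.newBufferedReader(inputFile, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for input.txt.
        Path input = Files.createTempFile("input", ".txt");
        Files.write(input, "Hello world".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader br = open(input)) {
            // In the Lucene program, this reader would be handed to
            // new TextField("content", br) instead of being printed.
            System.out.println(br.readLine());
        } finally {
            Files.deleteIfExists(input);
        }
    }
}
```

The reader returned by open is an ordinary BufferedReader, so it can replace the FileReader-based reader in the try-with-resources statement without further changes.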

4. Apache Lucene Hello World – Summary

In this example, we learnt how to get started with Lucene by getting the relevant jars, adding them to an Eclipse project and running Lucene Hello World programs that use two different approaches to indexing.

I hope you enjoyed this tutorial. It should serve as a starting point for working with this rich open source technology. Enjoy and happy programming!

5. Download the Eclipse Project

This was an Apache Lucene Hello World example with Eclipse.

Download
You can download the full source code of this example here: lucene hello world

Sripriya Venkatesan

Sripriya is a Computer Science engineering graduate; she topped her graduation class and was a gold medalist. She has about 15 years of work experience and currently works as a technical architect/technical manager for large-scale enterprise applications, mainly around Java and database technologies, spanning different clients, geographies and domains. She has traveled to multiple countries and strives for work-life balance. She is passionate about programming, design and architecture, and enjoys working on new technologies.