Solr autocomplete example

Veeramani KalyanasundaramJune 2nd, 2015Last Updated: March 19th, 2019

6 471 9 minutes read

In this example of Solr autocomplete example, we will discuss about how to implement autocomplete functionality for any UI component. We will be using jQuery autocomplete feature along with Solr indexing data to achieve the autocomplete functionality.

Our preferred environment for this example is solr-5.0.0, Eclipse Luna, JDK 8u25, and Tomcat 8 application server. Having said that, we have tested the code against JDK 1.7 and Tomcat 7 as well.

Before you begin the Solr installation make sure you have JDK installed and Java_Home is set appropriately.

1. Install Apache Solr

To begin with lets download the latest version of Apache Solr from the following location.

http://lucene.apache.org/solr/downloads.html

As of this writing, the stable version available is 5.0.0. Apache Solr has gone through various changes from 4.x.x to 5.0.0, so if you have different version of Solr you need to download the 5.x.x. version to follow this example.

Once the Solr zip file is downloaded unzip it into a folder. The extracted folder will look like the below.

The bin folder contains the scripts to start and stop the server. The example folder contains few example files. We will be using one of them to demonstrate how Solr indexes the data. The server folder contains the logs folder where all the Solr logs are written. It will be helpful to check the logs for any error during indexing. The solr folder under server holds different collection or core. The configuration and data for each of the core/ collection are stored in the respective core/ collection folder.

Apache Solr comes with an inbuilt Jetty server. But before we start the solr instance we must validate the JAVA_HOME is set on the machine.

We can start the server using the command line script. Lets go to the bin directory from the command prompt and issue the following command

solr start

This will start the Solr server under the default port 8983.

We can now open the following URL in the browser and validate that our Solr instance is running. The specifics of solr admin tool is beyond the scope of the example.

http://localhost:8983/solr/

2. Configuring Apache Solr

In this section, we will show you how to configure the core/collection for a solr instance and how to define the fields. Apache Solr ships with an option called Schemaless mode. This option allow users to construct effective schema without manually editing the schema file. But for this example we will use the Schema configuration for understanding the internals of the Solr.

2.1 Creating a Core

When the Solr server is started in Standalone mode the configuration is called core and when it is started in SolrCloud mode the configuration is called Collection. In this example we will discuss about the standalone server and core. We will park the SolrCloud discussion for later time.

First, we need to create a Core for indexing the data. The Solr create command has the following options:

-c <name> – Name of the core or collection to create (required).
-d <confdir> – The configuration directory, useful in the SolrCloud mode.
-n <configName> – The configuration name. This defaults to the same name as the core or collection.
-p <port> – Port of a local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.
-s <shards> – Number of shards to split a collection into, default is 1.
-rf <replicas> – Number of copies of each document in the collection. The default is 1.

In this example we will use the -c parameter for core name and -d parameter for the configuration directory. For all other parameters we make use of default settings.

Now navigate the solr-5.0.0\bin folder in the command window and issue the following command.

solr create -c jcg -d basic_configs

We can see the following output in the command window.

Creating new core 'jcg' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=jcg&instanceDir=jcg

{
 "responseHeader":{
 "status":0,
 "QTime":663},
 "core":"jcg"}

Now we navigate to the following URL and we can see jcg core being populated in the core selector. You can also see the statistics of the core.

http://localhost:8983/solr

2.2 Modify the schema.xml file

We need to modify the schema.xml file under the folder server\solr\jcg\conf to include the fields. We will use one of the example file “books.csv” shipped along with Solr installation for indexing. The file is located under the folder solr-5.0.0\example\exampledocs

Now we navigate to the folder server\solr directory. You will see a folder called jcg created. The sub-folders namelyconf and data have the core’s configuration and indexed data respectively.

Now edit the schema.xml file in the \server\solr\jcg\conf folder and add the following contents after the uniqueKey element.

schema.xml

<uniqueKey>id</uniqueKey>
<!-- Fields added for books.csv load-->
<field name="cat" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>

We have set the attribute indexed to true. This specifies the field is used for indexing and the record can be retrieved using the index. Setting the value to false will make the field only stored but can’t be queried with.

Also note we have another attribute called stored and set it to true. This specifies the field is stored and can be returned in the output. Setting this field to false will make the field only indexed and can’t be retrieved in output.

We have assigned the type for the fields present in the “books.csv” file here. The first field in the CSV file “id” is automatically taken care by the uniqueKey element of schema.xml file for indexing.

Since we have modified the configuration we have to stop and start the server. To do so, we need to issue the following command from bin directory through command line.

solr stop -all

The server will be stopped now. Now to start the server issue the following command from bin directory through command line.

solr start

3. Indexing the Data

Apache Solr comes with a Standalone Java program called the SimplePostTool. This program is packaged into JAR and available with the installation under the folder example\exampledocs.

Now we navigate to the example\exampledocs folder in the command prompt and type the following command. You will see a bunch of options to use the tool.

java -jar post.jar -h

As we said earlier, we will index the data present in the “books.csv” file shipped with Solr installation. We will navigate to the solr-5.0.0\example\exampledocs in the command prompt and issue the following command.

java -Dtype=text/csv -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.csv

The SystemProperties used here are:

-Dtype – the type of the data file.
-Durl – URL for the jcg core.

The file “books.csv” will now be indexed and the command prompt will display the following output.

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/jcg/update using content-
type text/csv...
POSTing file books.csv to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcg/update...
Time spent: 0:00:00.647

4. Setting up the webproject

We will use the jQuery autocomplete widget to consume the data from Solr. First, we will set up the maven project for a simple web application.

In eclipse go to File -> New->Other-> Maven Project.

In the “Select project name and location” page of the wizard, make sure that “Create a simple project (skip archetype selection)” option is unchecked, hit “Next” to continue with default values.

Here choose “maven-archetype-webapp” and click on Next.

In the “Enter an artifact id” page of the wizard, you can define the name and main package of your project. Set the “Group Id” variable to "com.javacodegeeks.snippets.enterprise" and the “Artifact Id” variable to "solrautocomplete". For package enter "com.javacodegreeks.solrautocomplete" and hit “Finish” to exit the wizard and to create your project.

If you see any errors in the index.jsp , set target runtime for the project.

Now create a file called search.html in webapp folder. We are using the jQuery hosted on the cloud. We will use the jQuery AJAX to fetch the data from Solr and bind to the source of the autocomplete function.

search.html

<!DOCTYPE html>
<html>
<head>
<meta charset="ISO-8859-1">
<title>Solr auto complete</title>
<link
 href="http://code.jquery.com/ui/1.10.4/themes/ui-lightness/jquery-ui.css"
 rel="stylesheet"></link>
<script src="http://code.jquery.com/jquery-1.10.2.js"></script>
<script src="http://code.jquery.com/ui/1.10.4/jquery-ui.js"></script>
<script>
 $(function() {
 var URL_PREFIX = "http://localhost:8983/solr/jcg/select?q=name:";
 var URL_SUFFIX = "&wt=json";
 $("#searchBox").autocomplete({
 source : function(request, response) {
 var URL = URL_PREFIX + $("#searchBox").val() + URL_SUFFIX;
 $.ajax({
 url : URL,
 success : function(data) {
 var docs = JSON.stringify(data.response.docs);
 var jsonData = JSON.parse(docs);
 response($.map(jsonData, function(value, key) {
 return {
 label : value.name
 }
 }));
 },
 dataType : 'jsonp',
 jsonp : 'json.wrf'
 });
 },
 minLength : 1
 })
 });
</script>
</head>
<body>
 <div>
 <p>Type The or A</p>
 <label for="searchBox">Tags: </label> <input id="searchBox"></input>
 </div>
</body>
</html>

Since Solr runs on a different port and the request (webpage) is initiated from another port, we might end up with cross domain issue. To overcome this we have to use jsonp. The minLength parameter specify after how many characters typed the search has to begin. Over here we have specified the value as 1 which means when a single character is typed the results are bound.

Now we can create the deployment package using Run as –> Maven clean and then Run as –> Maven install. This will create a war file in the target folder. The war file produced must be placed in webapps folder of tomcat. Now we can start the server.

Open the following URL and type ‘A’ . This will bring results with books having title A ..

http://localhost:8080/solrautocomplete/search.html

Now type ‘The’ in the search box. This will return the books having word The.

The problem with the above indexing technique is we could not able to get results based on phrases. Say if we type ‘The black’ it doesn’t fetch any result. Also when we type ‘bla’ no results are bound. To overcome this issue we will use NGramFilterFactory and re-index the data.

5. Indexing using NGramFilterFactory

We will copy the field name to a new field called name_ngram. The copyField command copy one field to another at the time a document is added to the index. It’s used either to index the same field differently, or to add multiple fields to the same field for easier/faster searching.

Now modify the schema.xml file in the \server\solr\jcg\conf folder and add the following highlighted content.

schema.xml

<!--
<copyField source="title" dest="text"/>
<copyField source="body" dest="text"/>
-->
<copyField source="name" dest="name_ngram"/>

In the same file, we need to add a field called name_ngram and mark it for indexing. For it, we need to add the highlighted line.

schema.xml

<uniqueKey>id</uniqueKey>
<!-- Fields added for books.csv load-->
<field name="cat" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="name_ngram" type="text_ngram" indexed="true" stored="true"/>

Take a note we have changed the type of the new field as text_ngram. We will define the type text_ngram subsequently.

Now we add the definition for the field text_ngram in the schema.xml file. We have set the minimum ngram size as 2 and maximum ngram size as 10.

schema.xml

<!-- Added for NGram field-->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

We have combined the features of NGramTokenizerFactory and EdgeNGramTokenizerFactory to achieve the best of indexing. Since we have modified the configuration we have to stop and start the server. To do so, we need to issue the following command from bin directory through command line.

solr stop -all

The server will be stopped now. Now to start the server issue the following command from bin directory through command line.

solr start

We will re-index the data present in the books.csv file. We will navigate to the solr-5.0.0\example\exampledocs in the command prompt and issue the following command.

java -Dtype=text/csv -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.csv

The file books.csv will now be re-indexed and the command prompt will display the following output.

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/jcg/update using content-type text/csv...
POSTing file books.csv to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcg/update...
Time spent: 0:00:02.325

6. Modify search.html

Now we will modify the search.html to include another search box to test the NGram indexing. We will create search box with id ngrambox and write another javascript function for the new search box.

search.html

<!DOCTYPE html>
<html>
<head>
<meta charset="ISO-8859-1">
<title>Solr auto complete</title>
<link
 href="http://code.jquery.com/ui/1.10.4/themes/ui-lightness/jquery-ui.css"
 rel="stylesheet"></link>
<script src="http://code.jquery.com/jquery-1.10.2.js"></script>
<script src="http://code.jquery.com/ui/1.10.4/jquery-ui.js"></script>
<script>
 $(function() {
 var URL_PREFIX = "http://localhost:8983/solr/jcg/select?q=name:";
 var URL_SUFFIX = "&wt=json";
 $("#searchBox").autocomplete({
 source : function(request, response) {
 var URL = URL_PREFIX + $("#searchBox").val() + URL_SUFFIX;
 $.ajax({
 url : URL,
 success : function(data) {
 var docs = JSON.stringify(data.response.docs);
 var jsonData = JSON.parse(docs);
 response($.map(jsonData, function(value, key) {
 return {
 label : value.name
 }
 }));
 },
 dataType : 'jsonp',
 jsonp : 'json.wrf'
 });
 },
 minLength : 1
 })
 });
 $(function() {
 var URL_PREFIX = "http://localhost:8983/solr/jcg/select?q=name:";
 var URL_MIDDLE = "OR name_ngram:";
 var URL_SUFFIX = "&wt=json";
 $("#ngramBox").autocomplete(
 {
 source : function(request, response) {
 var searchString = "\"" + $("#ngramBox").val() + "\"";
 var URL = URL_PREFIX + searchString + URL_MIDDLE
 + searchString + URL_SUFFIX;
 $.ajax({
 url : URL,
 success : function(data) {
 var docs = JSON.stringify(data.response.docs);
 var jsonData = JSON.parse(docs);
 response($.map(jsonData, function(value, key) {
 return {
 label : value.name
 }
 }));
 },
 dataType : 'jsonp',
 jsonp : 'json.wrf'
 });
 },
 minLength : 1
 })
 });
</script>
</head>
<body>
 <div>
 <p>Type 'A' or 'The'</p>
 <label for="searchBox">Tags: </label> <input id="searchBox"></input>
 </div>
 <div>
 <p>Type 'Th' or 'Bla' or 'The Black'</p>
 <label for="ngramBox">Tags: </label> <input id="ngramBox"></input>
 </div>
</body>
</html>

Now again package using maven and copy the war to the apache tomcat webapps folder. Open the following URL in the browser and type ‘Bla’.

http://localhost:8080/solrautocomplete/search.html