Apache Solr

Solr Project using Solr Core as a Search Engine

In this article, we are going to introduce a Solr Project using Solr Core as a Search Engine.

1. Introduction

Apache Solr is an open-source search platform built on Apache Lucene and written in Java. A Solr core is a single index together with its transaction log and configuration files. We can index, analyze, and search documents in a Solr core. Solr runs on Windows, Linux, and UNIX operating systems. In this example, I will demonstrate the following items on a Windows 10 machine:

  • Download & Install Apache Solr
  • Start a Solr Server as a Single Instance
  • Common Solr commands
  • Solr Admin Console
  • RESTful Search Query

2. Pre-requisites

Apache Solr 8.6.3 requires a Java Runtime Environment (JRE), version 8 or later. Please install one before continuing.

3. Install Solr on Windows

3.1 Download

In this step, I will download Solr from the Apache Solr download site. I downloaded solr-8.6.3.tgz.

3.2 Install

In this step, I will decompress solr-8.6.3.tgz to C:\MaryZheng\DevTools\solr-8.6.3.tar, and then extract that tar archive to C:\MaryZheng\DevTools\solr.

3.3 Solr Folder Structure

Navigate to the Solr installation directory: C:\MaryZheng\DevTools\solr\solr-8.6.3\. The following screenshot shows its contents.

Figure 1 Solr Home Folder

I will explain the following folders:

  • The bin directory contains the scripts that start, stop, and otherwise manage Solr. There is no need to change anything in it. Note that when a Solr server starts, a solr-{port}.port file is created; it is removed when the server stops.
  • The contrib directory contains Solr's add-on (contrib) plugins.
  • The dist directory contains the Solr JAR files.
  • The example directory contains the examples.
  • The docs directory provides the documentation.
  • The server directory contains the server itself. Solr creates the /logs and /tmp directories under it when a server starts.

Here is what the /server/solr directory looks like right after the installation.

Figure 2 Default server/solr Folder

Here is what the /server/solr directory looks like after I created three cores: films, techproducts, and money.

Figure 3 server/solr Folder After Three Cores Created

I use a tree command to show all the folders under server\solr.

C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr>tree
Folder PATH listing for volume OSDisk
Volume serial number is 34CD-EFB3
C:.
├───configsets
│   ├───sample_techproducts_configs
│   │   └───conf
│   │       ├───clustering
│   │       │   └───carrot2
│   │       ├───lang
│   │       ├───velocity
│   │       └───xslt
│   └───_default
│       └───conf
│           └───lang
├───filestore
├───films
│   ├───conf
│   │   └───lang
│   └───data
│       ├───index
│       ├───snapshot_metadata
│       └───tlog
├───money
│   ├───conf
│   │   └───lang
│   └───data
│       ├───index
│       ├───snapshot_metadata
│       └───tlog
├───techproducts
│   ├───conf
│   │   ├───clustering
│   │   │   └───carrot2
│   │   ├───lang
│   │   ├───velocity
│   │   └───xslt
│   └───data
│       ├───index
│       ├───snapshot_metadata
│       └───tlog
└───userfiles

Each core directory contains a core.properties file, a conf folder that holds solrconfig.xml and either managed-schema or schema.xml, and a data folder that stores the index. The following are three important configuration files for the films core:

C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\solrconfig.xml
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\managed-schema
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\core.properties

4. Common Commands

4.1 Help Command

The solr command accepts a -help option that shows the syntax of each subcommand. Here is an example from start -help.

start -help

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start -help

Usage: solr start [-f] [-c] [-h hostname] [-p port] [-d directory] [-z zkHost] [-m memory] [-e example] [-s solr.solr.home] [-t solr.data.home] [-a "additional-options"] [-V]

  -f            Start Solr in foreground; default starts Solr in the background
                  and sends stdout / stderr to solr-PORT-console.log

  -c or -cloud  Start Solr in SolrCloud mode; if -z not supplied and ZK_HOST not defined in
                  solr.in.cmd, an embedded ZooKeeper instance is started on Solr port+1000,
                  such as 9983 if Solr is bound to 8983

  -h host       Specify the hostname for this Solr instance

  -p port       Specify the port to start the Solr HTTP listener on; default is 8983
"                  The specified port (SOLR_PORT) will also be used to determine the stop port"
"                  STOP_PORT=(\$SOLR_PORT-1000) and JMX RMI listen port RMI_PORT=(\$SOLR_PORT+10000). "
"                  For instance, if you set -p 8985, then the STOP_PORT=7985 and RMI_PORT=18985"

  -d dir        Specify the Solr server directory; defaults to server

  -z zkHost     Zookeeper connection string; only used when running in SolrCloud mode using -c
                  If neither ZK_HOST is defined in solr.in.cmd nor the -z parameter is specified,
                  an embedded ZooKeeper instance will be launched.

  -m memory     Sets the min (-Xms) and max (-Xmx) heap size for the JVM, such as: -m 4g
                  results in: -Xms4g -Xmx4g; by default, this script sets the heap size to 512m

  -s dir        Sets the solr.solr.home system property; Solr will create core directories under
                  this directory. This allows you to run multiple Solr instances on the same host
                  while reusing the same server directory set using the -d parameter. If set, the
                  specified directory should contain a solr.xml file, unless solr.xml exists in Zookeeper.
                  This parameter is ignored when running examples (-e), as the solr.solr.home depends
                  on which example is run. The default value is server/solr. If passed a relative dir
                  validation with the current dir will be done before trying the default server/<dir>

  -t dir        Sets the solr.data.home system property, where Solr will store index data in <instance_dir>/data subdirectories.
                  If not set, Solr uses solr.solr.home for both config and data.

  -e example    Name of the example to run; available examples:
      cloud:          SolrCloud example
      techproducts:   Comprehensive example illustrating many of Solr's core capabilities
      dih:            Data Import Handler
      schemaless:     Schema-less example

  -a opts       Additional parameters to pass to the JVM when starting Solr, such as to setup
                Java debug options. For example, to enable a Java debugger to attach to the Solr JVM
                you could pass: -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983"
                In most cases, you should wrap the additional parameters in double quotes.

  -j opts       Additional parameters to pass to Jetty when starting Solr.
                For example, to add configuration folder that jetty should read
                you could pass: -j "--include-jetty-dir=/etc/jetty/custom/server/"
                In most cases, you should wrap the additional parameters in double quotes.

  -noprompt     Don't prompt for input; accept all defaults when running examples that accept user input

  -v and -q     Verbose (-v) or quiet (-q) logging. Sets default log level to DEBUG or WARN instead of INFO

  -V/-verbose   Verbose messages from this script


C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
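
The port arithmetic described under -p and -c in the help text can be checked with a few lines of Python. This is only a sketch of the rules as the help output states them; the function name derived_ports and the dictionary keys are my own and are not part of Solr.

```python
def derived_ports(solr_port: int) -> dict:
    """Return the ports that the help text derives from the HTTP port."""
    return {
        "http": solr_port,
        "stop": solr_port - 1000,         # STOP_PORT = SOLR_PORT - 1000
        "rmi": solr_port + 10000,         # RMI_PORT = SOLR_PORT + 10000
        "embedded_zk": solr_port + 1000,  # only used in SolrCloud mode (-c) without -z
    }


# The example from the help text: -p 8985
print(derived_ports(8985))
```

This can help when choosing ports for several instances on one machine, since each instance also needs its derived ports to be free.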

4.2 Start Server

In this step, I will demonstrate how to start a Solr server instance. I can start with the default settings.

start

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!

The warning comes from the JVM, not from Solr, and the server still starts. It can be addressed by granting the account the privilege to lock pages in memory. The default port is 8983. I can start an instance on a different port with the -p option.

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start -p 8988
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8988
Started Solr server on port 8988. Happy searching!

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>

4.3 Check Status

In this step, I will use the status command to check the server status.

status

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd status

Found Solr process 102288 running on port 8983
{
  "solr_home":"C:\\MaryZheng\\DevTools\\solr\\solr-8.6.3\\server\\solr",
  "version":"8.6.3 e001c2221812a0ba9e9378855040ce72f93eced4 - jasongerlowski - 2020-10-03 18:12:03",
  "startTime":"2020-10-25T14:19:54.900Z",
  "uptime":"0 days, 0 hours, 1 minutes, 8 seconds",
  "memory":"201.9 MB (%39.4) of 512 MB"}
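
The status command prints a line of plain text followed by a JSON object, so a script can read the details by parsing everything between the first opening brace and the last closing brace. The following is a minimal sketch that assumes a single running instance and uses the output above as sample text; parse_status is a hypothetical helper, not something Solr ships.

```python
import json

# Sample text copied from the status output shown above.
SAMPLE_OUTPUT = r'''
Found Solr process 102288 running on port 8983
{
  "solr_home":"C:\\MaryZheng\\DevTools\\solr\\solr-8.6.3\\server\\solr",
  "version":"8.6.3 e001c2221812a0ba9e9378855040ce72f93eced4 - jasongerlowski - 2020-10-03 18:12:03",
  "startTime":"2020-10-25T14:19:54.900Z",
  "uptime":"0 days, 0 hours, 1 minutes, 8 seconds",
  "memory":"201.9 MB (%39.4) of 512 MB"}
'''


def parse_status(text: str) -> dict:
    """Extract and parse the JSON object embedded in the status output."""
    start = text.index("{")
    end = text.rindex("}") + 1
    return json.loads(text[start:end])


status = parse_status(SAMPLE_OUTPUT)
print(status["version"].split()[0])  # the version number is the first token
print(status["solr_home"])
print(status["memory"])
```

If several instances are running, the command prints one block per instance, and this sketch would need to split the text per block first.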

4.4 Stop Server

In this step, I will use the stop command to stop Solr instances. You can use the -p option to stop the instance on a specific port, or the -all option to stop every running instance.

stop -p 8988

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd stop -p 8988
Stopping Solr process 77940 running on port 8988

Waiting for 0 seconds, press a key to continue ...

stop -all

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd stop -all
Stopping Solr process 102288 running on port 8983

Waiting for 0 seconds, press a key to continue ...

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>

4.5 Start with Example

Solr provides four examples: cloud, techproducts, dih, and schemaless. In this step, I will start Solr with the techproducts example.

start -e techproducts

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start -e techproducts
Creating Solr home directory C:\MaryZheng\DevTools\solr\solr-8.6.3\example\techproducts\solr

Starting up Solr on port 8983 using command:
"C:\MaryZheng\DevTools\solr\solr-8.6.3\bin\solr.cmd" start -p 8983 -s "C:\MaryZheng\DevTools\solr\solr-8.6.3\example\techproducts\solr"

Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!
Created new core 'techproducts'

Indexing tech product example docs from C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...
POSTing file gb18030-example.xml to [base]
POSTing file hd.xml to [base]
POSTing file ipod_other.xml to [base]
POSTing file ipod_video.xml to [base]
POSTing file manufacturers.xml to [base]
POSTing file mem.xml to [base]
POSTing file money.xml to [base]
POSTing file monitor.xml to [base]
POSTing file monitor2.xml to [base]
POSTing file mp500.xml to [base]
POSTing file sd500.xml to [base]
POSTing file solr.xml to [base]
POSTing file utf8-example.xml to [base]
POSTing file vidcard.xml to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:02.624

Solr techproducts example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>

As you can see from the output, Solr created the techproducts core and loaded data from the example files.

4.6 Create Solr Core

We can use the Solr Admin Console to create a core. However, in this step, I will create a Solr core from the command line.

create_core -c films

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd create_core -c films
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin\solr config -c films -p 8983 -action set-user-property -property update.autoCreateFields -value false

Created new core 'films'

Create another core: money

create_core -c money

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd create_core -c money
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin\solr config -c money -p 8983 -action set-user-property -property update.autoCreateFields -value false

Created new core 'money'

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>

Delete the core: money

delete -c money

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd delete -c money

Deleting core 'money' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=money&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true


C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
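
The delete command output shows that it calls the CoreAdmin UNLOAD action over HTTP. As an illustration, the same URL can be assembled in Python from its parameters. The helper name unload_core_url is hypothetical, and the base URL assumes the default port used in this example.

```python
from urllib.parse import urlencode


def unload_core_url(core: str, base: str = "http://localhost:8983/solr") -> str:
    """Build the CoreAdmin UNLOAD URL printed by the delete command."""
    params = {
        "action": "UNLOAD",
        "core": core,
        "deleteIndex": "true",        # remove the index files
        "deleteDataDir": "true",      # remove the data directory
        "deleteInstanceDir": "true",  # remove the whole core directory
    }
    return f"{base}/admin/cores?{urlencode(params)}"


print(unload_core_url("money"))
```

Because all three delete flags are set to true, nothing of the core is left on disk afterwards, so use it with care.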

4.7 Load Data

Solr provides several example documents. In this step, I will load the data into films core from the provided sample films.csv file.

First, go to C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs and enter the following Java command:

java -Dc=films -Dtype=text/csv -jar post.jar ..\films\films.csv

C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs>java -Dc=films -Dtype=text/csv -jar post.jar ..\films\films.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/films/update using content-type text/csv...
POSTing file films.csv to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/films/update
SimplePostTool: WARNING: Response: {
  "responseHeader":{
    "status":400,
    "QTime":1006},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=/en/quien_es_el_senor_lopez] Error adding field 'name'='¿Quién es el señor López?' msg=For input string: \"¿Quién es el señor López?\"",
    "code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/films/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:02.154

  • lines 3 and 18 – the /update request handler, which updates the core data
  • line 8 – the 400 Bad Request response status
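
The error body returned by Solr is JSON, so a script can extract the status code and the root cause instead of reading the console text. Note that the metadata entry is a flat list that alternates keys and values. The sketch below uses the response printed above as sample data; summarize_error is a hypothetical helper name.

```python
import json

# Error body copied from the SimplePostTool output above.
ERROR_BODY = r'''{
  "responseHeader":{
    "status":400,
    "QTime":1006},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=/en/quien_es_el_senor_lopez] Error adding field 'name'='¿Quién es el señor López?' msg=For input string: \"¿Quién es el señor López?\"",
    "code":400}}'''


def summarize_error(body: str) -> dict:
    """Pull the status code, root cause, and message out of an error body."""
    error = json.loads(body)["error"]
    flat = error["metadata"]
    metadata = dict(zip(flat[::2], flat[1::2]))  # pair up key, value, key, value...
    return {
        "code": error["code"],
        "root_cause": metadata.get("root-error-class"),
        "message": error["msg"],
    }


summary = summarize_error(ERROR_BODY)
print(summary["code"], summary["root_cause"])
print(summary["message"])
```

Here the root cause, java.lang.NumberFormatException, already hints that a text value was sent to a numeric field.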

We got a 400 Bad Request error, which is caused by the data in the films.csv file. If you open the file, you will see that the name column holds 0.45 in the first row but a text value in the fifth row.

Figure 4 Films CSV File

The films core uses the _default configset, whose data-driven schema guesses each new field's type from the first value it receives. Because the first name value looks like a number, Solr defined the name field as a pdoubles type.

Figure 5 The Name Field
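
To make the failure easier to picture, here is a deliberately simplified Python sketch of first-value type guessing. It is not Solr's implementation (Solr also considers booleans, longs, dates, and more); guess_type and add_value are hypothetical names that only mimic the behaviour seen with the name field.

```python
def guess_type(first_value: str) -> str:
    """Simplified stand-in for field guessing: numeric text becomes pdoubles."""
    try:
        float(first_value)
        return "pdoubles"
    except ValueError:
        return "text_general"


def add_value(field_type: str, value: str):
    """Convert a value for the field, failing the way a numeric field would."""
    if field_type == "pdoubles":
        return float(value)  # raises ValueError for non-numeric text
    return value


name_type = guess_type("0.45")  # the first row's name value
print(name_type)

try:
    add_value(name_type, "¿Quién es el señor López?")
except ValueError as err:
    print("rejected:", err)
```

The first value fixes the type, and every later value that does not fit it is rejected, which is why the field has to be redefined before the load can succeed.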

We can view the field definition in C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\managed-schema. I copy the name field here.

<field name="name" type="pdoubles"/>

We will use the Schema browser in the Admin Console to delete the name field and add it back with the text_general type. After that, view C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\managed-schema again. You will see that the type value is updated.

<field name="name" type="text_general" uninvertible="true" indexed="true" stored="true"/>
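
As an alternative to the Schema browser, the same change can be requested through Solr's Schema API, which accepts a replace-field command posted as JSON to the core's /schema endpoint. I did not use this route in the example, so treat the following as a sketch: replace_field_request is a hypothetical helper, and the code only builds the request without sending it.

```python
import json
import urllib.request


def replace_field_request(core, field, base="http://localhost:8983/solr"):
    """Build (but do not send) a Schema API request that redefines a field."""
    body = json.dumps({"replace-field": field}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base}/{core}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = replace_field_request(
    "films",
    {"name": "name", "type": "text_general", "indexed": True, "stored": True},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# With a running server, urllib.request.urlopen(request) would apply the change.
```

Either way, documents indexed before a type change generally need to be reindexed.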

After the schema is updated, you can re-execute the command. This time, you should see the following output:

C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs>java -Dc=films -Dtype=text/csv -jar post.jar ..\films\films.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/films/update using content-type text/csv...
POSTing file films.csv to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:02.250

C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs>

At this point, there are 1100 records in the films core. We will use this data to run queries later.

5. Solr Admin Console

Apache Solr provides a web-based admin console. You can access it from http://localhost:8983/.

5.1 Core Admin

In this step, I will open a web browser and navigate to http://localhost:8983/. You should see the Solr Admin console, as in the following screenshot.

Figure 6 Solr Admin

As you can see here, you can view the server log in the Logging section.

5.2 Analysis

An analyzer examines the text of fields and generates a token stream. You can click Analysis under the selected core.

I typed “This is a simple math question, do you agree?” in the Field Value (Index) box and “Math is fun.” in the Field Value (Query) box.

Figure 7 Analysis

As you can see in Figure 7, the FieldType is text_general. The page shows both the index-time and the query-time analysis results, and it highlights the matching tokens: is and math.
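
The matching tokens can be reproduced with a rough approximation of the analysis chain: split the text into words and lowercase them. This sketch is only an approximation, not the real text_general analyzer, and simple_analyze is a hypothetical name.

```python
import re


def simple_analyze(text: str) -> list:
    """Rough approximation of analysis: split into word tokens, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


index_tokens = simple_analyze("This is a simple math question, do you agree?")
query_tokens = simple_analyze("Math is fun.")

# Tokens produced at query time that also exist in the indexed token stream.
matches = [token for token in query_tokens if token in index_tokens]

print(index_tokens)
print(query_tokens)
print(matches)
```

Lowercasing is what lets the query term Math match the indexed term math.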

5.3 Query

The Solr Admin console provides an easy way to query the data in a Solr core. In this step, I will search the films core for any documents whose name field contains David.

Figure 8 Solr Query

  1. Select films from the Core Selector drop-down box
  2. Click “Query”
  3. Note that the Request Handler is /select
  4. Enter “name:David” as the Solr query (q)
  5. Enter “id desc” under sort to sort the results
  6. Enter “name, id, directed_by” under fl to list only these fields in the results
  7. Select “json” from wt as the output format

5.4 Browse Example from techproducts

The Solr techproducts example also provides a browse link: http://localhost:8983/solr/techproducts/browse/.

Figure 9 Browse techproducts

6. RESTful Queries

Solr exposes the queries executed in the console through RESTful APIs. The console shows the exact request URL at the top of the Query screen. Please pay attention to the URL shown in Figure 8.

6.1 Normal Query

In this step, I will use the curl command to execute the same query as in Figure 8: search the films core for any documents whose name field contains David.

curl command

C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>curl "http://localhost:8983/solr/films/select?fl=name%2Cid%2Cdirected_by&q=name%3ADavid&sort=id%20desc"
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"name:David",
      "fl":"name,id,directed_by",
      "sort":"id desc"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "name":["David & Layla"],
        "directed_by":["Jay Jonroy"],
        "id":"/en/david_layla"},
      {
        "name":["David Gilmour in Concert"],
        "directed_by":["David Mallet"],
        "id":"/en/david_gilmour_in_concert"}]
  }}

Note: the URL on line 1 is explained as follows:

  • http://{hostname:port} – the Solr server host name and port, which vary for each instance.
  • /solr – a constant value.
  • /films – the name of the core to search.
  • /select – the request handler for querying data.
  • fl – the query parameter that lists the fields to return in the result set.
  • q – the query parameter that specifies the query conditions.
  • sort – the query parameter that defines the sort order of the result set.
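
The percent-encoded query string in the curl command can be generated rather than typed by hand. The following sketch builds the same URL from plain parameter values with Python's standard library; select_url is a hypothetical helper, and the base URL assumes the default port.

```python
from urllib.parse import quote, urlencode


def select_url(core, params, base="http://localhost:8983/solr"):
    """Build a /select URL, percent-encoding every parameter value."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode(params, quote_via=quote)
    return f"{base}/{core}/select?{query}"


url = select_url(
    "films",
    {"fl": "name,id,directed_by", "q": "name:David", "sort": "id desc"},
)
print(url)
```

Encoding matters because characters such as the colon, comma, and space in the parameter values are not safe to leave as they are in a URL.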

Note: the response data starting on line 10 is explained as follows:

  • numFound – the total number of records found.
  • start – the offset of the first returned document.
  • docs – the array of returned documents.
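
A REST client reads these fields after parsing the JSON body. The sketch below uses a condensed copy of the response shown above as sample data, so it runs without a Solr server.

```python
import json

# Condensed copy of the curl response shown above.
RESPONSE = '''{
  "responseHeader":{"status":0,"QTime":0},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {"name":["David & Layla"],
       "directed_by":["Jay Jonroy"],
       "id":"/en/david_layla"},
      {"name":["David Gilmour in Concert"],
       "directed_by":["David Mallet"],
       "id":"/en/david_gilmour_in_concert"}]
  }}'''

result = json.loads(RESPONSE)["response"]
print(result["numFound"], result["start"])

for doc in result["docs"]:
    # name and directed_by come back as lists in this core, so take the first value
    print(doc["id"], doc["name"][0], doc["directed_by"][0])
```

numFound can be larger than the number of entries in docs when the result set is paged, so a client should not assume the two are equal.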

6.2 Query with Facet

Apache Solr provides faceting on result sets. You can also explore it interactively at http://localhost:8983/solr/techproducts/browse. Here is an example from the techproducts core.

curl command

C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr>curl "http://localhost:8983/solr/techproducts/select?facet.field=cat&facet=on&q=price%3A%5B100%20TO%20200%5D"
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"price:[100 TO 200]",
      "facet.field":"cat",
      "facet":"on"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"TWINX2048-3200PRO",
        "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
        "manu":"Corsair Microsystems Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics",
          "memory"],
        "features":["CAS latency 2,  2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
        "price":185.0,
        "price_c":"185.00,USD",
        "popularity":5,
        "inStock":true,
        "store":"37.7752,-122.4232",
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "payloads":"electronics|6.0 memory|3.0",
        "_version_":1683289868499681280,
        "price_c____l_ns":18500},
      {
        "id":"0579B002",
        "name":"Canon PIXMA MP500 All-In-One Photo Printer",
        "manu":"Canon Inc.",
        "manu_id_s":"canon",
        "cat":["electronics",
          "multifunction printer",
          "printer",
          "scanner",
          "copier"],
        "features":["Multifunction ink-jet color photo printer",
          "Flatbed scanner, optical scan resolution of 1,200 x 2,400 dpi",
          "2.5\" color LCD preview screen",
          "Duplex Copying",
          "Printing speed up to 29ppm black, 19ppm color",
          "Hi-Speed USB",
          "memory card: CompactFlash, Micro Drive, SmartMedia, Memory Stick, Memory Stick Pro, SD Card, and MultiMediaCard"],
        "weight":352.0,
        "price":179.99,
        "price_c":"179.99,USD",
        "popularity":6,
        "inStock":true,
        "store":"45.19214,-93.89941",
        "_version_":1683289868634947584,
        "price_c____l_ns":17999}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",2,
        "copier",1,
        "memory",1,
        "multifunction printer",1,
        "printer",1,
        "scanner",1,
        "camera",0,
        "connector",0,
        "currency",0,
        "electronics and computer1",0,
        "electronics and stuff2",0,
        "graphics card",0,
        "hard drive",0,
        "music",0,
        "search",0,
        "software",0]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

  • line 1 – includes facet=on and facet.field=cat to count the matching documents for each value of the cat field
  • line 54 – the start of the facet results
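
In the facet results, each field is returned as a flat list that alternates a value and its count. A client usually pairs them up before use. The sketch below does so for part of the cat list shown above; facet_counts is a hypothetical helper name.

```python
def facet_counts(flat: list) -> dict:
    """Turn [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[::2], flat[1::2]))


# First entries of facet_fields.cat from the response above.
cat_facet = [
    "electronics", 2,
    "copier", 1,
    "memory", 1,
    "multifunction printer", 1,
    "printer", 1,
    "scanner", 1,
    "camera", 0,
]

counts = facet_counts(cat_facet)
non_zero = {value: n for value, n in counts.items() if n > 0}
print(non_zero)
```

The zero-count entries are categories that exist in the index but have no document in the 100 to 200 price range.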

7. Summary

This was an introduction to a Solr project that uses a Solr core as a search engine. Apache Solr provides advanced full-text search capability.

In this example, I demonstrated the basic operations and how to use the admin console to query and analyze data. I also showed a few query examples via RESTful APIs, which can be consumed by any REST client.

If you want to read more about Apache Solr, take a look here.

8. Download the Source Code

Download
You can download the commands used in this example here: Solr Project using Solr Core as a Search Engine

Mary Zheng

Mary has graduated from Mechanical Engineering department at ShangHai JiaoTong University. She also holds a Master degree in Computer Science from Webster University. During her studies she has been involved with a large number of projects ranging from programming and software engineering. She works as a senior Software Engineer in the telecommunications sector where she acts as a leader and works with others to design, implement, and monitor the software solution.