Solr Project using Solr Core as a Search Engine
In this article, we are going to introduce a Solr Project using Solr Core as a Search Engine.
1. Introduction
Apache Solr is an open-source search platform built on Apache Lucene and written in Java. A Solr core is a single index together with its transaction log and configuration files. We can index, analyze, and search data in a Solr core. Solr runs on Windows, Linux, and UNIX operating systems. In this example, I will demonstrate the following items on a Windows 10 machine:
- Download & Install Apache Solr
- Start a Solr Server as a Single Instance
- Common Solr commands
- Solr Admin Console
- Restful Search Query
2. Pre-requisites
Apache Solr 8.x requires a Java Runtime Environment (JRE) version 1.8 or later. Please install one before continuing.
3. Install Solr on Windows
3.1 Download
In this step, I will download Solr from the Apache Solr download site. I downloaded solr-8.6.3.tgz.
3.2 Install
In this step, I will unpack solr-8.6.3.tgz to C:\MaryZheng\DevTools\solr-8.6.3.tar, and then extract that archive to C:\MaryZheng\DevTools\solr.
3.3 Solr Folder Structure
Navigate to the Solr installation directory, C:\MaryZheng\DevTools\solr\solr-8.6.3\, and look at its contents.
I will explain the following folders:
- The bin directory contains the scripts that start, stop, and otherwise manage Solr. There is no need to change anything here. Note that when a Solr server is started, a solr-{port}.port file is created; it is removed when the server is stopped.
- The contrib directory contains Solr's add-on components.
- The dist directory contains the Solr libraries.
- The example directory contains the examples.
- The docs directory provides the documentation.
- The server directory contains the server itself. Starting a server creates the /logs and /tmp directories under it.
Here is what the /server/solr directory looks like right after the installation.
Here is what the /server/solr directory looks like after creating three cores: films, techproducts, and money.
I use the tree command to show all the folders under server\solr.
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr>tree
Folder PATH listing for volume OSDisk
Volume serial number is 34CD-EFB3
C:.
├───configsets
│   ├───sample_techproducts_configs
│   │   └───conf
│   │       ├───clustering
│   │       │   └───carrot2
│   │       ├───lang
│   │       ├───velocity
│   │       └───xslt
│   └───_default
│       └───conf
│           └───lang
├───filestore
├───films
│   ├───conf
│   │   └───lang
│   └───data
│       ├───index
│       ├───snapshot_metadata
│       └───tlog
├───money
│   ├───conf
│   │   └───lang
│   └───data
│       ├───index
│       ├───snapshot_metadata
│       └───tlog
├───techproducts
│   ├───conf
│   │   ├───clustering
│   │   │   └───carrot2
│   │   ├───lang
│   │   ├───velocity
│   │   └───xslt
│   └───data
│       ├───index
│       ├───snapshot_metadata
│       └───tlog
└───userfiles
Each core has a core.properties file, a conf folder containing solrconfig.xml and either managed-schema or schema.xml, and a data folder that stores the index. The following are three important configuration files for the films core:
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\solrconfig.xml
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\managed-schema
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\core.properties
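The layout described above can be checked with a few lines of Python. This is only a sketch: the helper name missing_core_files is hypothetical, and the demo builds a throwaway directory with placeholder file contents instead of reading a real Solr installation.

```python
from pathlib import Path
import tempfile

# Files the article lists for a core, relative to the core directory.
EXPECTED_FILES = ("core.properties", "conf/solrconfig.xml")
# A core has one of these two schema files.
SCHEMA_CANDIDATES = ("conf/managed-schema", "conf/schema.xml")


def missing_core_files(core_dir):
    """Return the expected files that are absent from core_dir."""
    core = Path(core_dir)
    missing = [name for name in EXPECTED_FILES if not (core / name).is_file()]
    if not any((core / name).is_file() for name in SCHEMA_CANDIDATES):
        missing.append(" or ".join(SCHEMA_CANDIDATES))
    return missing


# Demo on a temporary directory that mimics the films core layout.
# The file contents are placeholders, not real Solr configuration.
with tempfile.TemporaryDirectory() as tmp:
    films = Path(tmp) / "films"
    (films / "conf").mkdir(parents=True)
    (films / "core.properties").write_text("name=films\n")
    (films / "conf" / "solrconfig.xml").write_text("<config/>\n")
    print(missing_core_files(films))  # the schema file is still missing here
    (films / "conf" / "managed-schema").write_text("<schema/>\n")
    print(missing_core_files(films))  # nothing is missing now
```

To check a real core, pass its directory instead, for example the films folder under server\solr.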
4. Common Commands
4.1 Help Command
The solr command uses the -help option to show its syntax. Here is an example from start -help.
start -help
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start -help
Usage: solr start [-f] [-c] [-h hostname] [-p port] [-d directory] [-z zkHost] [-m memory] [-e example] [-s solr.solr.home] [-t solr.data.home] [-a "additional-options"] [-V]
  -f            Start Solr in foreground; default starts Solr in the background
                  and sends stdout / stderr to solr-PORT-console.log
  -c or -cloud  Start Solr in SolrCloud mode; if -z not supplied and ZK_HOST not defined in
                  solr.in.cmd, an embedded ZooKeeper instance is started on Solr port+1000,
                  such as 9983 if Solr is bound to 8983
  -h host       Specify the hostname for this Solr instance
  -p port       Specify the port to start the Solr HTTP listener on; default is 8983
"                  The specified port (SOLR_PORT) will also be used to determine the stop port"
"                  STOP_PORT=(\$SOLR_PORT-1000) and JMX RMI listen port RMI_PORT=(\$SOLR_PORT+10000). "
"                  For instance, if you set -p 8985, then the STOP_PORT=7985 and RMI_PORT=18985"
  -d dir        Specify the Solr server directory; defaults to server
  -z zkHost     Zookeeper connection string; only used when running in SolrCloud mode using -c
                  If neither ZK_HOST is defined in solr.in.cmd nor the -z parameter is specified,
                  an embedded ZooKeeper instance will be launched.
  -m memory     Sets the min (-Xms) and max (-Xmx) heap size for the JVM, such as: -m 4g
                  results in: -Xms4g -Xmx4g; by default, this script sets the heap size to 512m
  -s dir        Sets the solr.solr.home system property; Solr will create core directories under
                  this directory. This allows you to run multiple Solr instances on the same host
                  while reusing the same server directory set using the -d parameter. If set, the
                  specified directory should contain a solr.xml file, unless solr.xml exists in Zookeeper.
                  This parameter is ignored when running examples (-e), as the solr.solr.home depends
                  on which example is run. The default value is server/solr. If passed a relative dir
                  validation with the current dir will be done before trying the default server/<dir>
  -t dir        Sets the solr.data.home system property, where Solr will store index data in <instance_dir>/data subdirectories.
                  If not set, Solr uses solr.solr.home for both config and data.
  -e example    Name of the example to run; available examples:
      cloud:          SolrCloud example
      techproducts:   Comprehensive example illustrating many of Solr's core capabilities
      dih:            Data Import Handler
      schemaless:     Schema-less example
  -a opts       Additional parameters to pass to the JVM when starting Solr, such as to setup
                  Java debug options. For example, to enable a Java debugger to attach to the Solr JVM
                  you could pass: -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983"
                  In most cases, you should wrap the additional parameters in double quotes.
  -j opts       Additional parameters to pass to Jetty when starting Solr.
                  For example, to add configuration folder that jetty should read
                  you could pass: -j "--include-jetty-dir=/etc/jetty/custom/server/"
                  In most cases, you should wrap the additional parameters in double quotes.
  -noprompt     Don't prompt for input; accept all defaults when running examples that accept user input
  -v and -q     Verbose (-v) or quiet (-q) logging. Sets default log level to DEBUG or WARN instead of INFO
  -V/-verbose   Verbose messages from this script
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
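The port arithmetic in the help text is easy to get wrong when several instances run on one host. The following sketch restates it in Python; the function name derived_ports is hypothetical, and the formulas are taken only from the -p and -c descriptions above.

```python
def derived_ports(solr_port):
    """Ports that the start script derives from the -p value, per the help text."""
    return {
        "solr": solr_port,
        "stop": solr_port - 1000,   # STOP_PORT = SOLR_PORT - 1000
        "rmi": solr_port + 10000,   # RMI_PORT = SOLR_PORT + 10000
        # Only relevant in SolrCloud mode when neither -z nor ZK_HOST is set.
        "embedded_zookeeper": solr_port + 1000,
    }


print(derived_ports(8985))
```

For -p 8985 this gives a stop port of 7985 and an RMI port of 18985, which matches the example in the help output.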
4.2 Start Server
In this step, I will demonstrate how to start a Solr server instance. I can start with the default settings.
start
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!
The warning comes from the JVM and can be addressed by following the steps in this article. The default port is 8983. I can also start the server on a specific port with the -p option.
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start -p 8988
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8988
Started Solr server on port 8988. Happy searching!
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
4.3 Check Status
In this step, I will use the status command to check the server status.
status
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd status
Found Solr process 102288 running on port 8983
{
  "solr_home":"C:\\MaryZheng\\DevTools\\solr\\solr-8.6.3\\server\\solr",
  "version":"8.6.3 e001c2221812a0ba9e9378855040ce72f93eced4 - jasongerlowski - 2020-10-03 18:12:03",
  "startTime":"2020-10-25T14:19:54.900Z",
  "uptime":"0 days, 0 hours, 1 minutes, 8 seconds",
  "memory":"201.9 MB (%39.4) of 512 MB"}
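Because the status command prints a JSON object after the "Found Solr process" line, its output is easy to consume from a script. The sketch below parses the text shown above; parse_status is a hypothetical helper, and it assumes a single running instance (one JSON object in the output).

```python
import json
import re

# Output of "solr.cmd status" as shown in this article.
SAMPLE_STATUS = r'''Found Solr process 102288 running on port 8983
{
  "solr_home":"C:\\MaryZheng\\DevTools\\solr\\solr-8.6.3\\server\\solr",
  "version":"8.6.3 e001c2221812a0ba9e9378855040ce72f93eced4 - jasongerlowski - 2020-10-03 18:12:03",
  "startTime":"2020-10-25T14:19:54.900Z",
  "uptime":"0 days, 0 hours, 1 minutes, 8 seconds",
  "memory":"201.9 MB (%39.4) of 512 MB"}
'''


def parse_status(output):
    """Return (port, info) from status text that describes one instance."""
    match = re.search(r"running on port (\d+)", output)
    port = int(match.group(1)) if match else None
    # Everything from the first "{" onward is the JSON status object.
    info = json.loads(output[output.index("{"):])
    return port, info


port, info = parse_status(SAMPLE_STATUS)
print(port, info["uptime"], info["memory"])
```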
4.4 Stop Server
In this step, I will use the stop command to stop Solr instances. You can use the -p option to stop the instance on a specific port, or -all to stop every running instance.
stop -p 8988
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd stop -p 8988
Stopping Solr process 77940 running on port 8988
Waiting for 0 seconds, press a key to continue ...
stop -all
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd stop -all
Stopping Solr process 102288 running on port 8983
Waiting for 0 seconds, press a key to continue ...
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
4.5 Start with Example
Solr provides four examples: cloud, techproducts, dih, and schemaless. In this step, I will start Solr with the techproducts example.
start -e techproducts
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd start -e techproducts
Creating Solr home directory C:\MaryZheng\DevTools\solr\solr-8.6.3\example\techproducts\solr
Starting up Solr on port 8983 using command:
"C:\MaryZheng\DevTools\solr\solr-8.6.3\bin\solr.cmd" start -p 8983 -s "C:\MaryZheng\DevTools\solr\solr-8.6.3\example\techproducts\solr"
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!
Created new core 'techproducts'
Indexing tech product example docs from C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...
POSTing file gb18030-example.xml to [base]
POSTing file hd.xml to [base]
POSTing file ipod_other.xml to [base]
POSTing file ipod_video.xml to [base]
POSTing file manufacturers.xml to [base]
POSTing file mem.xml to [base]
POSTing file money.xml to [base]
POSTing file monitor.xml to [base]
POSTing file monitor2.xml to [base]
POSTing file mp500.xml to [base]
POSTing file sd500.xml to [base]
POSTing file solr.xml to [base]
POSTing file utf8-example.xml to [base]
POSTing file vidcard.xml to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:02.624
Solr techproducts example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
As you can see from the output, the command created the techproducts core and loaded data from the example files.
4.6 Create Solr Core
We can use the Solr Admin Console to create a core. However, in this step, I will create a Solr core from the command line.
create_core -c films
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd create_core -c films
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin\solr config -c films -p 8983 -action set-user-property -property update.autoCreateFields -value false
Created new core 'films'
Create another core: money
create_core -c money
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd create_core -c money
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin\solr config -c money -p 8983 -action set-user-property -property update.autoCreateFields -value false
Created new core 'money'
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
Delete the core: money
delete -c money
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>solr.cmd delete -c money
Deleting core 'money' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=money&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>
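The delete output shows that the script simply calls the cores admin endpoint with action=UNLOAD. As a sketch, the same URL can be assembled in Python; the helper name unload_core_url is hypothetical, the parameters are copied from the output above, and nothing is sent to the server.

```python
from urllib.parse import urlencode


def unload_core_url(core, base="http://localhost:8983/solr"):
    """Build the UNLOAD URL that the delete command printed for a core."""
    params = {
        "action": "UNLOAD",
        "core": core,
        "deleteIndex": "true",
        "deleteDataDir": "true",
        "deleteInstanceDir": "true",
    }
    return f"{base}/admin/cores?{urlencode(params)}"


print(unload_core_url("money"))
```

Requesting this URL deletes the core's index, data, and instance directories, so it should only be used on a core that is no longer needed.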
4.7 Load Data
Solr provides several example documents. In this step, I will load data into the films core from the provided sample films.csv file.
First, go to C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs and enter the following Java command:
java -Dc=films -Dtype=text/csv -jar post.jar ..\films\films.csv
C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs>java -Dc=films -Dtype=text/csv -jar post.jar ..\films\films.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/films/update using content-type text/csv...
POSTing file films.csv to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/films/update
SimplePostTool: WARNING: Response: {
  "responseHeader":{
    "status":400,
    "QTime":1006},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=/en/quien_es_el_senor_lopez] Error adding field 'name'='¿Quién es el señor López?' msg=For input string: \"¿Quién es el señor López?\"",
    "code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/films/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:02.154
- lines 3 and 18 (the "Posting files to" and "COMMITting" lines): /update is the request handler that updates the core data
- line 8 ("status":400): the 400 Bad Request response
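We got a 400 Bad Request error. The error is caused by the data in the films.csv file. If you open the file, you will see that the name column holds 0.45 in the first row, but the fifth row holds a text value.
The films core was created with the _default configset, whose data-driven schema (see the warning printed by create_core) creates unknown fields automatically and picks each field's type from the first value it sees. It therefore defined the name field as a pdoubles type, and the later text value could not be parsed as a number.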
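The effect of fixing a field's type from the first value can be illustrated with a toy model. This is a deliberately simplified sketch and not Solr's actual type-guessing code: guess_type and index_names are hypothetical helpers that only distinguish numbers from text.

```python
def guess_type(value):
    """Rough stand-in for type guessing: a number or general text."""
    try:
        float(value)
        return "pdoubles"
    except ValueError:
        return "text_general"


def index_names(names):
    """Fix the field type from the first value, then check every value against it."""
    field_type = guess_type(names[0])
    rejected = [
        name for name in names
        if field_type == "pdoubles" and guess_type(name) != "pdoubles"
    ]
    return field_type, rejected


# Both values appear in this article: the first-row name and the failing name.
field_type, rejected = index_names(["0.45", "¿Quién es el señor López?"])
print(field_type, rejected)
```

With the numeric-looking name first, the field is typed as pdoubles and the text name is rejected, which mirrors the NumberFormatException in the output above.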
We can confirm this in C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\managed-schema. I copied the name field definition here.
<field name="name" type="pdoubles"/>
We will use the Schema browser to delete the name field and re-add it with the text_general type. After that, open C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr\films\conf\managed-schema again. You will see that the type value is updated.
<field name="name" type="text_general" uninvertible="true" indexed="true" stored="true"/>
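As an alternative to the Schema browser, the same change could probably be made through Solr's Schema API with a replace-field command. This is not the method used in this article, only a sketch: the helper replace_field_request is hypothetical, the field properties are taken from the definition above, and the request is built but not sent.

```python
import json
from urllib.request import Request


def replace_field_request(core, field, base="http://localhost:8983/solr"):
    """Build (but do not send) a Schema API request that redefines a field."""
    body = json.dumps({"replace-field": field}).encode("utf-8")
    return Request(
        f"{base}/{core}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = replace_field_request(
    "films",
    {"name": "name", "type": "text_general", "indexed": True, "stored": True},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running server it could be sent with urllib.request.urlopen(req).
```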
After the schema is updated, re-run the command. This time, you should see the following output:
C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs>java -Dc=films -Dtype=text/csv -jar post.jar ..\films\films.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/films/update using content-type text/csv...
POSTing file films.csv to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:02.250
C:\MaryZheng\DevTools\solr\solr-8.6.3\example\exampledocs>
At this point, there are 1100 records in the films core. We will use this data to run queries later.
5. Solr Admin Console
Apache Solr provides a great admin console. You can access it from http://localhost:8983/.
5.1 Core Admin
In this step, I will open a web browser and navigate to http://localhost:8983/. You should see the Solr Admin console, as in the following screenshot.
As you can see here, you can view the server log in the Logging section.
5.2 Analysis
An analyzer examines the text of fields and generates a token stream. You can click Analysis under the selected core.
I typed “This is a simple math question, do you agree?” in the Field Value (Index) box and entered “Math is fun.” in the Field Value (Query) box.
As you can see in Figure 7, the FieldType is text_general. The page shows both the index and the query analysis results, and it highlights the matching tokens: is and math.
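To see why those two tokens match, the analysis can be approximated in a few lines of Python. This is only a rough imitation of text_general (lowercasing and splitting on non-word characters); the real analyzer chain is configured in the schema and does more than this, and simple_tokens is a hypothetical helper.

```python
import re


def simple_tokens(text):
    """Lowercase the text and split it into word tokens."""
    return [token.lower() for token in re.findall(r"\w+", text)]


index_tokens = simple_tokens("This is a simple math question, do you agree?")
query_tokens = simple_tokens("Math is fun.")

# Tokens produced on both the index side and the query side.
matches = sorted(set(index_tokens) & set(query_tokens))
print(matches)  # ['is', 'math']
```

"Math" matches "math" only because both sides are lowercased, which is the kind of normalization the Analysis screen makes visible.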
5.3 Solr Search
The Solr Admin console provides an easy way to query the data in a Solr core. In this step, I will search the films core for documents whose name field contains David.
- Select the films from the Core Selector drop down box
- Click “Query”
- Note that the Request Handler is /select
- Enter “name:David” as Solr Query
- Enter “id desc” under sort to sort the results
- Enter “name, id, directed_by” under fl to only list these fields in the output results
- Select “json” from wt as the output format
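The form fields above end up as URL parameters on the /select handler. As a sketch, the same query URL can be assembled with Python's standard library; select_url is a hypothetical helper, and the default base URL assumes the local server used in this article.

```python
from urllib.parse import quote, urlencode


def select_url(core, params, base="http://localhost:8983/solr"):
    """Build a /select query URL for the given core."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return f"{base}/{core}/select?{query}"


url = select_url(
    "films",
    {"fl": "name,id,directed_by", "q": "name:David", "sort": "id desc"},
)
print(url)
```

Adding "wt": "json" to the dictionary would request JSON output explicitly, as selected in the form.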
5.4 Browse Example from techproducts
The Solr techproducts example also provides a browse link: http://localhost:8983/solr/techproducts/browse/.
6. Restful Queries
Solr provides RESTful APIs for the queries executed at the console. You can see the exact query URL at the top of the query screen. Please pay attention to the query URL shown in Figure 8.
6.1 Normal Query
In this step, I will use the curl command to execute the exact same query as in Figure 8: search the films core for documents whose name field contains David.
curl command
C:\MaryZheng\DevTools\solr\solr-8.6.3\bin>curl "http://localhost:8983/solr/films/select?fl=name%2Cid%2Cdirected_by&q=name%3ADavid&sort=id%20desc"
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"name:David",
      "fl":"name,id,directed_by",
      "sort":"id desc"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "name":["David & Layla"],
        "directed_by":["Jay Jonroy"],
        "id":"/en/david_layla"},
      {
        "name":["David Gilmour in Concert"],
        "directed_by":["David Mallet"],
        "id":"/en/david_gilmour_in_concert"}]
  }}
Note: the URL in line 1 breaks down as follows:
- http://{hostname:port} – the Solr server host name and port. It varies for each instance.
- /solr – a constant value.
- /films – the name of the core to search.
- /select – the request handler for querying data.
- fl – the Solr query parameter that lists the fields to return in the result set.
- q – the Solr query parameter that specifies the query conditions.
- sort – the Solr query parameter that defines the sort order of the result set.
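The breakdown above can be reproduced programmatically, which is handy when debugging a query copied from the console. The following sketch splits the curl URL into its parts; describe_query_url is a hypothetical helper and assumes the path has exactly the form /solr/{core}/{handler}.

```python
from urllib.parse import parse_qs, urlsplit


def describe_query_url(url):
    """Split a Solr query URL into server, core, handler, and parameters."""
    parts = urlsplit(url)
    # "/solr/films/select" -> ["", "solr", "films", "select"]
    _, context, core, handler = parts.path.split("/")
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "server": parts.netloc,
        "context": context,
        "core": core,
        "handler": handler,
        "params": params,
    }


info = describe_query_url(
    "http://localhost:8983/solr/films/select"
    "?fl=name%2Cid%2Cdirected_by&q=name%3ADavid&sort=id%20desc"
)
print(info)
```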
Note: the response data starting at line 10 is explained as follows:
- numFound – the total number of records found.
- start – the offset of the first returned document.
- docs – the array of matching documents.
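A client typically reads these three fields after parsing the JSON body. The sketch below does so for the response shown above, embedded as a string so that it runs without a server; summarize is a hypothetical helper.

```python
import json

# The JSON body returned by the curl command in this section.
SAMPLE_RESPONSE = '''{
  "responseHeader":{"status":0, "QTime":0,
    "params":{"q":"name:David", "fl":"name,id,directed_by", "sort":"id desc"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {"name":["David & Layla"],
       "directed_by":["Jay Jonroy"],
       "id":"/en/david_layla"},
      {"name":["David Gilmour in Concert"],
       "directed_by":["David Mallet"],
       "id":"/en/david_gilmour_in_concert"}]
  }}'''


def summarize(body):
    """Return (numFound, start, list of film names) from a parsed response."""
    response = body["response"]
    # name is multi-valued in this core, so take the first value of each list.
    names = [doc["name"][0] for doc in response["docs"]]
    return response["numFound"], response["start"], names


print(summarize(json.loads(SAMPLE_RESPONSE)))
```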
6.2 Query with Facet
Apache Solr provides faceting on result sets. You can see it in action at http://localhost:8983/solr/techproducts/browse. Here is an example from the techproducts core.
curl command
C:\MaryZheng\DevTools\solr\solr-8.6.3\server\solr>curl "http://localhost:8983/solr/techproducts/select?facet.field=cat&facet=on&q=price%3A%5B100%20TO%20200%5D"
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"price:[100 TO 200]",
      "facet.field":"cat",
      "facet":"on"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"TWINX2048-3200PRO",
        "name":"CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
        "manu":"Corsair Microsystems Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics",
          "memory"],
        "features":["CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
        "price":185.0,
        "price_c":"185.00,USD",
        "popularity":5,
        "inStock":true,
        "store":"37.7752,-122.4232",
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "payloads":"electronics|6.0 memory|3.0",
        "_version_":1683289868499681280,
        "price_c____l_ns":18500},
      {
        "id":"0579B002",
        "name":"Canon PIXMA MP500 All-In-One Photo Printer",
        "manu":"Canon Inc.",
        "manu_id_s":"canon",
        "cat":["electronics",
          "multifunction printer",
          "printer",
          "scanner",
          "copier"],
        "features":["Multifunction ink-jet color photo printer",
          "Flatbed scanner, optical scan resolution of 1,200 x 2,400 dpi",
          "2.5\" color LCD preview screen",
          "Duplex Copying",
          "Printing speed up to 29ppm black, 19ppm color",
          "Hi-Speed USB",
          "memory card: CompactFlash, Micro Drive, SmartMedia, Memory Stick, Memory Stick Pro, SD Card, and MultiMediaCard"],
        "weight":352.0,
        "price":179.99,
        "price_c":"179.99,USD",
        "popularity":6,
        "inStock":true,
        "store":"45.19214,-93.89941",
        "_version_":1683289868634947584,
        "price_c____l_ns":17999}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",2,
        "copier",1,
        "memory",1,
        "multifunction printer",1,
        "printer",1,
        "scanner",1,
        "camera",0,
        "connector",0,
        "currency",0,
        "electronics and computer1",0,
        "electronics and stuff2",0,
        "graphics card",0,
        "hard drive",0,
        "music",0,
        "search",0,
        "software",0]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}
- line 1: the URL includes facet=on and facet.field=cat to count the matching documents for each value of the cat field
- line 54: the facet results (facet_counts) start here
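Note that facet_fields returns each facet as a flat list that alternates values and counts. A client usually wants a mapping instead; the sketch below converts the cat list from the response above, and the helper name facet_counts_to_dict is hypothetical.

```python
# The "cat" facet list from the response above: value, count, value, count, ...
CAT_FACET = [
    "electronics", 2, "copier", 1, "memory", 1, "multifunction printer", 1,
    "printer", 1, "scanner", 1, "camera", 0, "connector", 0, "currency", 0,
    "electronics and computer1", 0, "electronics and stuff2", 0,
    "graphics card", 0, "hard drive", 0, "music", 0, "search", 0, "software", 0,
]


def facet_counts_to_dict(flat):
    """Pair up the alternating values and counts into a dictionary."""
    return dict(zip(flat[0::2], flat[1::2]))


counts = facet_counts_to_dict(CAT_FACET)
# Keep only categories that occur in the current result set.
non_zero = {value: count for value, count in counts.items() if count > 0}
print(non_zero)
```

Only six categories have a non-zero count here, because just two documents fall in the 100 to 200 price range.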
7. Summary
That was an introduction to a Solr Project using Solr Core as a Search Engine. Apache Solr provides advanced full-text search capability.
In this example, I demonstrated the basic operations and how to use the admin console to query and analyze data. I also showed a few query examples via RESTful APIs, which can be consumed by any REST client.
If you want to read more about Apache Solr, take a look here.
8. Download the Source Code
You can download the commands used in this example here: Solr Project using Solr Core as a Search Engine