Solr Multivalued Example
In this Solr multivalued example, we discuss how to index a field that contains multiple values and how to retrieve them. Solr stores index values the same way for single-valued and multivalued fields. However, when a multivalued field is retrieved, the result comes back as a list, which needs to be parsed to display the individual values.
To demonstrate the multivalued feature, we will use the sample file “books.json” shipped with the Solr server. Our preferred environment for this example is solr-5.0.0. Before you begin the Solr installation, make sure you have a JDK installed and JAVA_HOME is set appropriately.
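Because a multivalued field comes back as a list while a single-valued field comes back as a plain value, client code often normalizes both shapes before displaying them. The following is a minimal sketch of that idea in Python; the helper name `as_values` and the sample document are hypothetical and not part of Solr.

```python
def as_values(field_value):
    """Return a field's values as a list.

    Handles three shapes: a missing field (None), a single value,
    and a list of values as returned for a multivalued field.
    """
    if field_value is None:
        return []
    if isinstance(field_value, (list, tuple)):
        return list(field_value)
    return [field_value]


# Hypothetical document, shaped like one entry of a Solr JSON response.
doc = {"id": "example-1", "name": "Hypothetical Book", "cat": ["book", "hardcover"]}

# Join the individual values for display.
print(", ".join(as_values(doc.get("cat"))))
```

With this helper, display code does not need to know in advance whether a field was declared multivalued.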
1. Install Apache Solr
To begin with, let's download the latest version of Apache Solr from the following location:
http://lucene.apache.org/solr/downloads.html
Apache Solr has gone through various changes from 4.x.x to 5.0.0, so if you have a different version of Solr, you need to download a 5.x.x version to follow this example.
Once the Solr zip file is downloaded, unzip it into a folder. The extracted folder is laid out as follows:
The bin folder contains the scripts to start and stop the server. The example folder contains a few example files; we will use one of them to demonstrate how Solr indexes data. The server folder contains the logs folder, where all the Solr logs are written; checking the logs is helpful when an error occurs during indexing. The solr folder under server holds the different cores or collections. The configuration and data for each core or collection are stored in its own folder.
Apache Solr comes with a built-in Jetty server. Before we start the Solr instance, we must verify that JAVA_HOME is set on the machine.
We can start the server using the command-line script. Let's go to the bin directory from the command prompt and issue the following command:
solr start
This will start the Solr server on the default port, 8983.
We can now open the following URL in the browser and verify that our Solr instance is running. The specifics of the Solr admin tool are beyond the scope of this example.
http://localhost:8983/solr/
2. Create a Solr core
When the Solr server is started in standalone mode, the configuration is called a core; when it is started in SolrCloud mode, it is called a collection. In this example we discuss the standalone server and cores, and leave SolrCloud for a later time.
First, we need to create a Core for indexing the data. The Solr create command has the following options:
- -c <name> – Name of the core or collection to create (required).
- -d <confdir> – The configuration directory, useful in the SolrCloud mode.
- -n <configName> – The configuration name. This defaults to the same name as the core or collection.
- -p <port> – Port of a local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.
- -s <shards> – Number of shards to split a collection into, default is 1.
- -rf <replicas> – Number of copies of each document in the collection. The default is 1.
In this example we will use the -c parameter for core name and -d parameter for the configuration directory. For all other parameters we make use of default settings.
Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command:
solr create -c jcg -d basic_configs
We can see the following output in the command window.
Creating new core 'jcg' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=jcg&instanceDir=jcg
{
  "responseHeader":{
    "status":0,
    "QTime":663},
  "core":"jcg"}
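The output above shows that the script ends up calling the CoreAdmin API with action=CREATE. As a rough illustration, the same URL can be assembled in Python. This is only a sketch: the function name `core_create_url` is hypothetical, and calling the URL directly may not reproduce everything the script does, since the script also prepares the configuration directory selected with -d.

```python
from urllib.parse import urlencode


def core_create_url(core_name, base_url="http://localhost:8983/solr"):
    """Build the CoreAdmin CREATE URL shown in the script output.

    The base URL assumes a local Solr on the default port.
    """
    params = {"action": "CREATE", "name": core_name, "instanceDir": core_name}
    return f"{base_url}/admin/cores?{urlencode(params)}"


print(core_create_url("jcg"))
```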
Now navigate to the following URL; the jcg core is listed in the core selector, and you can also see the statistics of the core.
http://localhost:8983/solr
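Besides the admin page, the presence of the core can be checked programmatically through the CoreAdmin STATUS action. The sketch below assumes the JSON response has a top-level "status" object keyed by core name; the sample response text is hand-written for illustration and is not captured from a real server.

```python
import json
from urllib.parse import urlencode


def core_status_url(base_url="http://localhost:8983/solr"):
    """Build the CoreAdmin STATUS URL, asking for a JSON response."""
    return f"{base_url}/admin/cores?{urlencode({'action': 'STATUS', 'wt': 'json'})}"


def core_names(status_response_text):
    """Extract the core names from a CoreAdmin STATUS response body."""
    payload = json.loads(status_response_text)
    return sorted(payload.get("status", {}).keys())


# Hand-written sample, assumed to mirror the shape of a real STATUS response.
sample = '{"responseHeader": {"status": 0, "QTime": 1}, "status": {"jcg": {"name": "jcg"}}}'

print(core_status_url())
print(core_names(sample))
```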
3. Configure Multivalued field
Multivalued fields allow us to store more than one value in the same field. We need a multivalued field when the source data contains multiple values for the same field, or when copyField directs several source fields into one destination field. As with a single-valued field, we modify the schema.xml file, here adding the multiValued attribute. Let’s navigate to the server\solr\jcg\conf folder and make the following configuration. Here, we have made the cat field multivalued.
schema.xml
<uniqueKey>id</uniqueKey>
<!-- Added for Multi value example -->
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="cat" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
- name – Name of the field stored and referred in Solr (required).
- type – The datatype of the field defined in the configuration (required).
- indexed – Specifies whether the field is indexed so that it can be searched. Setting the value to false makes the field stored only; it can’t be queried. (Optional)
- stored – Specifies whether the field value is stored and can be returned in the output. Setting the value to false makes the field indexed only; it can’t be retrieved in the output. (Optional)
- multiValued – If true, indicates that a single document might contain multiple values for this field. (Optional)
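To double-check which fields are declared multivalued after editing the schema, the field definitions can be inspected with a few lines of Python. This is a sketch under assumptions: the fragment below is wrapped in a `<schema>` root element so that it parses as standalone XML, and the helper only looks at an explicit multiValued attribute on each field, not at defaults inherited from the field type.

```python
import xml.etree.ElementTree as ET

# A cut-down schema for illustration; the <schema> wrapper and its
# attributes are assumptions made so the fragment is well-formed XML.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.5">
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <field name="cat" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <field name="price" type="tdouble" indexed="true" stored="true"/>
</schema>
"""


def multivalued_fields(schema_xml):
    """Return the names of fields that explicitly set multiValued="true"."""
    root = ET.fromstring(schema_xml.strip())
    return [
        field.get("name")
        for field in root.iter("field")
        if field.get("multiValued", "false").lower() == "true"
    ]


print(multivalued_fields(SCHEMA_FRAGMENT))
```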
Since we have modified the configuration, we have to stop and start the server. To do so, issue the following command from the bin directory on the command line:
solr stop -all
The server is now stopped. To start it again, issue the following command from the bin directory:
solr start
4. Index the data file
Apache Solr comes with a standalone Java program called SimplePostTool. This program is packaged into a JAR and is available with the installation under the folder example\exampledocs.
Now we navigate to the example\exampledocs folder in the command prompt and type the following command. You will see a list of options for using the tool.
java -jar post.jar -h
The general usage format is as follows:
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg>
[<file|folder|url|arg>...]]
As mentioned earlier, we will index the data in the “books.json” file shipped with the Solr installation. We navigate to solr-5.0.0\example\exampledocs in the command prompt and issue the following command.
java -Dtype=application/json -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.json
The SystemProperties used here are:
- -Dtype – the type of the data file.
- -Durl – URL for the jcg core.
The file “books.json” will now be indexed and the command prompt will display the following output.
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/jcg/update using content-type application/json...
POSTing file books.json to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcg/update...
Time spent: 0:00:01.646
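The post tool is essentially sending an HTTP POST with a JSON body to the update URL and then committing. As a sketch of the same idea without post.jar, the request can be prepared with Python's standard library. The function name `build_update_request` and the sample document are hypothetical; the commit=true parameter is used here so that the change becomes visible without a separate commit step. The request is only constructed, not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_update_request(docs, core="jcg", base_url="http://localhost:8983/solr"):
    """Prepare a POST of JSON documents to the core's update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; the multivalued cat field is sent as a JSON array.
docs = [{"id": "example-1", "name": "Hypothetical Book", "cat": ["book", "hardcover"]}]
request = build_update_request(docs)
print(request.get_method(), request.full_url)

# To send it to a running Solr instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Note that a multivalued field needs no special syntax on the way in: the values are simply supplied as an array.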
5. Query the data
Now open the following URL; you will see the cat field holding multiple values.
http://localhost:8983/solr/jcg/select?q=*
We can also query a multivalued field the same way we query a single-valued field. Open the following URL; the result set contains the books that are available in hardcover.
http://localhost:8983/solr/jcg/select?q=cat:hardcover
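The same query can be issued from code by requesting JSON output with the wt=json parameter and reading the cat list from each returned document. The following is a minimal sketch: the helper names are hypothetical, and the sample response is hand-written to mirror the general shape of a Solr JSON response rather than copied from a real run.

```python
import json
from urllib.parse import urlencode


def select_url(query, core="jcg", base_url="http://localhost:8983/solr"):
    """Build a select URL for the given query, asking for JSON output."""
    return f"{base_url}/{core}/select?{urlencode({'q': query, 'wt': 'json'})}"


def categories_by_id(response_text):
    """Map each returned document id to its list of cat values."""
    docs = json.loads(response_text)["response"]["docs"]
    return {doc["id"]: doc.get("cat", []) for doc in docs}


# Hand-written sample response with hypothetical documents.
sample_response = json.dumps({
    "responseHeader": {"status": 0, "QTime": 1},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "example-1", "cat": ["book", "hardcover"]},
            {"id": "example-2", "cat": ["book", "hardcover"]},
        ],
    },
})

print(select_url("cat:hardcover"))
for doc_id, cats in categories_by_id(sample_response).items():
    print(doc_id, cats)
```

The colon in the query is percent-encoded by urlencode, which Solr decodes back to cat:hardcover.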
6. Download the Schema file
This was an example of a Solr multivalued field.
You can download the schema file used in this example here: schema.xml