
Apache Solr replication example

In this example, we will show you how to set up replication in Apache Solr and demonstrate how a new record gets replicated from the master core to the slave cores. We will use one master and two slave servers. In a production environment, the master and slave servers would run on different machines; here we run all of them on the same machine, each on a different port.

Our preferred environment for this example is Windows. Before you begin the Solr installation, make sure you have a JDK installed and JAVA_HOME set appropriately.
 

1. Install Apache Solr

To begin with, let's download the latest version of Apache Solr from the following location:

http://lucene.apache.org/solr/downloads.html

Apache Solr went through various changes between 4.x.x and 5.0.0, so if you have a different version of Solr, download a 5.x.x version to follow this example. Once the Solr zip file is downloaded, unzip it into a folder. The extracted folder will look like the one below.

Figure: Solr folders

The bin folder contains the scripts to start and stop the server. The example folder contains a few example files; we will use one of them to demonstrate how replication works. The server folder contains the logs folder, where all Solr logs are written; it is helpful to check the logs for errors during indexing. The solr folder under server holds the cores (or collections). The configuration and data of each core or collection are stored in its own folder.

Apache Solr comes with a built-in Jetty server. Before we start the Solr instance, we must verify that JAVA_HOME is set on the machine.

We can start the server using the command-line script. Let's go to the bin directory from the command prompt and issue the following command:

solr start

This will start the Solr server on the default port 8983.

We can now open the following URL in the browser to verify that our Solr instance is running. The specifics of the Solr admin tool are beyond the scope of this example.

http://localhost:8983/solr/
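The same check can be scripted. The following Python sketch (standard library only) calls the Core Admin API's STATUS action, which lists the cores hosted by an instance. It assumes the default port 8983, and the helper names are our own, not part of Solr. Right after solr start on a fresh installation no cores exist yet, so an empty list is the result we would expect at this point.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_status_url(base_url="http://localhost:8983/solr"):
    """Build the Core Admin STATUS URL, asking for a JSON response."""
    return base_url + "/admin/cores?" + urlencode({"action": "STATUS", "wt": "json"})


def core_names(status_response):
    """Return the sorted core names from a parsed STATUS response."""
    # The "status" object maps each core name to its details.
    return sorted(status_response.get("status", {}).keys())


def fetch_core_names(base_url="http://localhost:8983/solr"):
    """Query a running Solr instance; raises URLError if it is not reachable."""
    with urlopen(core_status_url(base_url), timeout=5) as resp:
        return core_names(json.load(resp))
```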

Figure: Solr admin console

2. Configuring Solr – master

In this section, we will show you how to configure the master core of a Solr instance. Apache Solr ships with an option called Schemaless mode, which allows users to build an effective schema without manually editing the schema file. For this example, we will instead use the reference configset sample_techproducts_configs.

2.1 Creating master Core

First, we need to create a core for indexing the data. The Solr create command has the following options:

  • -c <name> – Name of the core or collection to create (required).
  • -d <confdir> – The configuration directory to copy when creating the new core or collection, for example one of the configsets shipped with Solr.
  • -n <configName> – The configuration name. This defaults to the same name as the core or collection.
  • -p <port> – Port of a local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.
  • -s <shards> – Number of shards to split a collection into; the default is 1. Applies only in SolrCloud mode.
  • -rf <replicas> – Number of copies of each document in the collection; the default is 1. Applies only in SolrCloud mode.

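In this example, we will use the -c parameter for the core name, the -d parameter for the configuration directory, the -p parameter for the port, and the -rf parameter for the number of replicas. Note that -rf only takes effect in SolrCloud mode; in the standalone setup used here, replication between the master and the slaves is configured through the ReplicationHandler instead.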

Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command:

solr create -c master -d sample_techproducts_configs -p 8983 -rf 3

We can see the following output in the command window.

Creating new core 'master' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=master&instanceDir=master

{
 "responseHeader":{
 "status":0,
 "QTime":1563},
 "core":"master"}

Now we can navigate to the following URL and see the master core listed in the core selector, along with the core's statistics.

http://localhost:8983/solr/#/master
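The create command above printed the Core Admin request it sends. As a rough sketch, the Python snippet below rebuilds that URL and checks the JSON reply. Keep in mind that the script also copies the configset into server\solr\master\conf before sending the request, so calling the URL on its own assumes the instance directory already contains a conf folder. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def create_core_url(name, base_url="http://localhost:8983/solr"):
    """Rebuild the Core Admin CREATE request printed by 'solr create'."""
    # instanceDir is set to the core name, as in the script's output.
    params = urlencode({"action": "CREATE", "name": name, "instanceDir": name})
    return base_url + "/admin/cores?" + params


def create_succeeded(response_text, name):
    """True if the JSON reply reports status 0 for the expected core."""
    body = json.loads(response_text)
    return body["responseHeader"]["status"] == 0 and body.get("core") == name
```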

Figure: master console

2.2 Modify solrconfig

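Open the file solrconfig.xml under the folder server\solr\master\conf and add the following /replication request handler configuration for the master. We set both replicateAfter and backupAfter to optimize. The confFiles parameter lists the configuration files that the slaves copy from the master's conf folder; an entry such as solrconfig_slave.xml:solrconfig.xml means the master's solrconfig_slave.xml is saved on the slave as solrconfig.xml, while x.xml and y.xml appear to be placeholder names taken from the Solr documentation example. The Solr Reference Guide lists commit, optimize and startup as valid replicateAfter values, so if you want every commit (such as the one post.jar issues) to be replicated reliably, add a second <str name="replicateAfter">commit</str> entry.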

solrconfig.xml

     <!-- Replication Handler -->
     <requestHandler name="/replication" class="solr.ReplicationHandler" >
          <lst name="master">
               <str name="replicateAfter">optimize</str>
               <str name="backupAfter">optimize</str>
               <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>
               <str name="commitReserveDuration">00:00:10</str>
          </lst>    
          <int name="maxNumberOfBackups">2</int>
          <lst name="invariants">
               <str name="maxWriteMBPerSec">16</str>
          </lst>
     </requestHandler>
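To make the confFiles syntax concrete, here is a small Python sketch of how the value above reads: each comma-separated entry names a file in the master's conf folder, and an optional :target suffix gives the name it is saved under on the slave. This only illustrates the documented format; it is not Solr's own parsing code, and parse_conf_files is a hypothetical helper.

```python
def parse_conf_files(value):
    """Split a confFiles value into (file on master, name on slave) pairs.

    An entry written as 'a.xml:b.xml' is read from the master's a.xml
    and saved on the slave as b.xml; a plain entry keeps its name.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        source, _, target = entry.partition(":")
        pairs.append((source, target or source))
    return pairs
```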

Since we have modified solrconfig.xml, we have to restart the Solr server. Issue the following commands from the solr-5.0.0\bin folder in the command window.

solr stop -all

solr start

3. Configuring Solr – slave

For this example, we will create two slave cores. The data from the master core will be replicated into both slaves. We will run the two slaves on the same machine as the master, each on a different port. To do so, extract another copy of the Solr server to a folder called solr1. Navigate to the solr-5.0.0\bin folder of solr1 in the command window and issue the following command:

solr start -p 9000

The -p option starts the Solr server on a different port; for the first slave we use port 9000.
Now, from the same solr-5.0.0\bin folder of solr1, issue the following command to create the slave core:

solr create -c slave -d sample_techproducts_configs -p 9000

We can see the following output in the command window.

Creating new core 'slave' using command:
http://localhost:9000/solr/admin/cores?action=CREATE&name=slave&instanceDir=slave

{
 "responseHeader":{
 "status":0,
 "QTime":1778},
 "core":"slave"}

Now open the file solrconfig.xml under the folder server\solr\slave\conf of solr1 and add the following /replication request handler configuration for the slave. In this configuration we point the slave to the master's replication handler through masterUrl. The pollInterval is set to 20 seconds; it is the interval between two successive poll requests made by the slave.

solrconfig.xml

     <!-- Replication Handler -->
     <requestHandler name="/replication" class="solr.ReplicationHandler" >
          <lst name="slave">
               <!-- Fully qualified URL of the master's replication handler.
                    It can also be passed as a request parameter to the fetchindex command. -->
               <str name="masterUrl">http://localhost:8983/solr/master/replication</str>
               <!-- Interval at which the slave polls the master, in HH:mm:ss format.
                    If absent, the slave does not poll automatically, but a fetchindex
                    can still be triggered from the admin console or the HTTP API. -->
               <str name="pollInterval">00:00:20</str>
          </lst>
     </requestHandler>
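Because pollInterval uses the HH:mm:ss format, a small conversion helps when estimating how long a change on the master may take to reach a slave. The Python sketch below assumes a well-formed three-part value; parse_poll_interval is a hypothetical name.

```python
from datetime import timedelta


def parse_poll_interval(value):
    """Convert a pollInterval string in HH:mm:ss format to a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

For the value used here, parse_poll_interval("00:00:20") gives 20 seconds, so a committed change should reach a slave roughly within that time plus the time needed to copy the index files.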

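Since we have modified solrconfig.xml, we have to restart this Solr server. Issue the following commands from the solr-5.0.0\bin folder of solr1. We stop only the instance on port 9000, because solr stop -all would also stop the master running on port 8983.

solr stop -p 9000

solr start -p 9000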

Now open the slave console using the following URL. The replication section shows the settings we made in solrconfig.xml.

http://localhost:9000/solr/#/slave/replication
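The information shown in the replication section is also available over HTTP from the slave's replication handler. The Python sketch below builds two such requests, assuming the ports and core names of this example: command=details returns the replication settings and status, and command=fetchindex asks the slave to pull the index right away instead of waiting for the next poll, with masterUrl passed as a request parameter as the configuration comment allows. replication_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def replication_url(core_url, command, **params):
    """Build a ReplicationHandler request such as details or fetchindex."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return core_url.rstrip("/") + "/replication?" + urlencode(query)


SLAVE = "http://localhost:9000/solr/slave"
MASTER_REPLICATION = "http://localhost:8983/solr/master/replication"

# Replication settings and status of the first slave.
details = replication_url(SLAVE, "details")
# Pull the index immediately instead of waiting for the next poll.
fetch = replication_url(SLAVE, "fetchindex", masterUrl=MASTER_REPLICATION)
```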

Figure: slave-1 replication console

To create the second slave server, follow the same steps in another copy of Solr and configure the server on port 9001. We can then open its console using the following URL and verify the configuration in the replication section.

http://localhost:9001/solr/#/slave/replication

Figure: slave-2 replication console

4. Indexing and Replication

Now we will index the example data into the master core. Apache Solr comes with a standalone Java program called SimplePostTool. It is packaged as a JAR file (post.jar) and is available in the installation under the folder example\exampledocs.

Navigate to the example\exampledocs folder in the command prompt and type the following command to list the options the tool supports:

java -jar post.jar -h

The general usage format is as follows:

Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

As mentioned earlier, we will index the data in the “books.csv” file shipped with the Solr installation. Navigate to the solr-5.0.0\example\exampledocs folder in the command prompt and issue the following command:

java -Dtype=text/csv -Durl=http://localhost:8983/solr/master/update -jar post.jar books.csv

The SystemProperties used here are:

  • -Dtype – the type of the data file.
  • -Durl – URL of the update handler of the master core.
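post.jar is a convenience wrapper around HTTP. Below is a minimal Python sketch of the same kind of request, assuming the master core URL used above: the CSV body is POSTed to the update handler with Content-Type text/csv. Here commit=true is added to the URL so the request commits by itself, whereas post.jar sends a separate commit, as its output shows. Both function names are hypothetical.

```python
from urllib.request import Request, urlopen


def csv_update_request(csv_bytes, update_url="http://localhost:8983/solr/master/update"):
    """Build a POST of CSV data to the update handler that also commits."""
    return Request(
        update_url + "?commit=true",
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def post_csv_file(path, update_url="http://localhost:8983/solr/master/update"):
    """Read a CSV file, send it to Solr and return the HTTP status code."""
    with open(path, "rb") as f:
        request = csv_update_request(f.read(), update_url)
    with urlopen(request, timeout=30) as resp:
        return resp.status
```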

The file “books.csv” will now be indexed and the command prompt will display the following output.

SimplePostTool version 5.0.0
 Posting files to [base] url http://localhost:8983/solr/master/update using content-type text/csv...
 POSTing file books.csv to [base]
 1 files indexed.
 COMMITting Solr index changes to http://localhost:8983/solr/master/update...
 Time spent: 0:00:00.604

Now open the console of a slave core. Within the 20-second poll interval, the data is replicated automatically.

http://localhost:9000/solr/#/slave
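Instead of comparing document counts in the console, we can ask each core's replication handler for its index version with command=indexversion. The Python sketch below assumes the JSON reply contains indexversion and generation fields and treats matching values as a sign that the slave has caught up; this is a quick heuristic, and command=details gives a fuller picture. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def index_version(core_url):
    """Ask a core's replication handler for (indexversion, generation)."""
    url = core_url.rstrip("/") + "/replication?command=indexversion&wt=json"
    with urlopen(url, timeout=5) as resp:
        body = json.load(resp)
    return body["indexversion"], body["generation"]


def in_sync(master_version, slave_version):
    """Heuristic: the slave has caught up when both pairs match."""
    return tuple(master_version) == tuple(slave_version)
```

For example, in_sync(index_version("http://localhost:8983/solr/master"), index_version("http://localhost:9000/solr/slave")) should become True within one poll interval after the commit on the master.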

Figure: slave console – data replicated

5. Add new record

Now let's validate the replication further by adding a record to the master core. To do so, open the master console URL:

http://localhost:8983/solr/#/master/documents

Navigate to the Documents section, choose CSV as the document type, enter the following content in the document text area, and click Submit.

id,cat,name,price,inStock,author,series_t,sequence_i,genre_s
123,book,Apache Solr,6.99,TRUE,Veera,JCG,1,Technical
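The first line of this CSV is a header naming the fields; series_t, sequence_i and genre_s are not declared one by one but should be matched by the *_t, *_i and *_s dynamic field patterns of the techproducts schema. If you want to generate such input rather than type it, a Python sketch using the standard csv module could look like this (to_csv and FIELDS are hypothetical names):

```python
import csv
import io

# Column order of the record entered in the Documents tab.
FIELDS = ["id", "cat", "name", "price", "inStock", "author",
          "series_t", "sequence_i", "genre_s"]


def to_csv(records):
    """Render dict records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The resulting text can be pasted into the Documents tab or posted to the master's update handler with Content-Type text/csv.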

Figure: master console – add new record

The record is added to the master core and replicated to the slave servers. To verify this, let's navigate to the slave core, where the document count has increased to 11. We can also use the query section of the slave admin console to verify it. Open the following URL:

http://localhost:9000/solr/#/slave/query

Enter name:apache in the q text area and click Execute Query. The record we inserted into the master core is returned by the slave core.
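The same query can be sent from a script to the slave's /select handler. In the Python sketch below, select_url, num_found and count_documents are hypothetical helpers; the port and core name match the first slave of this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, query):
    """Build a /select request that returns JSON."""
    return core_url.rstrip("/") + "/select?" + urlencode({"q": query, "wt": "json"})


def num_found(select_response):
    """Number of matching documents in a parsed /select response."""
    return select_response["response"]["numFound"]


def count_documents(core_url, query="*:*"):
    """Run a query against a live core and return its match count."""
    with urlopen(select_url(core_url, query), timeout=5) as resp:
        return num_found(json.load(resp))
```

Once the new record has been replicated, count_documents("http://localhost:9000/solr/slave") should report 11, and count_documents("http://localhost:9000/solr/slave", "name:apache") should report at least 1.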

Figure: slave console – query

6. Download the Configuration

This was an example of Apache Solr replication.

You can download the master configuration here: solrconfig master, and the slave configuration here: solrconfig slave.

Veeramani Kalyanasundaram

Veera is a Software Architect working in the telecom domain with rich experience in Java middleware technologies. He is an OOAD practitioner and is interested in performance engineering.