Apache Solr Function Query Example
1. Introduction
In this example, we are going to explain what the Apache Solr Function Query is and how to use it in queries against our sample articles collection.
2. Technologies Used
The steps and commands described in this example are for Apache Solr 8.5 on Windows 10. The JDK version we use to run the SolrCloud in this example is OpenJDK 13.
Before we start, please make sure your computer meets the system requirements. Also, please download the binary release of Apache Solr 8.5.
In addition, it will save you some time if you can follow the Apache Solr Clustering Example to get a SolrCloud up and running on your local machine.
3. Function Queries Basics
When searching in Solr, a common approach is to specify terms as keywords in a query. The relevance score of each matching document is then calculated from a term-based similarity (BM25 by default since Solr 6, classic TF-IDF before that). The relevance score describes the degree to which a search result satisfies the user's information need: the higher the score, the better the need is met. Is there a way to generate relevance scores with our own calculation at query time, so that the search results can satisfy our users’ needs in different contexts? Function queries were introduced for this purpose.
3.1 What Is A Function Query
A function query is a special query that can be added to a regular query. It lets us specify a function that generates a relevance score at query time for each document in the search results. The calculated value can also be used to filter documents, to sort results, and to be returned as an additional field of each document.
3.2 Query Parsers Supporting Function Queries
The following query parsers support function queries:
- The Standard Query Parser
- The DisMax Query Parser
- The Extended DisMax (eDismax) Query Parser
- The Function Query Parser
- The Function Range Query Parser
3.3 Function Syntax
Function queries use functions. The standard function syntax in Solr consists of a function name, an opening round bracket, a list of parameters, and a closing round bracket.
numdocs()
ord(myIndexedField)
max(myfield,myotherfield,0)
In addition to the standard function syntax, there are three simplified forms:
- A constant (a numeric or string literal)
18, 3.1415, "Function Query Example"
- A field
author, field(author)
- A parameter substitution
q={!func}max($f1,$f2)&f1=views&f2=1000
Note that Solr treats the arguments of a function as functions themselves, which means we can pass one function as an argument to another function.
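To make parameter substitution and nesting more concrete, the following Python sketch assembles the request parameters for such queries without contacting Solr. The helper name build_func_params and the use of a likes field in the nested example are assumptions made for illustration only.

```python
from urllib.parse import urlencode, parse_qs


def build_func_params(func_expr, **substitutions):
    """Return a URL-encoded query string that runs func_expr through the func parser.

    Each keyword argument becomes a separate request parameter that the
    function expression can reference as $name.
    """
    params = {"q": "{!func}" + func_expr}
    params.update(substitutions)
    return urlencode(params)


# Parameter substitution: $f1 and $f2 are resolved from the f1 and f2 parameters.
qs = build_func_params("max($f1,$f2)", f1="views", f2="1000")
print(qs)

# Nesting: a function (here sum) is passed as an argument of another function.
nested = build_func_params("max(sum(views,likes),1000)")
print(parse_qs(nested)["q"][0])
```

Decoding the string with parse_qs gives back the original q, f1 and f2 values, which is a quick way to check that the special characters were encoded correctly.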
4. Solr Function Queries Examples
There are several ways of using function queries in a Solr query. Before we show you some examples, let us prepare the collection and data for our queries.
4.1 Upload A Configset
Before creating a collection to index our data, we need a configset for the collection. A configset is a collection of configuration files such as solrconfig.xml, synonyms.txt, the schema, etc. There are two example configsets (_default and sample_techproducts_configs) in the Solr distribution which can be used when creating collections.
Note that when running in SolrCloud mode, configsets are stored in ZooKeeper and not on the file system. Solr’s _default configset is uploaded to ZooKeeper on initialization. So to use our own configset, we need to create a new one and upload it to ZooKeeper.
For this example, we create our own configset jcg_example_configs for our collection simply by making a copy of the _default configset. Download the source code of this example and copy jcg_example_configs.zip to your working directory. For example, we copy jcg_example_configs.zip to D:\ on Windows 10. Then run the following command in a Command Prompt to upload the configset:
curl -X POST --header "Content-Type:application/octet-stream" --data-binary @jcg_example_configs.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=jcg_example_configs"
We can see the output as below:
D:\>curl -X POST --header "Content-Type:application/octet-stream" --data-binary @jcg_example_configs.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=jcg_example_configs"
{
  "responseHeader":{
    "status":0,
    "QTime":2203}}
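For readers who prefer scripting the upload, here is a minimal Python sketch of the same request. It only builds the request object and does not send it. The helper name build_configset_upload and the placeholder bytes are assumptions; the URL, method and header mirror the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: the local SolrCloud node used in this example


def build_configset_upload(zip_bytes, name):
    """Build (but do not send) the POST request that uploads a zipped configset."""
    query = urlencode({"action": "UPLOAD", "name": name})
    return Request(
        url=f"{SOLR_BASE}/admin/configs?{query}",
        data=zip_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Placeholder bytes stand in for the contents of jcg_example_configs.zip.
req = build_configset_upload(b"placeholder", "jcg_example_configs")
print(req.get_method(), req.full_url)
```

To actually perform the upload, read the real zip file into zip_bytes and pass the request to urllib.request.urlopen.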
If the jcg_example_configs configset already exists, you can delete it with the following command:
curl -X DELETE http://localhost:8983/api/cluster/configs/jcg_example_configs?omitHeader=true
Now we can use the Configsets API to list all configsets on the SolrCloud:
curl http://localhost:8983/solr/admin/configs?action=LIST
There are two configsets in the response:
D:\>curl http://localhost:8983/solr/admin/configs?action=LIST
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "configSets":["jcg_example_configs",
    "_default"]}
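A script can check the LIST response before deciding whether to upload or delete a configset. The sketch below parses the response body shown above; the helper name has_configset is hypothetical.

```python
import json

# Response body copied from the LIST call above.
response_text = (
    '{"responseHeader":{"status":0,"QTime":1},'
    '"configSets":["jcg_example_configs","_default"]}'
)


def has_configset(list_response_text, name):
    """Return True if the Configsets API LIST response contains the given configset."""
    body = json.loads(list_response_text)
    return name in body.get("configSets", [])


print(has_configset(response_text, "jcg_example_configs"))          # True
print(has_configset(response_text, "sample_techproducts_configs"))  # False
```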
4.2 Indexing Data
We assume that you have followed the steps in the Apache Solr Clustering Example to get a SolrCloud up and running on your local machine. Open Solr Admin in a browser and create a new collection named jcgArticles with the jcg_example_configs configset. Select the newly created jcgArticles collection and go to the Documents screen. Copy the content of the articles.csv file downloaded with this example and paste it into the Documents text box. Select CSV from the Document Type drop-down list and click the Submit Document button.
You will see the following output once documents have been submitted successfully.
Status: success
Response:
{
  "responseHeader": {
    "rf": 2,
    "status": 0,
    "QTime": 467
  }
}
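The same indexing step can be scripted instead of using the Admin UI. The sketch below builds, but does not send, a request to the collection's update handler. It assumes the handler accepts CSV with the application/csv content type and that commit=true is wanted. The two rows are a reduced stand-in built from fields visible in this example's query results, not the real articles.csv.

```python
import csv
import io
from urllib.request import Request


def build_csv_update(collection, rows, fieldnames):
    """Build (but do not send) a request that posts CSV documents to a collection."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return Request(
        url=f"http://localhost:8983/solr/{collection}/update?commit=true",
        data=buf.getvalue().encode("utf-8"),
        headers={"Content-Type": "application/csv"},  # assumption: CSV content type
        method="POST",
    )


# Reduced stand-in rows; the real articles.csv contains more fields and documents.
rows = [
    {"id": "0818231712", "title": "Apache SolrCloud Example", "views": 2000, "dislikes": 10},
    {"id": "0380014300", "title": "SolrCloud Tutorial", "views": 2000, "dislikes": 500},
]
req = build_csv_update("jcgArticles", rows, ["id", "title", "views", "dislikes"])
print(req.data.decode("utf-8"))
```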
4.3 Querying Without Using A Function Query
We search for articles whose title contains the term SolrCloud by using a field query. We also add score to the field list of the search results. Later, we will compare the relevance scores returned by this query with the relevance scores of other queries that use function queries.
q=title:*SolrCloud*&fl=*,score
Click the Execute Query button and the output would be:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":104,
    "params":{
      "q":"title:*SolrCloud*",
      "fl":"*,score",
      "wt":"json",
      "_":"1592054831147"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":1.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":1.0}]
  }}
As we can see from the output above, 2 articles are found. Both have a score of 1.0 and the same number of views, 2000. How do we know which article is more popular? You may notice that the two articles have a different number of dislikes, so we can define the popularity of an article as below:
popularity = views / dislikes
It means that if two articles have the same number of views, the article with fewer dislikes is more popular than the other one. Let’s see how we can implement this popularity calculation and use it with a function query.
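As a quick check of the definition, the following Python sketch applies it to the two articles returned above. The function name popularity is only an illustration of the formula; it is not part of Solr.

```python
def popularity(views, dislikes):
    """popularity = views / dislikes, as defined above; dislikes must be non-zero."""
    if dislikes == 0:
        raise ValueError("popularity is undefined when dislikes is 0")
    return views / dislikes


# Values taken from the two documents in the query output above.
articles = {
    "Apache SolrCloud Example": {"views": 2000, "dislikes": 10},
    "SolrCloud Tutorial": {"views": 2000, "dislikes": 500},
}
for title, a in articles.items():
    print(title, popularity(a["views"], a["dislikes"]))
# Apache SolrCloud Example 200.0
# SolrCloud Tutorial 4.0
```

The explicit check for zero dislikes is the reason a range filter on dislikes is useful when the same formula runs inside Solr.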
4.4 Querying With A Function Query
Query parsers such as func and frange expect function arguments. We can use the built-in Solr function div to calculate the popularity. For example:
q=title:*SolrCloud* AND _query_:"{!func}div(views,dislikes)"&fq={!frange l=1}dislikes&fl=*,score
In this query, we add the {!func}div(views,dislikes) function to the query and include the score in the field list returned. In addition, fq={!frange l=1}dislikes keeps only documents with at least one dislike, which avoids division by zero. The output would be:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":97,
    "params":{
      "q":"title:*SolrCloud* AND _query_:\"{!func}div(views,dislikes)\"",
      "fl":"*,score",
      "fq":"{!frange l=1}dislikes",
      "wt":"json",
      "_":"1592054952916"}},
  "response":{"numFound":2,"start":0,"maxScore":201.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":201.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":5.0}]
  }}
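Now we can see that the relevance scores have changed. The first article has a score of 201.0 and the second article has a score of 5.0. Each score is the sum of the 1.0 contributed by the title clause and the value of div(views,dislikes), that is 200 and 4 respectively. Obviously, the first article is more popular than the second one based on our popularity definition.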
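The scores 201.0 and 5.0 are consistent with the two AND-ed clauses being added together: 1.0 from the wildcard clause (the score seen in section 4.3) plus the value of div(views,dislikes). The Python sketch below mimics that arithmetic and the frange filter outside Solr. It is an illustration, not Solr's implementation, and the third document is hypothetical, added only to show what the filter removes.

```python
docs = [
    {"id": "0818231712", "views": 2000, "dislikes": 10},
    {"id": "0380014300", "views": 2000, "dislikes": 500},
    {"id": "hypothetical-0", "views": 50, "dislikes": 0},  # not in the sample data
]

WILDCARD_SCORE = 1.0  # score of title:*SolrCloud* observed in section 4.3


def frange_min(documents, field, lower):
    """Mimic fq={!frange l=lower}field: keep documents whose field value is >= lower."""
    return [d for d in documents if d[field] >= lower]


def combined_score(doc):
    """Sum of the wildcard clause score and the div(views,dislikes) value."""
    return WILDCARD_SCORE + doc["views"] / doc["dislikes"]


kept = frange_min(docs, "dislikes", 1)
scores = {d["id"]: combined_score(d) for d in kept}
print(scores)
# {'0818231712': 201.0, '0380014300': 5.0}
```

Filtering before scoring matters here: without the filter, the hypothetical document would trigger a division by zero in this sketch.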
The query above can be written via the _val_ keyword as well:
q=title:*SolrCloud* AND _val_:"div(views,dislikes)"&fq={!frange l=1}dislikes&fl=*,score
It yields the same output:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":50,
    "params":{
      "q":"title:*SolrCloud* AND _val_:\"div(views,dislikes)\"",
      "fl":"*,score",
      "fq":"{!frange l=1}dislikes",
      "wt":"json",
      "_":"1592054952916"}},
  "response":{"numFound":2,"start":0,"maxScore":201.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":201.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":5.0}]
  }}
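When these queries are sent from a script rather than from the Admin UI, the quotes, braces and spaces must be URL-encoded. The Python sketch below encodes both forms of the query. The helper name select_query_string is hypothetical; the resulting string is meant to be appended to the collection's select URL.

```python
from urllib.parse import urlencode, parse_qs


def select_query_string(q, **extra):
    """Return a URL-encoded query string for a select request."""
    params = {"q": q}
    params.update(extra)
    return urlencode(params)


common = {"fq": "{!frange l=1}dislikes", "fl": "*,score"}

# Nested-query form using _query_ and the func parser.
nested_form = select_query_string(
    'title:*SolrCloud* AND _query_:"{!func}div(views,dislikes)"', **common
)
# Equivalent form using the _val_ keyword.
val_form = select_query_string(
    'title:*SolrCloud* AND _val_:"div(views,dislikes)"', **common
)

print(nested_form)
print(val_form)
```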
4.5 Using Function Query In A Sort Expression
Function queries can be used in a sort expression. For example:
q=title:*SolrCloud*&fq={!frange l=1}dislikes&fl=*,score&sort=div(views,dislikes) desc, score desc
In this query, instead of using our popularity function for the relevance score, we just add it in the sort expression to sort the results by the popularity in descending order. The output would be:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":72,
    "params":{
      "q":"title:*SolrCloud*",
      "fl":"*,score",
      "fq":"{!frange l=1}dislikes",
      "sort":"div(views,dislikes) desc, score desc",
      "wt":"json",
      "_":"1592061341139"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":1.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":1.0}]
  }}
We can see that the relevance scores remain the same, but the article with the higher popularity value is listed first.
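The ordering produced by sort=div(views,dislikes) desc, score desc can be mimicked in plain Python with a two-part sort key. This is only an illustration of the ordering rule using values from the output above, with the documents deliberately listed in the opposite order first.

```python
docs = [
    {"title": "SolrCloud Tutorial", "views": 2000, "dislikes": 500, "score": 1.0},
    {"title": "Apache SolrCloud Example", "views": 2000, "dislikes": 10, "score": 1.0},
]

# Primary key: views / dislikes, secondary key: score, both descending.
ranked = sorted(
    docs,
    key=lambda d: (d["views"] / d["dislikes"], d["score"]),
    reverse=True,
)
print([d["title"] for d in ranked])
# ['Apache SolrCloud Example', 'SolrCloud Tutorial']
```

The score values are untouched by the sort, which matches the unchanged score of 1.0 in the Solr output.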
4.6 Adding Function Results As Fields Of Documents In Search Results
Another useful scenario is to add the calculation results as fields of documents in the search results. For instance:
q=title:*SolrCloud*&fq={!frange l=1}dislikes&fl=id,title,author,views,dislikes,score,popularity:div(views,dislikes)
In this query, we add a pseudo-field popularity:div(views,dislikes) to the fields list. The output would be:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":84,
    "params":{
      "q":"title:*SolrCloud*",
      "fl":"id,title,author,views,dislikes,score,popularity:div(views,dislikes)",
      "fq":"{!frange l=1}dislikes",
      "wt":"json",
      "_":"1592061341139"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"0818231712",
        "title":["Apache SolrCloud Example"],
        "author":["Kevin Yang"],
        "views":2000,
        "dislikes":10,
        "popularity":200.0,
        "score":1.0},
      {
        "id":"0380014300",
        "title":["SolrCloud Tutorial"],
        "author":["Roger Goodwill"],
        "views":2000,
        "dislikes":500,
        "popularity":4.0,
        "score":1.0}]
  }}
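A client can read the pseudo-field like any other field. The Python sketch below parses a trimmed copy of the response above (title, author and the response header are omitted for brevity) and checks that popularity equals views divided by dislikes while score stays at 1.0.

```python
import json

# Trimmed copy of the response shown above; some fields are omitted.
response_text = """
{"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
  {"id":"0818231712","views":2000,"dislikes":10,"popularity":200.0,"score":1.0},
  {"id":"0380014300","views":2000,"dislikes":500,"popularity":4.0,"score":1.0}]}}
"""

docs = json.loads(response_text)["response"]["docs"]
for d in docs:
    # The pseudo-field should equal views / dislikes for every document.
    assert d["popularity"] == d["views"] / d["dislikes"]
    print(d["id"], d["popularity"], d["score"])
```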
The Solr Reference Guide lists the available function queries. You can also implement your own custom functions and use them in queries, which is beyond the scope of this example.
5. Download the Sample Configuration and Data Files
You can download the sample configuration and data files of this example here: Apache Solr Function Query Example