Apache Solr Function Query Example

1. Introduction

In this example, we are going to explain what the Apache Solr Function Query is and how to use it in queries against our sample articles collection.

2. Technologies Used

The steps and commands described in this example are for Apache Solr 8.5 on Windows 10. The JDK version we use to run the SolrCloud in this example is OpenJDK 13.

Before we start, please make sure your computer meets the system requirements. Also, please download the binary release of Apache Solr 8.5.
In addition, it will save you some time if you can follow the Apache Solr Clustering Example to get a SolrCloud up and running on your local machine.

3. Function Queries Basics

When searching in Solr, a common approach is to specify terms as keywords in a query. The relevance score of each matching document is then calculated from the similarity between the query terms and the document (Solr 8.5 uses BM25, a refinement of TF-IDF, by default). The relevance score describes the degree to which a search result satisfies the user’s information need: the higher the score, the better the requirement is met. Is there a way to generate relevance scores with our own calculation at query time, so that the search results can satisfy our users’ needs in different contexts? Function queries are introduced for this purpose.

3.1 What Is A Function Query

A function query is a special query that can be added to a query; it lets us specify a function that generates a relevance score at query time for each document in the search results. The calculated value can also be used to filter documents, to sort results, or to be returned as an extra field on each document.

3.2 Query Parsers Supporting Function Queries

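The following query parsers support function queries:

  • The Standard Query Parser
  • The DisMax Query Parser
  • The Extended DisMax (eDisMax) Query Parser
  • The Function Query Parser (func)
  • The Function Range Query Parser (frange)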

3.3 Function Syntax

Function queries use functions. The standard function syntax in Solr consists of a function name, an opening round bracket, a comma-separated list of parameters, and a closing round bracket. For example:

numdocs()
ord(myIndexedField)
max(myfield,myotherfield,0)

In addition to the standard function syntax, there are three simplified forms:

  • A constant (a numeric or string literal)
18, 3.1415, "Function Query Example"
  • A field
author, field(author)
  • A parameter substitution
q={!func}max($f1,$f2)&f1=views&f2=1000

Note that Solr defines the input parameters of a function as functions themselves. This means we can pass a function as a parameter to another function.
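
As a small illustration of this nesting, the Python sketch below assembles function strings by composition. The helper name fn is hypothetical and introduced only for this example; div and max are built-in Solr functions. The code only builds text and does not contact Solr.

```python
def fn(name, *args):
    """Render a Solr function call such as div(views,dislikes)."""
    return f"{name}({','.join(str(a) for a in args)})"


# max(dislikes,1) is itself a function, passed as the second argument of div.
safe_popularity = fn("div", "views", fn("max", "dislikes", 1))
print(safe_popularity)  # div(views,max(dislikes,1))
```

The resulting string could be used wherever a function is accepted, for example after {!func} in the q parameter.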

4. Solr Function Queries Examples

There are several ways of using function queries in a Solr query. Before we show you some examples, let us prepare the collection and data for our queries.

4.1 Upload A Configset

Before creating a collection to index our data, we need a configset for the collection. A configset is a collection of configuration files such as solrconfig.xml, synonyms.txt, the schema, etc. There are two example configsets (_default and sample_techproducts_configs) in the Solr distribution which can be used when creating collections.

Note that when running in SolrCloud mode, configsets are fundamentally stored in ZooKeeper and not the file system. Solr’s _default configset is uploaded to ZooKeeper on initialization. So to use our own configset, we need to create a new one and upload it to ZooKeeper.

For this example, we create our own configset jcg_example_configs for our collection simply by making a copy of the _default configset. Download the source code of this example and copy jcg_example_configs.zip to your working directory. For example, we copy jcg_example_configs.zip to D:\ on Windows 10. Then run the following command in a Command Prompt to upload the configset:

curl -X POST --header "Content-Type:application/octet-stream" --data-binary @jcg_example_configs.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=jcg_example_configs"
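
For readers who prefer a script to curl, the same upload could be sketched in Python with the standard library only. The function names build_upload_request and upload_configset are hypothetical; the URL, HTTP method, and header are copied from the curl command above. Only the request object is built here, so nothing is sent unless upload_configset is called against a running SolrCloud.

```python
import urllib.request

SOLR = "http://localhost:8983/solr"  # same host and port as the curl command


def build_upload_request(zip_bytes, name):
    """Build (but do not send) the Configsets API UPLOAD request."""
    url = f"{SOLR}/admin/configs?action=UPLOAD&name={name}"
    return urllib.request.Request(
        url,
        data=zip_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def upload_configset(zip_path, name):
    """Send the request and return the response body; needs a running SolrCloud."""
    with open(zip_path, "rb") as f:
        req = build_upload_request(f.read(), name)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Building the request with an empty body is enough to inspect what would be sent.
req = build_upload_request(b"", "jcg_example_configs")
```

Calling upload_configset("jcg_example_configs.zip", "jcg_example_configs") should be equivalent to running the curl command.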

We can see the output as below:

D:\>curl -X POST --header "Content-Type:application/octet-stream" --data-binary @jcg_example_configs.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=jcg_example_configs"
{
  "responseHeader":{
    "status":0,
    "QTime":2203}}

If the jcg_example_configs configset already exists, you can delete it with the following command:

curl -X DELETE http://localhost:8983/api/cluster/configs/jcg_example_configs?omitHeader=true

Now we can use the Configsets API to list all configsets on the SolrCloud:

curl http://localhost:8983/solr/admin/configs?action=LIST

There are two configsets in the response:

D:\>curl http://localhost:8983/solr/admin/configs?action=LIST
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "configSets":["jcg_example_configs",
    "_default"]}

4.2 Indexing Data

We assume that you have followed the steps in the Apache Solr Clustering Example to get a SolrCloud up and running on your local machine. Open Solr Admin in a browser and create a new collection named jcgArticles with the jcg_example_configs configset. Select the newly created jcgArticles collection and go to the Documents screen. Copy the content of the articles.csv file downloaded with this example and paste it into the Documents text box. Select CSV from the Document Type drop-down list and click the Submit Document button.

Fig. 1. Preparing Data

You will see the following output once documents have been submitted successfully.

 Status: success
Response:
{
  "responseHeader": {
    "rf": 2,
    "status": 0,
    "QTime": 467
  }
}
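
The same indexing step could also be scripted instead of using the Admin UI. The sketch below is an assumption-laden illustration: the two rows are copied from the query results shown later in this article, the column order is a guess (the real articles.csv may differ and may contain more rows), and the helper names to_csv and build_index_request are hypothetical. It posts to the collection's /update handler with the application/csv content type; the request is only built, not sent.

```python
import csv
import io
import urllib.request

# Assumed column layout; check it against the header line of articles.csv.
FIELDS = ["id", "category", "title", "published", "author",
          "views", "likes", "dislikes", "comments", "publish_date"]
ROWS = [
    ["0818231712", "solr", "Apache SolrCloud Example", "true", "Kevin Yang",
     2000, 1000, 10, 200, "2020-06-05T00:00:00Z"],
    ["0380014300", "solr", "SolrCloud Tutorial", "true", "Roger Goodwill",
     2000, 1000, 500, 10, "2020-06-05T00:00:00Z"],
]


def to_csv(fields, rows):
    """Serialize a header line plus data rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


def build_index_request(csv_text, collection="jcgArticles"):
    """Build (but do not send) a CSV update request that commits immediately."""
    url = f"http://localhost:8983/solr/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/csv"},
    )


csv_text = to_csv(FIELDS, ROWS)
req = build_index_request(csv_text)
print(csv_text)
```

Passing req to urllib.request.urlopen would submit the documents to a running SolrCloud.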

4.3 Querying Without Using A Function Query

We search for articles whose title contains the term SolrCloud by using a field query with wildcards. We also add score to the fields list of the search results. Later, we will compare the relevance scores returned by this query with those of other queries that use function queries.

 q=title:*SolrCloud*&fl=*,score
Fig. 2. Querying Without Using A Function Query

Click the Execute Query button and the output would be:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":104,
    "params":{
      "q":"title:*SolrCloud*",
      "fl":"*,score",
      "wt":"json",
      "_":"1592054831147"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":1.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":1.0}]
  }}

As we can see from the output above, 2 articles are found. Both have a score of 1.0, because a wildcard query gives every match the same constant score, and both have the same number of views, 2000. How do we know which article is more popular? You may notice that the two articles have different numbers of dislikes, so we can define the popularity of an article as below:

popularity = views / dislikes

It means that if two articles have the same number of views, the article with fewer dislikes is more popular than the other one. Let’s see how we can implement this popularity calculation and use it with a function query.

4.4 Querying With A Function Query

Query parsers such as func and frange expect function arguments. We can use the built-in Solr function div to calculate the popularity. For example:

q=title:*SolrCloud* AND _query_:"{!func}div(views,dislikes)"&fq={!frange l=1}dislikes&fl=*,score

In this query, we add the function {!func}div(views,dislikes) to the main query and include the score in the returned fields list. In addition, the filter fq={!frange l=1}dislikes keeps only documents with at least one dislike, which avoids division by zero. The output would be:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":97,
    "params":{
      "q":"title:*SolrCloud* AND _query_:\"{!func}div(views,dislikes)\"",
      "fl":"*,score",
      "fq":"{!frange l=1}dislikes",
      "wt":"json",
      "_":"1592054952916"}},
  "response":{"numFound":2,"start":0,"maxScore":201.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":201.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":5.0}]
  }}

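Now we can see that the relevance scores have been updated. The first article has a score of 201.0 and the second article has a score of 5.0. Each score is the sum of the constant 1.0 from the wildcard clause and the value of div(views,dislikes), which is 200.0 for the first article and 4.0 for the second. Based on our popularity definition, the first article is clearly more popular than the second one.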
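
The arithmetic behind these two scores can be reproduced outside Solr. The Python sketch below is a simplified model, not Solr's implementation: it assumes the wildcard clause contributes the constant 1.0 seen in section 4.3 and that the scores of the two required clauses are added. The views and dislikes values are taken from the output above.

```python
docs = [
    {"id": "0818231712", "views": 2000, "dislikes": 10},
    {"id": "0380014300", "views": 2000, "dislikes": 500},
]

WILDCARD_SCORE = 1.0  # constant score of title:*SolrCloud*, as seen in section 4.3


def combined_score(doc):
    # Both clauses are required (AND), and their scores are added together.
    return WILDCARD_SCORE + doc["views"] / doc["dislikes"]


# fq={!frange l=1}dislikes keeps only documents with dislikes >= 1.
matches = [d for d in docs if d["dislikes"] >= 1]
scores = {d["id"]: combined_score(d) for d in matches}
print(scores)  # {'0818231712': 201.0, '0380014300': 5.0}
```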

The query above can be written via the _val_ keyword as well:

q=title:*SolrCloud* AND _val_:"div(views,dislikes)"&fq={!frange l=1}dislikes&fl=*,score

It yields the same output:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":50,
    "params":{
      "q":"title:*SolrCloud* AND _val_:\"div(views,dislikes)\"",
      "fl":"*,score",
      "fq":"{!frange l=1}dislikes",
      "wt":"json",
      "_":"1592054952916"}},
  "response":{"numFound":2,"start":0,"maxScore":201.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":201.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":5.0}]
  }}
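
When such a query is sent from a script rather than typed into the Solr Admin query screen, characters such as {, !, ", and spaces need to be percent-encoded. The following Python sketch shows one way to do that with the standard library. The helper name select_url is hypothetical, and the collection name and port are the ones used in this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(params, collection="jcgArticles", base="http://localhost:8983/solr"):
    """Return a /select URL with every parameter percent-encoded."""
    return f"{base}/{collection}/select?{urlencode(params)}"


params = {
    "q": 'title:*SolrCloud* AND _val_:"div(views,dislikes)"',
    "fq": "{!frange l=1}dislikes",
    "fl": "*,score",
}
url = select_url(params)
print(url)

# Decoding the query string gives back the original parameters.
decoded = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
```

The printed URL can be passed to curl in quotes or opened with urllib.request.urlopen against a running SolrCloud.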

4.5 Using Function Query In A Sort Expression

Function queries can be used in a sort expression. For example:

q=title:*SolrCloud*&fq={!frange l=1}dislikes&fl=*,score&sort=div(views,dislikes) desc, score desc

In this query, instead of using our popularity function for the relevance score, we just add it in the sort expression to sort the results by the popularity in descending order. The output would be:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":72,
    "params":{
      "q":"title:*SolrCloud*",
      "fl":"*,score",
      "fq":"{!frange l=1}dislikes",
      "sort":"div(views,dislikes) desc, score desc",
      "wt":"json",
      "_":"1592061341139"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"0818231712",
        "category":["solr"],
        "title":["Apache SolrCloud Example"],
        "published":true,
        "author":["Kevin Yang"],
        "views":2000,
        "likes":1000,
        "dislikes":10,
        "comments":200,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419809533952,
        "score":1.0},
      {
        "id":"0380014300",
        "category":["solr"],
        "title":["SolrCloud Tutorial"],
        "published":true,
        "author":["Roger Goodwill"],
        "views":2000,
        "likes":1000,
        "dislikes":500,
        "comments":10,
        "publish_date":"2020-06-05T00:00:00Z",
        "_version_":1669390419821068288,
        "score":1.0}]
  }}

We can see that the relevance scores remain the same, but the article with the higher popularity value is listed first.
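
The ordering produced by sort=div(views,dislikes) desc, score desc can be mimicked in a few lines of Python. This is only a model of the sort semantics using the values from the output above, with the documents deliberately listed in the opposite order first.

```python
docs = [
    {"id": "0380014300", "views": 2000, "dislikes": 500, "score": 1.0},
    {"id": "0818231712", "views": 2000, "dislikes": 10, "score": 1.0},
]


def sort_key(doc):
    popularity = doc["views"] / doc["dislikes"]
    # Negate both values so that an ascending sort yields descending order:
    # popularity first, then score as the tie-breaker.
    return (-popularity, -doc["score"])


ranked = [d["id"] for d in sorted(docs, key=sort_key)]
print(ranked)  # ['0818231712', '0380014300']
```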

4.6 Adding Function Results As Fields Of Documents In Search Results

Another useful scenario is to add the calculation results as fields of documents in the search results. For instance:

q=title:*SolrCloud*&fq={!frange l=1}dislikes&fl=id,title,author,views,dislikes,score,popularity:div(views,dislikes)

In this query, we add a pseudo-field popularity:div(views,dislikes) to the fields list. The output would be:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":84,
    "params":{
      "q":"title:*SolrCloud*",
      "fl":"id,title,author,views,dislikes,score,popularity:div(views,dislikes)",
      "fq":"{!frange l=1}dislikes",
      "wt":"json",
      "_":"1592061341139"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"0818231712",
        "title":["Apache SolrCloud Example"],
        "author":["Kevin Yang"],
        "views":2000,
        "dislikes":10,
        "popularity":200.0,
        "score":1.0},
      {
        "id":"0380014300",
        "title":["SolrCloud Tutorial"],
        "author":["Roger Goodwill"],
        "views":2000,
        "dislikes":500,
        "popularity":4.0,
        "score":1.0}]
  }}
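
On the client side, the pseudo-field can be read like any other field of the returned documents. The Python sketch below parses a trimmed copy of the response above (the response header and some fields are omitted for brevity) and collects the popularity value per document id.

```python
import json

# Trimmed copy of the response above; only the fields needed here are kept.
response_text = """
{"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
  {"id":"0818231712","views":2000,"dislikes":10,"popularity":200.0,"score":1.0},
  {"id":"0380014300","views":2000,"dislikes":500,"popularity":4.0,"score":1.0}]}}
"""

docs = json.loads(response_text)["response"]["docs"]
popularity_by_id = {d["id"]: d["popularity"] for d in docs}
print(popularity_by_id)  # {'0818231712': 200.0, '0380014300': 4.0}
```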

The Function Queries section of the Apache Solr Reference Guide lists the available functions. You can also implement your own custom functions and use them in queries, which is outside the scope of this example.

5. Download the Sample Configuration and Data Files

Download
You can download the sample configuration and data files of this example here: Apache Solr Function Query Example

Kevin Yang

A software design and development professional with seventeen years’ experience in the IT industry, especially with Java EE and .NET, I have worked for software companies, scientific research institutes and websites.