
Lucene Query Parser Example

In this example, we are going to learn about the Lucene QueryParser class and walk through its simple, fundamental concepts. In my previous post, we went through the indexing, writing, searching and displaying steps of the indexing example. Here we focus on the searching step, and more specifically on the QueryParser class. This post aims to demonstrate the different search options and features that Lucene provides through the QueryParser class.

The code in this example was developed in NetBeans IDE 8.0.2 with Lucene version 4.2.1. It is a good idea to try it with the latest version as well, although some of the APIs shown here have changed in later releases.

Figure 1. Lucene Library Jars

1. QueryParser Class

QueryParser is the basic Lucene class for parsing query strings into queries (in version 4.x it ships in the lucene-queryparser module). It lets us carry out searching tasks easily using the wide range of search options that Lucene provides.

QueryParser works much like a lexer and parser: it interprets any valid query string as a Lucene query. The query string we supply is thus translated into a query that Lucene can understand and execute. It is a vital part of Lucene. Because it parses text, it is defined by a grammar, so the query language, or query syntax, is the main thing to understand.

2. Query

A Lucene query is built from terms and operators. A Query is a series of clauses. A clause may be either a term, indicating all the documents that contain this term, or a nested query enclosed in parentheses. A clause may be prefixed by a + or - sign, indicating that the clause is required or prohibited respectively, or by a term followed by a colon, indicating the field to be searched. So, we can even construct queries which search multiple fields.

Thus, in BNF, the query grammar is:

Query ::= ( Clause )*
Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
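
To make the grammar concrete, here is a minimal plain-Java sketch of a recognizer for these two rules. It is not Lucene's parser: the class QueryGrammarSketch and its tokenizer are hypothetical helpers written for illustration, and quoted phrases, wildcards and other modifiers are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): checks a string against the two BNF rules only.
final class QueryGrammarSketch {
    private static final String SYMBOLS = "+-:()";
    private final List<String> tokens;
    private int pos = 0;

    private QueryGrammarSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    static boolean isValid(String input) {
        QueryGrammarSketch p = new QueryGrammarSketch(tokenize(input));
        return p.query() && p.pos == p.tokens.size();
    }

    // Splits the input into the symbols + - : ( ) and runs of other non-space characters (terms).
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean symbol = SYMBOLS.indexOf(c) >= 0;
            if (symbol || Character.isWhitespace(c)) {
                if (term.length() > 0) {
                    out.add(term.toString());
                    term.setLength(0);
                }
                if (symbol) {
                    out.add(String.valueOf(c));
                }
            } else {
                term.append(c);
            }
        }
        if (term.length() > 0) {
            out.add(term.toString());
        }
        return out;
    }

    // Query ::= ( Clause )*
    private boolean query() {
        while (pos < tokens.size() && !peekIs(")")) {
            if (!clause()) {
                return false;
            }
        }
        return true;
    }

    // Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
    private boolean clause() {
        if (peekIs("+") || peekIs("-")) {
            pos++;                       // optional required/prohibited prefix
        }
        if (isTerm(pos) && pos + 1 < tokens.size() && tokens.get(pos + 1).equals(":")) {
            pos += 2;                    // optional field name followed by a colon
        }
        if (isTerm(pos)) {
            pos++;                       // a plain term
            return true;
        }
        if (peekIs("(")) {
            pos++;
            if (!query() || !peekIs(")")) {
                return false;            // nested query must be closed
            }
            pos++;
            return true;
        }
        return false;
    }

    private boolean peekIs(String t) {
        return pos < tokens.size() && tokens.get(pos).equals(t);
    }

    private boolean isTerm(int i) {
        return i < tokens.size() && SYMBOLS.indexOf(tokens.get(i)) < 0;
    }
}
```

For instance, QueryGrammarSketch.isValid("+title:lucene -body:(java search)") accepts a required fielded term followed by a prohibited nested query, while an input such as "title:" is rejected because the field prefix is not followed by a term or a parenthesised query.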

2.1 Terms

Terms can be either single terms or phrases. A single term is a single word such as test, while a phrase is a group of words enclosed in double quotes such as "hello world". Multiple terms can be combined with operators to form more complex queries.

2.2 Fields

A Lucene query can be field-specific. We can name the field explicitly by prefixing the term with the field name and a colon; otherwise the default field is used.

Example: title:"Lucene Introduction"

2.3 Boolean Operators

Lucene supports AND, “+”, NOT, OR and “-” as Boolean operators. The word operators must be written in upper case.

3. Term Modifiers

Lucene queries provide a wide range of options to make searching easier. They support wildcard searches, regular expression searches, range searches, fuzzy searches, proximity searches and term boosting.

3.1 Wildcard Searches

  • te?t to search for “text” or “test”
  • test* to search for test, testing or tester
  • te*t to search with a wildcard in the middle of a term

    String querystr = "test*";
    Query q = new QueryParser(Version.LUCENE_42, "title", analyzer).parse(querystr);

Note: By default, the * or ? symbol cannot be used as the first character of a search.
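
The matching rules for the two wildcard symbols can be illustrated without Lucene. The following plain-Java sketch translates a wildcard term into a java.util.regex pattern, treating ? as exactly one character and * as zero or more characters. WildcardSketch is a hypothetical helper for illustration and is not how Lucene implements wildcard queries internally.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): mimics wildcard matching against a single term.
final class WildcardSketch {

    // Translates a wildcard term into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');       // exactly one character
            } else if (c == '*') {
                sb.append(".*");      // zero or more characters
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));  // literal character
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True when the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

With this sketch, WildcardSketch.matches("te?t", "text") and WildcardSketch.matches("test*", "testing") both hold, while WildcardSketch.matches("te?t", "tet") does not, because ? must consume one character.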

3.2 Regular Expression Searches

  • /[tr]est/ to search for “test” or “rest”
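
As a rough illustration of how a character class selects terms, the sketch below uses java.util.regex to test whole terms against the expression [tr]est. This is an assumption-laden stand-in: Lucene has its own regular expression syntax, which differs from java.util.regex in places, and RegexSketch is a hypothetical helper.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): whole-term regular expression matching.
final class RegexSketch {

    // True when the entire term matches the expression, not just a substring of it.
    static boolean matchesTerm(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }
}
```

Here RegexSketch.matchesTerm("[tr]est", "test") and RegexSketch.matchesTerm("[tr]est", "rest") hold, while "best" and "tests" are rejected.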

3.3 Range Searches

  • title:{Java TO Lucene} to search for documents whose titles fall between Java and Lucene, excluding Java and Lucene themselves.
  • title:[Java TO Lucene] to search the same range, including Java and Lucene.
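
Ranges over text are compared in lexicographic order. The plain-Java sketch below shows the difference between the exclusive {} and inclusive [] forms using String.compareTo. RangeSketch is a hypothetical helper, and the sketch ignores analysis steps such as lowercasing that may be applied to real terms.

```java
// Hypothetical helper (not part of Lucene): lexicographic range check on a single term.
final class RangeSketch {

    // inclusive = true models [lower TO upper]; inclusive = false models {lower TO upper}.
    static boolean inRange(String term, String lower, String upper, boolean inclusive) {
        int cmpLower = term.compareTo(lower);
        int cmpUpper = term.compareTo(upper);
        if (inclusive) {
            return cmpLower >= 0 && cmpUpper <= 0;
        }
        return cmpLower > 0 && cmpUpper < 0;
    }
}
```

A title such as Kotlin falls inside both forms of the range, whereas Java itself is matched only by the inclusive form.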

3.4 Fuzzy Searches

  • test~ to search for terms similar in spelling to test, such as tests and rests.
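
Fuzzy matching is based on edit distance, that is, the number of single-character changes needed to turn one term into another. The sketch below computes the plain Levenshtein distance in Java. It is a simplification for illustration: to my knowledge Lucene 4.x uses a Damerau-Levenshtein variant with a default maximum of 2 edits, and FuzzySketch is a hypothetical helper rather than Lucene code.

```java
// Hypothetical helper (not part of Lucene): classic Levenshtein edit distance.
final class FuzzySketch {

    // Counts the insertions, deletions and substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                          // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                          // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;                     // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Under this measure, tests is one edit away from test and rests is two edits away, so both would fall within a limit of 2 edits.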

3.5 Proximity Searches

  • "lucene apache"~10 to search for “apache” and “lucene” within 10 words of each other in a document.

3.6 Boosting a Term

  • lucene^4 apache to make lucene more relevant than apache by a boost factor of 4 in our search.

4. Parsing Queries

The QueryParser class is generated by JavaCC. The most important method in the QueryParser class is parse(String).

    public Query parse(String query) throws ParseException

Parses a query string, returning a Query.

Parameters: query – the query string to be parsed.

Throws: ParseException – if the parsing fails

    // 'analyzer' and 'index' are assumed to be the Analyzer and Directory used for indexing.
    String querystr = "test*";
    // Parse the query string against the default field "title".
    Query q = new QueryParser(Version.LUCENE_42, "title", analyzer).parse(querystr);
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    // Collect at most hitsPerPage top-scoring documents.
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

Try this example with the Lucene Example code. For the query string test*, the searcher finds words such as test and testing in the documents.

5. Constructors and Methods

5.1 Constructors

  • protected QueryParser(CharStream stream) Constructor with given CharStream.
  • protected QueryParser(QueryParserTokenManager tm) Constructor with generated Token Manager.
  • QueryParser(Version matchVersion, String f, Analyzer a) Create a query parser with f as the default field and a as the analyzer.

5.2 Some Other Methods

  • void disable_tracing() Disable tracing.
  • ParseException generateParseException() Generate ParseException.
  • Token getToken(int index) Get the specific Token.
  • Token getNextToken() Get the next Token.
  • void ReInit(QueryParserTokenManager tm) Reinitialise.
  • Query Term(String field) Generate query for the string.
  • Query TopLevelQuery(String field) Generate top-level query.

6. Things to Consider

  1. You should seriously consider building your queries directly with the query API. In other words, the query parser is designed for human-entered text, not for program-generated text.
  2. In a query form, fields which are general text should use the query parser. All others, such as date ranges, keywords, etc., are better added directly through the query API. A field with a limited set of values that can be specified with a pull-down menu should not be added to a query string which is subsequently parsed, but rather added as a TermQuery clause.
  3. You need to include the lucene-analyzers-common-x.x.x and lucene-queryparser-x.x.x jar files along with the lucene-core jar file to run the examples above.

7. Download the Eclipse project

You can download the full source code of the example here: Lucene Example Code

Niranjan Acharya

I am a Software Engineering graduate from Gandaki College of Engineering and Science, Nepal. I was involved in various software activities and projects during the four-year program. I started with programming in C and C++, and gave presentations and exhibitions of C games and Allegro game programming at GCES IT Mohatsav. I took part in academic activities involving Java, web technologies, enterprise applications and Big Data technologies. Since completing my Software Engineering degree, I have been working as Chief Technical Officer at IT Sahayatri Private Limited.