MySQL Full-text Search Tutorial
1. Introduction
MySQL is one of the most used SQL databases in many of the world-enterprise applications due to its ability to manage transactions and the ACID-behaviour which is built into its core. One of the reasons MySQL is so popular is due to the easiness it provides for its usage and its related commands. In the open source web application framework LAMP (which consist of Linux, Apache, MySQL and PHP), MySQL server is a central & important component. The MySQL Database server is written using C and C++ which internally uses a lexical analyser to parse and understand the SQL queries.
With time, many full-text search databases have come up with fast indexes which are exclusively made for supporting scalable and efficient full-text search functionality like Elasticsearch, Apache Solr and MongoDB as well. That ease when you start typing something in a search box and results start to follow as you type, it is one of the implementations for full-text search.
Table Of Contents
- 1. Introduction
- 2. MySQL Setup
- 3. Full-text Search with MySQL
- 4. Defining Full-text indexes in MySQL
- 5. Introduction to Full-text searches in MySQL
- 6. Boolean Full-text searches in MySQL
- 7. Query Expansion with MySQL Full-Text Search
- 8. Performance of Full-text searches in MySQL
- 9. Comparison: MySQL vs Elasticsearch full-text search
- 10. Conclusion: MySQL Full-text Search
2. MySQL Setup
For this lesson, we make a new server which we will be using as our MySQL installation. To proceed further, we need to set up MySQL Server and Client tools on that server. We can do this by executing the following command:
Installing MySQL Server and Client
sudo apt-get install mysql-server mysql-client
Once we run this command, the mentioned utilities will be installed on the server. While installing, we also set a MySQL root password:
Once the installation process is completed, we can confirm if the MySQL server is up and running with the following command:
Check MySQL Service status
sudo service mysql status
We will see an output like this:
Now that we know that the MySQL server is up and running, we can connect to it using the username and password we supplied to it during the installation process. We can use the following command to do so:
Logging into MySQL
1 | mysql -u root -p |
When we press enter, MySQL Server will silently wait for us to provide the password. When we type it, the password will not be echoed back to the terminal due to security reasons. We will see the following prompt once we have logged into the MySQL command line:
When we are inside the MySQL command prompt, we can use the given command to show the databases which are present in the system and to ensure that MySQL is running fine:
Show All databases
1 | show databases; |
Here is what we get back with this command:
In the output, MySQL just displays the list of MySQL default databases that are used for administration purpose. As this is our fresh installation, only four databases are visible as of now which are made by MySQL itself. Now that we know that our MySQL set up is complete and running, we can start working with full-text search in MySQL but we must first understand properly what it is and how it works in MySQL.
3. Full-text Search with MySQL
If you want to find textual data similar to given values, MySQL database supports the use of LIKE operator and regular expressions as well. But these methods start to create performance bottlenecks when the amount of data grow in your database and text values grow in large number in themselves (like when we start to save complete paragraph or pages of documents in a database column). Let us point down the important drawbacks for LIKE operator and regular expression usage:
- Performance: When LIKE operator is used, MySQL performs a full-table scan to search for given values of text which is extremely slow when the amount of data is more.
- Flexibility: The queries for flexible use-cases becomes complex than they should be. Like when we want to search for posts which contain “MySQL” but not “LIKE operator”, the query is not very much flexible.
- Ranking mechanism: There is no way with which we can get a sense of strongness with the response we get with these operators as they return the results in the order they find it and not in the order of similarity in which the data has matched against the query text.
To allow overcome these limitations, MySQL has a nice feature called full-text search which has nice advantages over LIKE operator and regular expressions:
- Even complex queries start to perform very well with MySQL text search
- The index made to support full-text are extremely efficient when it comes to occupying hard disk and memory space usage
- The indexes are updated in near real-time as soon as the values change in main columns supporting full-text search
Now that we have a basic idea about what full-text search is, we can start looking at how we can start implementing and building indexes for the full-text search for our database. This is what we will be doing in the next sections.
4. Defining Full-text indexes in MySQL
In this section, we will learn about defining full-text search indexes for our table columns. Before we can perform the full-text search on our data, we must index its data first. Once the data in that column is updated, the index is recalculated for the changes made. We can define the full-text search indexes with the keyword FULLTEXT
. The good thing is that even if the column being indexed already has some data, as soon as we make an index around that column, the index is populated with complete data automatically and this isn’t a manual process.
For demonstration purposes, we have made a database jcg_schema
and some tables inside it.
There are three ways with which full-text indexes can be made.
4.1 Defining FULLTEXT index while creating a Table
We can define columns for which we need to create full-text index with the FULLTEXT
keyword as shown here:
Create Index while creating Table
CREATE TABLE table_name ( column1 data_type, column2 data_type, … more columns .. PRIMARY_KEY(key_column), FULLTEXT (column1,column2,..) );
For our database, we will create a new table Posts
with the following schema and full-text index as well:
Create Index while creating Table
CREATE TABLE Posts ( id INT PRIMARY KEY, title VARCHAR(64) NOT NULL, author_id INT NOT NULL, FULLTEXT (title) );
When we describe this table structure, we will see the following:
4.2 Defining FULLTEXT index for existing Tables
In the last section, we created a new table with a full-text index. It is possible to create a full-text index on a column in an existing column as well. For this purpose, we will use the Author
table we created earlier. For this purpose, we can use the following command syntax:
Create Index on existing Tables
ALTER TABLE table_name ADD FULLTEXT(column1,column2,..);
To illustrate this with an example, we will add the full-text index on Author
table’s name
column:
Create Index on existing Tables
ALTER TABLE Author ADD FULLTEXT(name);
Once this query is executed, we will see the following output:
4.3 Defining FULLTEXT index with Create Index statement
In the last section, we defined a full-text index on an existing column with ALTER
command. In this section, we will use an explicit CREATE INDEX
command which hae the following syntax:
Create Index explicit command
CREATE FULLTEXT INDEX index_name ON table_name(idx_column_name,...);
As clearly mentioned, this command also allows us to provide an explicit index_name
to the index as well. It makes it easier to identify the indexes which reside in our database for later queries and size identification of the index. For demonstration, here is the command we will use for the same column name
in the Author
table:
Create Index explicit command
CREATE FULLTEXT INDEX name_idx ON Author(name);
Please note that if you want to load large amount of data in a given table which has an existing FULLTEXT index present, it will be much faster if:
- Remove the
FULLTEXT
index from the column - Load the required data into the table
- Add the
FULLTEXT
index on the required column
This is necessary to do as if a FULLTEXT
index exists on a column, after every insertion, this index will be recalculated which is a synchronous process and will make insertions pretty slow. That is why the removal of the index temporarily makes this pretty fast. To remove a FULLTEXT
index from the column, we can use the following command:
Remove Index
ALTER TABLE Author DROP INDEX name_idx;
5. Introduction to Full-text searches in MySQL
We will start with understanding how natural language full-text search works in MySQL. With natural language full-text search, MySQL can extract data which is relevant to the passed query text which is in natural form, like “How many Authors have a name starting with S”. It also provides a relevance score to the returned data which means that higher the score, more it is similar to the passed query. There are various factors which affect this score, like:
- Number of words in the document
- Total number of words in the collection or table
- Unique word count in the document
- Document count which contains the given set of words
There are two functions which can perform the full-text search in MySQL, MATCH()
and AGAINST()
functions. Let’s look at how to use each of them here.
As we already have some data in the Author
table, we can use the MATCH()
and AGAINST()
functions to get a relevance score with a text query to findout if something matches to our query or not. We will use the following command:
Relevance score with MATCH() function
SELECT id, MATCH(name) AGAINST ('Oshima') as score FROM Author WHERE MATCH(name) AGAINST ('Oshima') > 0 ORDER BY score DESC;
Once we run this command, we will get the following result:
Please note that we need to repeat the MATCH()
and AGAINST()
function in the WHERE
clause because of SQL restrictions on what can be included in that clause.
We used 0 as a cutoff here. We can increase that value as much as we want to fine-tune and limit the accurate results we get back with these queries.
While this is exciting to see our results in a sorted manner of a relevance score, it is also important to note some important points:
- The minimum length of the string we can use to search is 4. This means that when we try to search a string like “car” or “see”, we will not get back any results.
- Stop words are ignored by MySQL, which are defined in the MySQL source code distribution
storage/myisam/ft_static.c
.
6. Boolean Full-text searches in MySQL
In Boolean Full-text searches in MySQL, we can also exclude words while performing a search or define a range of how far away the words in the input must be from one another. To exclude a term from the query, we can make use of the minus operator with IN BOOLEAN
keyword. Let us consider the following example where response should contain ‘module’ but not ‘getpass’:
Relevance score with MATCH() function
SELECT * FROM Posts WHERE MATCH(title) AGAINST ('module -getpass IN BOOLEAN MODE');
This is what we get back with this command:
For demonstration, we also displayed complete data which was present in the ‘Posts’ table before we perform the Boolean full-text search query.
The Boolean full-text search has some excellent features which we should know about:
- The relevance order sorting doesn’t apply on Boolean full-text search results as it did with the Match and Against functions
- If you want to perform the Boolean full-text search queries on a column, the InnoDB mandates having a FULLTEXT index on that column
- Multiple Boolean operators are not allowed in Boolean full-text search queries, MySQL will show an error if this is tried
- Boolean full-text search queries only support leading + or – operators and not the trailing operators
7. Query Expansion with MySQL Full-Text Search
It is a very common scenario when a user of an application doesn’t exactly know what he needs to search for. They can define some random keywords they have an idea about and still expect correct results from the application. This is where MySQL query expansion comes in which allows a user to use too-short keywords in their queries and still get relevant results.
It widens the search results of the full-text search result based on blind query expansion. These are steps MySQL follow to do so:
- It starts by performing a search with given short keywords itself to get highly accurate results
- It checks all search results and looks for the relevant words
- Finally, it performs a search again based on those relevant words instead of the original query passed by the user
Query expansion has simple syntax like:
Query Expansion Syntax
SELECT column1, column2 FROM table1 WHERE MATCH(column1,column2) AGAINST('keyword',WITH QUERY EXPANSION);
Let;s run our own query to see how it actually works on the name
column which already have the FULLTEXT
index enabled for it. We can run the query like:
Without Query Expansion
SELECT * FROM Posts WHERE MATCH(title) AGAINST ('module');
Here are the results we get with this query:
Let us try again with a modified query which uses query expansion:
With Query Expansion
SELECT * FROM Posts WHERE MATCH(title) AGAINST ('module' WITH QUERY EXPANSION);
Here are the results we get with this query:
Clearly the result for two queries vary and we get much more results with query expansion and often, we will get much more relevant results with query expansion. This can be used when we need to show user a list of results which can be near to his query text but not exactly the same. The results are still relevant and can be kept at a lower position when shown to the user.
8. Performance of Full-text searches in MySQL
To identify the performance of Full-text search in an application, the first thing to look for is the amount of data present in the table we want to search. If the number of records is too less and the search queries are pretty simple, using a FULLTEXT index and the Match functions will simply be an overhead. To import some main points which we have learned while working on the database search with MySQL, here we summarise these points:
- If record count for the table is below 250k, it is better when we use LIKE operator itself because it only takes a fraction of a second to return the results. We also need to take note that a view can easily take many times more to render the response.
- The FULLTEXT indexes perform better with a wide range of vocabulary as the indexes made are wide but shallow which can determine very quickly if the passed query string has any matches in the database or not.
- Interestingly, both LIKE operator and the FULLTEXT index approach scales linearly. Note that even the FULLTEXT index have limitations, like when your table reaches multi-million records, it will create pain in response times and we can move to a NoSQL solution like Elasticsearch to provide free-text search capabilities for an application.
It is important to know when to stop trying to scale a full-text search with MySQL because of its limited capabilities with performance when the data size in application grows to multi-million records, which is pretty much achievable quickly in an application for today’s world when big data is so advent. This means that moving out to a NoSQL database if you have a use case related to text search is a smart choice from the beginning itself. No doubt that MySQL search is easy to perform and has good documentation, it is necessary to pick a solution which you won’t doubt for the next 5 years of your application’s life.
9. Comparison: MySQL vs Elasticsearch full-text search
Every text search solution is as powerful as the text analysis capabilities it offers. Lucene, the underlying search engine for Elasticsearch, is such an open source information retrieval library offering many text analysis possibilities. When it comes to querying performance, the Elasticsearch performance will always outlive MySQL performance and this is clearly visible in the use cases Elasticsearch is used for.
The examples we studied in this lesson were quite simple. What if we want to cover much more complex examples with the query engine integrated into our system, like the engine:
- should look for synonyms matching my query text
- should match singular and plural words or words sounding similar to enter query text
- should not allow searching on protected words
- should allow a search for words mixed with numeric or special characters
- should not allow search on HTML tags
- should allow search text based on the proximity of the letters and number of matching letters
Enriching the content here would be to add above search capabilities to your content while indexing and searching for the content. ElasticSearch uses Lucene inbuilt capabilities of text analysis and allows you to enrich your search content. ElasticSearch offers us quite some inbuilt analyzers with preconfigured tokenizers and filters. For details list of existing analyzers, check complete list for Analysis.
With MySQL, indexes will always stay busy in indexing and searching the data for response and it will be soon when you start pitying them to making them do so much of the work.
10. Conclusion: MySQL Full-text Search
MySQL full-text search is an excellent choice (besides its disadvantages and limitations) when it comes to making sure that a user gets a fast response when he tries to perform full-text searches on the database or the application using MySQL database. It surely has some disadvantages related to scaling but for simpler applications, it is a very good starting point with one of the most popular databases used today.
Although Elasticsearch and related Lucene-based engines can provide much better full-text search engines and response times, we would suggest that you start with MySQL text search, as in a simple case it will be quick to set up and if this does not provide what you need then upgrade to elastic search. You will at least have a straw man feature which can be used to further refine your search requirements. This is exactly what this lesson aimed to do.
Thanks for the tutorial.
Good one. Thank you for the information. Keep up with good work.