Foreword
We all know that using "%xx" in InnoDB for fuzzy data queries can cause index invalidation, but sometimes requirements are just like that. There are many similar needs. For example, search engines need to perform full-text searches based on keywords from user data, and e-commerce websites may need to search within detailed product descriptions based on user query conditions. These tasks are not well handled by B+ tree indexes.
Most of the queries we need can be accomplished through value comparison, range filtering, etc. However, if we need to filter queries by keyword matching, similarity-based queries are required instead of exact value comparisons. Full-text search is designed for such scenarios.
Full-Text Search is a technique for finding any information stored in a whole book or entire article within a database. It can retrieve information about chapters, sections, paragraphs, sentences, words, etc., from the full text as needed, and can perform various statistics and analyses.
In early MySQL, InnoDB did not support full-text search technology. Starting from MySQL 5.6, InnoDB began supporting full-text search.
Inverted Index
Full-text search is typically implemented using an inverted index. Like the B+Tree, an inverted index is also an index structure. It stores a mapping between words and their positions within one or more documents in an auxiliary table. This is usually implemented using an associative array and has two forms:
- inverted file index: {word, document id where the word appears}
- full inverted index: {word, (document id, position within the document)}

The above image shows the associative array of the inverted file index. The word "code" appears in documents 1 and 4. Stored this way, full-text queries become simple—you can directly get the documents containing the query keyword based on Documents. The full inverted index stores pairs (DocumentId, Position), so the inverted index it stores is shown in the figure below. For example, the keyword "code" appears as the 6th word in document 1 and the 8th word in document 4. Comparatively, the full inverted index takes up more space but allows better data positioning and supports additional search features.

Full-Text Search
Creating a Full-Text Index
- Create a full-text index when creating a table:
CREATE TABLE table_name ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, author VARCHAR(200),
title VARCHAR(200), content TEXT(500), FULLTEXT full_index_name (col_name) ) ENGINE=InnoDB;
Enter the query statement:
SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_TABLES
WHERE name LIKE 'test/%';

The six index tables above form the inverted index and are called auxiliary index tables. When an incoming document is tokenized, individual words, along with position information and the associated DOC_ID, are fully sorted and partitioned across the six index tables based on the sorting weight of the first character's character set.
- Create a full-text index on an existing table:
CREATE FULLTEXT INDEX full_index_name ON table_name(col_name);
Using a Full-Text Index
MySQL supports full-text search queries. Full-text indexes can only be used on InnoDB or MyISAM tables and only on columns of type char, varchar, or text.
The syntax is as follows:
MATCH(col1,col2,...) AGAINST(expr[search_modifier])
search_modifier:
{
IN NATURAL LANGUAGE MODE
| IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
| IN BOOLEAN MODE
| WITH QUERY EXPANSION
}
Full-text search uses the MATCH() AGAINST() syntax. MATCH() takes a comma-separated list of columns to search. AGAINST() takes a search string and an optional modifier for the type of search. Full-text search comes in three types: natural language search, boolean search, and query expansion search. The various query modes are introduced below.
Natural Language
Natural language search interprets the search string as a phrase in natural human language. MATCH() defaults to Natural Language mode, which means querying for documents containing the specified keyword.
Let's combine a demo to better understand Natural Language:
SELECT
count(*) AS count
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( 'MySQL' );

The above statement queries the number of rows in the title and body columns that contain the 'MySQL' keyword. The above statement can also be written as:
SELECT
count(IF(MATCH ( title, body )
against ( 'MySQL' ), 1, NULL )) AS count
FROM
`fts_articles`;
Although both statements yield the same result, the second SQL statement runs faster internally because the first SQL (query based on WHERE index) also needs to sort and count relevance, while the second does not.
You can also query relevance via SQL:
SELECT
*,
MATCH ( title, body ) against ( 'MySQL' ) AS Relevance
FROM
fts_articles;

Relevance calculation is based on the following four conditions:
- Whether the word appears in the document
- The number of times the word appears in the document
- The number of words in the indexed column
- How many documents contain the word
For InnoDB storage engine full-text search, the following factors also need to be considered:
- If the queried word is in the stopword list, the search for that string is ignored
- Whether the character length of the queried word falls within the interval [innodb_ft_min_token_size, innodb_ft_max_token_size]
If a word is in stopwords, it is not searched. For example, querying for the word 'for' yields the following result:
SELECT
*,
MATCH ( title, body ) against ( 'for' ) AS Relevance
FROM
fts_articles;

As seen, although 'for' appears in documents 2 and 4, its relevance is 0 because it is a stopword.
The parameters innodb_ft_min_token_size and innodb_ft_max_token_size control the length of characters queried by the InnoDB engine. When the length is less than innodb_ft_min_token_size or greater than innodb_ft_max_token_size, the search for that word is ignored. In the InnoDB engine, the default value of innodb_ft_min_token_size is 3, and the default value of innodb_ft_max_token_size is 84.
Boolean
Boolean search interprets the search string using the rules of a special query language. The string contains words to search for and can also include operators that specify requirements, such as whether a word must exist or must not exist in matching rows, or whether its weight should be higher or lower than usual. For example, the following statement requires documents that have the string "Pease" but not "hot", where + and - indicate that the word must exist or must not exist, respectively.
select * from fts_test where MATCH(content) AGAINST('+Pease -hot' IN BOOLEAN MODE);
Boolean full-text search supports the following types:
- +: Indicates the word must be present
- -: Indicates the word must not be present
- (no operator): Indicates the word is optional, but if present, its relevance is higher
- @distance: Indicates that the distance between multiple queried words must be within distance (in bytes). This type of full-text search is also called
Proximity Search, e.g.,MATCH(context) AGAINST('"Pease hot"@30' IN BOOLEAN MODE)means the distance between the strings Pease and hot must be within 30 bytes - >: Indicates that the word increases relevance when present
- <: Indicates that the word decreases relevance when present
- ~: Indicates that the word is allowed, but its relevance is negative when present
- *: Indicates words starting with the given prefix, e.g.,
lik*can meanlik,like,likes - ": Indicates a phrase
Below are some demos showing how Boolean Mode is used.
demo1: + -
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( '+MySQL -YourSQL' IN BOOLEAN MODE );
The above statement queries information that includes 'MySQL' but does not include 'YourSQL'.

demo2: no operator
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( 'MySQL IBM' IN BOOLEAN MODE );
The above statement queries for 'MySQL IBM' without '+' or '-' identifiers, meaning the words are optional, and if present, their relevance is higher.

demo3: @
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( '"DB2 IBM"@3' IN BOOLEAN MODE );
-- Copy code
The above statement means the distance between the words "DB2" and "IBM" is within 3 bytes.
demo4: > <
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( '+MySQL +(>database <DBMS)' IN BOOLEAN MODE );
The above statement queries rows containing 'MySQL', 'database', and 'DBMS' simultaneously, but rows without 'DBMS' have higher relevance than those containing 'DBMS'.

demo5: ~
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( 'MySQL ~database' IN BOOLEAN MODE );
The above statement queries rows containing 'MySQL', but if the row also contains 'database', the relevance is lowered.

demo6: *
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( 'My*' IN BOOLEAN MODE );
The above statement queries rows whose keywords contain 'My'.

demo7: "
SELECT
*
FROM
`fts_articles`
WHERE
MATCH ( title, body ) AGAINST ( '"MySQL Security"' IN BOOLEAN MODE );
The above statement queries rows containing the exact phrase 'MySQL Security'.

Query Expansion
Query expansion search is a modification of natural language search. This query is typically used when the query keyword is too short and the user needs implied knowledge. For example, when querying for the word database, users may expect not only documents containing database but also those containing words like MySQL, Oracle, RDBMS. In such cases, the Query Expansion mode can be used to enable implied knowledge of full-text search.
Adding WITH QUERY EXPANSION / IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION to the query statement enables blind query expansion (also called automatic relevance feedback). This query has two phases.
- Phase 1: Perform a full-text index query based on the searched word.
- Phase 2: Perform another full-text search query based on the tokens generated in Phase 1.
Let's look at an example to see how Query Expansion is used.
-- Create index
create FULLTEXT INDEX title_body_index on fts_articles(title,body);
-- Query using Natural Language mode
SELECT
*
FROM
`fts_articles`
WHERE
MATCH(title,body) AGAINST('database');
Results before using Query Expansion:

-- When using Query Expansion mode
SELECT
*
FROM
`fts_articles`
WHERE
MATCH(title,body) AGAINST('database' WITH QUERY expansion);
Results after using Query Expansion:

Since Query Expansion full-text search may bring many irrelevant results, users need to be very cautious when using it.
Deleting a Full-Text Index
- Directly delete the full-text index:
DROP INDEX full_idx_name ON db_name.table_name;
- Delete the full-text index using ALTER TABLE:
ALTER TABLE db_name.table_name DROP INDEX full_idx_name;
Summary
This article introduced fulltext index from both theoretical and practical perspectives. If you are interested in MySQL, please continue to follow the MySQL column.