Use an index when sorting

When working with a database it is a very common task to retrieve a sorted list of entities. What you might not realize is that this can be a very expensive operation.

If an index is not present on the table which the select is being used on a filesort may be performed. This is what happens in the example below using MariaDB. When performing a file sort, the database fetches every row and sorts each in memory before performing a query, which in this case is selecting the first 10 results.

Applying an index on the column you wish to sort by (ORDER BY) means that the database can skip the sort altogether. Instead of the previous scenario of fetching all rows and then sorting them, the index will allow it to just grab the first 10 entries.

The Top-N query

He hasn't just made up the term Top-N query here is a reference and explantation from Apache Flink.

A Top-N query is where you want to get the first N or last N results of table. It's a super common query to perform. But as we explained above, it can be very slow if used without an index.

I'll be using the schema displayed below for each of the sql snippets in this post. The schema represents an example database that can be found on the mariadbtutorial.com website.

nation db schema

You'll see from the above schema that each countries entity can have many country_stats entities. This means there is likely to be a lot of county_stats entities. So let's fetch the top 10 country_stats by population.

SELECT country_id, population
FROM country_stats
ORDER BY population DESC
LIMIT 10;

Running the following query with explain tells me that 9514 rows were checked and that a filesort was performed. This confirms that I wrote above, when there is no index the database has to search through each row THEN do the sort.

Let's fix this by adding an index on the country_stats table targeting the population column in descending order.

CREATE INDEX idx_population_desc ON country_stats (population DESC);

Now when the original query is run, explain tells me that only 10 records were checked and that the idx_population_desc index was used.

Method Rows Checked
Using filesort 9514
Using index 10

So when doing a Top-N query make sure to add an index on the column and specify the order you wish to sort by.

Until next time,

Brian