Why do some queries perform poorly even with seemingly correct indexes in place? On top of that, your SQL Server seems to be running hot, gobbling up disk IO and running up the CPU. The culprit might be a Key Lookup, and fortunately this can be a relatively easy problem to fix, once it is found. Let me illustrate with a simple query on a table containing 20 million rows of film ratings, with a non-clustered index on the movieId column.
SELECT movieId, rating FROM rating WHERE movieId in (1694, 1946, 6166);
This query returned just under 5500 rows, taking 2.1 seconds on my laptop and almost 100 ms of CPU time. These measurements were taken by running the following diagnostic commands in SQL Server Management Studio (before executing the query):
dbcc freeproccache dbcc dropcleanbuffers set statistics io on set statistics time on
We will run the above commands each time we test the query, ensuring that SQL Server does not reuse cached execution plans or table data in memory, as as well capture our diagnostic data. In addition, we can look at the query execution plan in Management Studio. This is done by looking for “Show Actual Execution Plan” in the toolbar. You can also get to this from menus by right-clicking in the query window or from Query in the menu bar.
The execution plan shows that 100 percent of the query cost (it’s really over 99.9%, just rounded up) comes from the Key Lookup operator. Above that is the seek operation on the movieID index, which we expect to see for this query. Since SQL Server execution plans read from right to left, the next operator is a Nested Loops join. We don’t have any joins in this query, so what exactly is going on here?
The index in this case satisfies the filter condition on movieID in the WHERE clause. This returns almost 5500 matching rows, but we also want to have the rating in the result (the SELECT portion of the statement). Since this column is not part of the index, the optimizer runs Key Lookups to retrieve it. This manifests itself as a clustered index seek, returning the entire row for every match on MovieID. That’s around 5500 clustered index seeks, adding significant IO cost to this query. In addition, the Nested Loops join which follows ties together the results of the two preceding operators. This adds CPU overhead to the query proportionate to the amount of Key Lookups that occur, that is to say every row matching our WHERE condition.
The solution is to add an Included Column to the existing nonclustered index. Ordinarily an index consists of one or more Key Columns that are used for retrieving data based on the WHERE or JOIN conditions. When the remaining columns in your query are defined as Included Columns in the same index, there is no need to perform Key Lookups to retrieve this data. This concept is known as creating a covering index on your query. In the past, creating an index to cover a query meant cramming as many columns as possible into the index key, hopefully arranged in the correct order. This resulted in overly large indexes and often difficult tuning efforts. Included Columns provide the query covering benefit in a more compact, less complex form.
Now we drop the existing nonclustered index and create a new one with the Included Column:
CREATE NONCLUSTERED INDEX [IX_rating_incl] ON [dbo].[rating] ([movieId] ASC) INCLUDE ([rating]);
Now the query execution time drops from over 2 seconds to a mere 79 ms (0.079 seconds)! Logical reads are a measure of IO cost which can cause disk and memory pressure. Before adding this index our query ran up more than 32,000 logical reads. With the new index logical reads decreased by 99.9%, to just 26! Also, the CPU overhead went away. Take a look at the new query plan:
There is a single index seek operator, assigned a cost of 0.018 by the optimizer. Formerly, the query plan had Key Lookups adding up to a cost of around 15.7. Note that query plan costs in SQL Server do not mean much as absolute values, as they are intended for relative comparison. With this purpose in mind, the costs here do clearly indicate a tremendous improvement with the new covering index.
This example illustrates the benefit of solidly understanding query tuning principles and strategy. If a query like the above executes several times per second on a busy system, improving it more than 99% is a big win. Doing so without expensive hardware purchases or lengthy refactoring of the application is an even bigger win. The challenge here is in finding these database optimization opportunities quickly and efficiently. Having access to an experienced SQL Server tuning expert puts you on the fast track. Contact me to find out how I can improve your database performance!