How the MySQL Optimizer fails with WHERE + ORDER BY + LIMIT
April 8, 2019
Given a table that looks like:
CREATE TABLE `widgets` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`warehouse_id` bigint(20) DEFAULT NULL,
`name` bigint(20) DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`count` int(11) DEFAULT NULL,
`shipping_price` decimal(21,3) DEFAULT '0.000',
PRIMARY KEY (`id`),
KEY `index_widgets_on_updated_at` (`updated_at`),
KEY `index_orders_on_warehouse_id_and_updated_at` (`warehouse_id`,`updated_at`)
) ENGINE=InnoDB AUTO_INCREMENT=955002585213 DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC
If the table contains many wharehouses without and even distribution of records the following query will likely preform poorly:
SELECT widgets.updated_at, widgets.warehouse_id
FROM widgets
WHERE widgets.updated_at >= '2017-08-10 01:31:06'
AND widgets.warehouse_id = 123
ORDER BY widgets.updated_at
LIMIT 10
WHY?
MySQL’s optimizer doesn’t understand that warehouses may be “clumpy” in the table. It assumes an even
distribution of warehouse ids in the table so it uses its heuristic based optimization to
and calculates the cost of doing the WHERE + ORDER BY using the index, vs. a full scan,
and it is thinking the whole table will contain many records for warehouse_id 123 for
WHERE widgets.updated_at >= '2017-08-10 01:31:06' (which is also reasonable) and thus the index
scan is preferred.[1]
Work Arounds:
- Add a FORCE INDEX (
index_orders_on_warehouse_id_and_updated_at) , drawback force index clutter may surface as a problem later. say if the index is removed. - Drop the
ORDER BYand hope mysql will likely chooseindex_orders_on_warehouse_id_and_updated_at. Since updated_at is rightmost in the index & we’re selecting for a specific warehouse_id MySQL will implicity order the result set by it. But if mysql chooses a different index the query will break
[1] 100% stolen from slack, credit Jeremey Cole