How the MySQL Optimizer fails with WHERE + ORDER BY + LIMIT

April 8, 2019   

Given a table that looks like:

CREATE TABLE `widgets` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `warehouse_id` bigint(20) DEFAULT NULL,
  `name` bigint(20) DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `count` int(11) DEFAULT NULL,
  `shipping_price` decimal(21,3) DEFAULT '0.000',
  PRIMARY KEY (`id`),
  KEY `index_widgets_on_updated_at` (`updated_at`),
  KEY `index_orders_on_warehouse_id_and_updated_at` (`warehouse_id`,`updated_at`)
) ENGINE=InnoDB AUTO_INCREMENT=955002585213 DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC

If the table contains many wharehouses without and even distribution of records the following query will likely preform poorly:

SELECT widgets.updated_at, widgets.warehouse_id
FROM widgets
WHERE widgets.updated_at >= '2017-08-10 01:31:06'
AND widgets.warehouse_id = 123
ORDER BY widgets.updated_at
LIMIT 10

WHY?

MySQL’s optimizer doesn’t understand that warehouses may be “clumpy” in the table. It assumes an even distribution of warehouse ids in the table so it uses its heuristic based optimization to and calculates the cost of doing the WHERE + ORDER BY using the index, vs. a full scan, and it is thinking the whole table will contain many records for warehouse_id 123 for WHERE widgets.updated_at >= '2017-08-10 01:31:06' (which is also reasonable) and thus the index scan is preferred.[1]

Work Arounds:

  • Add a FORCE INDEX (index_orders_on_warehouse_id_and_updated_at) , drawback force index clutter may surface as a problem later. say if the index is removed.
  • Drop the ORDER BY and hope mysql will likely choose index_orders_on_warehouse_id_and_updated_at. Since updated_at is rightmost in the index & we’re selecting for a specific warehouse_id MySQL will implicity order the result set by it. But if mysql chooses a different index the query will break

[1] 100% stolen from slack, credit Jeremey Cole