I have an analytics table that stores metrics aggregated the day level. I need to ask questions such as "For a given date range and client, what are the most popular items."
The schema is as follows:
CREATE TABLE `t1` (
`imps` bigint(20) DEFAULT NULL,
`n_id` int(11) NOT NULL,
`b_id` bigint(20) NOT NULL,
`date` date NOT NULL,
`m1` bigint(20) NOT NULL,
`m2` bigint(20) NOT NULL,
`is_b` tinyint(1) NOT NULL,
`i_id` varchar(128) NOT NULL,
KEY `idx1` (`b_id`,`date`,`i_id`,`n_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
(This table is split into 256 partitions by HASH(b_id)
, in case that matters)
A representative query:
SELECT i_id, n_id, SUM(m1), SUM(m2)
FROM t1
WHERE b_id=1234 AND date>='2013-06-01' AND date<='2013-08-31'
GROUP BY i_id, n_id
ORDER BY (SUM(m1) + SUM(m2)) DESC
LIMIT 10 OFFSET 0
And the EXPLAIN
output:
+----+-------------+-------+------+---------------+------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | t1 | ref | idx1 | idx1 | 8 | const | 1395772 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+-------+-------+----------------------------------------------+
On a table with ~110M records these queries take > 10s to execute. I believe the SUM function in the ORDER BY clause forces the entire resultset satisfied by the WHERE clause to be written to disk, hence the "Using temporary; Using filesort" messages.
Do I have any recourse in improving the performance of these queries short of redesigning the schema and application?
Edit: Here is a SQLFiddle with the schema and dummy data: http://sqlfiddle.com/#!2/beb63/1
EXPLAIN PARTITIONS
output. This is expected, since I'm partitioning against b_id which is used in theWHERE
clause. – Rich Schumacher Sep 27 at 19:44