Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have the following MySQL (MyISAM) table with about 3 Million rows.

CREATE TABLE `tasks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `node` smallint(6) NOT NULL,
  `pid` int(11) NOT NULL,
  `job` int(11) NOT NULL,
  `a_id` int(11) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `state` int(11) NOT NULL,
  `start_time` int(11) NOT NULL,
  `end_time` int(11) NOT NULL,
  `stop_time` int(11) NOT NULL,
  `end_stream` int(11) NOT NULL,
  `message` varchar(255) DEFAULT NULL,
  `rate` float NOT NULL,
  `exiting` int(11) NOT NULL DEFAULT '0',
  `bytes` int(11) NOT NULL,
  `motion` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `a_id` (`a_id`),
  KEY `job` (`job`),
  KEY `state` (`state`),
  KEY `end_time` (`end_time`),
  KEY `start_time` (`start_time`),
) ENGINE=MyISAM AUTO_INCREMENT=100 DEFAULT CHARSET=utf8;

Now when I run the following query, MySQL is only using the a_id index and needs to scan a few thousand rows.

SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' ) 
AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';

When I add an additional index KEY newkey (a_id,state,start_time), MySQL is still trying to use a_id only and not newkey. Only when using the hint / force index in the query, it's been used. Changing the fields in the query around does not help.

Any ideas? I don't necessarily want hints in my statements. The fact that MySQL is not doing this automatically indicates to me that there is an issue with my table, keys or query somewhere. Any help is highly appreciated.

Additional info:

mysql> show index in tasks;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasks |          0 | PRIMARY   |            1 | id          | A         |     3130554 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | a_id      |            1 | a_id        | A         |        2992 |     NULL | NULL   | YES  | BTREE      |         |               |
| tasks |          1 | job       |            1 | job         | A         |           5 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | state     |            1 | state       | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | end_time  |            1 | end_time    | A         |     1565277 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | newkey    |            1 | a_id        | A         |        2992 |     NULL | NULL   | YES  | BTREE      |         |               |
| tasks |          1 | newkey    |            2 | state       | A         |        8506 |     NULL | NULL   |      | BTREE      |         |               |
| tasks |          1 | newkey    |            3 | start_time  | A         |     3130554 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

EXPLAIN with and without quotes:

mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' )  AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys              | key       | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | tasks | ref  | a_id,job,state,newkey      | a_id      | 5       | const |  740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.10 sec)

mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job=1 OR job=3 )  AND a_id = 614 AND state > 80 AND state < 100 AND start_time >= 1386538013;
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys              | key       | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | tasks | ref  | a_id,job,state,newkey      | a_id      | 5       | const |  740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.01 sec)
share|improve this question
 
Why do you use static numbers as strings in your select? That might be throwing off the query optimizer. –  acfrancis yesterday
 
Thanks for letting me know, not sure if this makes much difference (not for explain), but will try to avoid unnecessary quotes. –  user2785818 yesterday
 
I think the issue is that the job column isn't in either index. Which ever index it uses, it will still have to refer to the full table to satisfy that part of the where clause. That row lookup probably dominates the IO so that the choice of index doesn't matter. –  acfrancis yesterday
add comment

3 Answers

up vote 1 down vote accepted

A few things... I would have a SINGLE compound index on ( a_id, job, state, start_time )

This to help optimize the query on all the criteria, in what I believe is the best tuned sequence. A single "A_ID", then two jobs, a small state range, then time based. Next, notice no quotes... It appears you were converting numeric to string comparisons, leave them as numeric for compare -- faster than strings.

Also, by having them all as part of the index, it is a COVERING index meaning it does NOT have to go to the raw page data to get the other values to test the qualifying records to include or not.

SELECT 
      count(*) AS tries 
   FROM 
      tasks
   WHERE 
          a_id = 614
      AND job IN ( 1, 3 ) 
      AND state > 80 AND state < 100 
      AND start_time >= 1386538013;

Now, the why the index... consider the following scenario. You have two rooms that have boxes... In the first room, each box is an "a_id", within that are the jobs in order, within each job are the state ranges, and finally by start time.

In another room, your boxes are sorted by start time, within that a_id are sorted, and finally state.

Which would be easier to find what you need. That is how you should think on the indexes. I would rather go to one box for "A_ID = 614", then jump to Job 1 and another for Job 3. Within each Job 1, Job 3, grab 80-100, then time. You however know better your data and volume in each criteria consideration and may adjust.

Finally, the count(ID) vs count(*). All I care about is a record qualified. I don't need to know the actual ID as the filtering criteria already qualified as include or not, why look (in this case) for the actual "ID".

share|improve this answer
 
Thanks DRapp, this all makes sense - doing this makes MySQL use the newkey instead of just the a_id. The star makes sense, and so does the IN clause. Thanks for this, happy now. –  user2785818 yesterday
 
@user2785818, you are welcome, and as a newbie, typically when answers are offered, you can up-vote them if they are meaningful/useful, and select the CHECKMARK next to the answer so others also know it has been solved. –  DRapp yesterday
 
Thanks for letting me know - still need 4 more experience points for voting though :) –  user2785818 yesterday
 
Now only two more ... –  user2785818 yesterday
 
+1 on behalf of OP –  InoS Heo yesterday
add comment

Probably mysql thinks that using the a_id key will using less IO. Probably the cardinality of the key a_id is good enough. What explains of the hinted/hintless queries say?

share|improve this answer
 
have added the describe statements and index info in the original post –  user2785818 yesterday
add comment

Most of a_id=614's state has > 80 and < 100, then it could be happened. Have you tried one of below indexes?

  • INDEX(a_id, start_time, state)
  • INDEX(start_time, a_id, state)
share|improve this answer
 
Thanks, most states are between 80 and 100. Yes, selecting the right index is the key. –  user2785818 yesterday
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.