I have a table containing servers logs, and another table containing rules to match certain and drop them.
The logs tables contains around 1 million rows, and there are about 20-30 rules.
The query runs very slowly, I wonder is there any way I can make it run faster. I tried adding indexes to logs.message, but it does not help, I also read that you cannot index a "LIKE" column.
I am a total newbie to database, so please forgive me if I am missing any important concepts. Thanks in advance.
CREATE TABLE `logs` (
`log_id` bigint(20) NOT NULL AUTO_INCREMENT,
`criticality` varchar(255) NOT NULL,
`hostname` varchar(255) NOT NULL,
`source` varchar(255) NOT NULL,
`message` varchar(4096) NOT NULL,
`record_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`record_by` varchar(255) NOT NULL,
PRIMARY KEY (`log_id`),
KEY `idx_message` (`message`(255))
) ENGINE=InnoDB AUTO_INCREMENT=233523 DEFAULT CHARSET=utf8$$
CREATE TABLE `rules` (
`rule_id` int(11) NOT NULL AUTO_INCREMENT,
`content` varchar(255) NOT NULL,
`type` enum('MATCH','DROP') NOT NULL DEFAULT 'DROP',
PRIMARY KEY (`rule_id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8$$
select
hostname, criticality, source, message, record_date
from
eventlog.logs l1
where
not exists( SELECT
l.message, r.rule_id
FROM
eventlog.logs l,
eventlog.rules r
where
l.message like r.content
and l.log_id = l1.log_id
and r.type = 'DROP')
and (criticality = 'High'
or criticality = 'Medium')
and record_date > sysdate() - Interval 2 Day
order by l1.message;
UPDATE 1 explain results, it took around 10 seconds to finish the query.
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
| 1 | PRIMARY | l1 | ALL | NULL | NULL | NULL | NULL | 2101642 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | r | ALL | NULL | NULL | NULL | NULL | 16 | Using where |
| 2 | DEPENDENT SUBQUERY | l | eq_ref | PRIMARY | PRIMARY | 8 | eventlog.l1.log_id | 1 | Using where |
+----+--------------------+-------+--------+---------------+---------+---------+--------------------+---------+-----------------------------+
eventlog.logs
in the subquery? You link the subquery using the primary key so even if you remove it, you'll have same results. – ypercube Jan 26 at 22:15