Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

My query is this:

UPDATE `phrases`
SET `phrases`.`count`=(SELECT COUNT(*) FROM `strings` WHERE `string` LIKE CONCAT('%', `phrases`.`phrase`, '%'))

My tables look like this:

CREATE TABLE `phrases` (
    `hash` varchar(32) NOT NULL,
    `count` int DEFAULT 0,
    `phrase` text NOT NULL,
     PRIMARY KEY (`hash`),
     KEY(`count`)
)

And

CREATE TABLE `strings` (
    `string` text NOT NULL,
)

phrases has 18,000 rows. strings has 1500 rows.

share|improve this question
2  
So the query works as is? Then this should be on codereview.SE. – Kevin Dec 20 '11 at 21:58
It might be more efficient to have a separate table where you would store the counts per phrase, and then only update this table once a new string is added. Since the number of strings is low in comparison to the phrases, I figure this wont happen that often. So you would not perform the whole count again, just add 1 if the new string matches that phrase. – saratis Dec 20 '11 at 22:09

migrated from stackoverflow.com Dec 22 '11 at 5:18

2 Answers

up vote 4 down vote accepted

Since you're using a LIKE with wildcards, you're going to do a table-scan against both tables, running a total of 18000*1500 = 27000000 substring comparisons.

To optimize this, you need to use some fulltext index technology. I suggest Sphinx Search or Apache Solr. If you do this, you don't need to keep a count of how many matches there are, because the search index makes it a lot less expensive to get a count on demand.

MySQL also implements a FULLTEXT index type, but it is only supported in the MyISAM storage engine in current versions (up to 5.5). I don't recommend using MyISAM for important data.

MySQL 5.6 is developing a fulltext index for InnoDB.

share|improve this answer
+1 because this answer just makes more sense than any other solution. – RolandoMySQLDBA Dec 20 '11 at 22:27

You should drop the index and collect the counts.

This will speed up the updating of the count column.

When done, put the index back.

ALTER TABLE phrase DROP INDEX `count`;
UPDATE phrase SET COUNT=0;
UPDATE phrases INNER JOIN string
ON ( LOCATE(strings.string,phrases.phrase) > 0 )
SET phrase.`count`=phrase.`count`+1;
ALTER TABLE phrase ADD INDEX `count` (`count`);

This INNER JOIN is nothing more than a Cartesian Product (pointed out by Bill Karwin's answer as 27,000,000 rows being examined in a temp table).

If the time to process is something can live with, all well and good.

If the time to process is disastrously slow, you must try Bill Karwin's answer.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.