I have a MyISAM table with billions of rows that is still causing me problems because the indexes, even after I shortened them as much as possible, do not fit in my 64GB of RAM.
I am unable to index the table and this is causing serious problems.
I thought that partitioning might be the answer to the problem, but I'm not sure because I SELECT
on all kinds of different column combinations.
The table structure is:
CREATE TABLE research_storage1b (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`word1` mediumint(8) unsigned NOT NULL,
`word2` mediumint(8) unsigned NOT NULL,
`origyear` smallint(5) unsigned NOT NULL,
`cat` tinyint(3) unsigned NOT NULL,
`pibn` int(10) unsigned NOT NULL,
`page` smallint(5) unsigned NOT NULL,
`pos` smallint(5) unsigned NOT NULL,
`num` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `pibnpage` (`pibn`,`page`),
KEY `word21pibn` (`word2`,`word1`,`pibn`),
KEY `word12num` (`word1`,`word2`,`num`),
KEY `year1` (`origyear`,`word1`),
KEY `catyear` (`cat`,`origyear`),
KEY `pibn` (`pibn`,`word1`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii COLLATE=ascii_bin
DATA DIRECTORY='/storage/researchdb'
INDEX DIRECTORY='/storage/researchdb';
The indexes are representative of the different SELECT
statements, i.e. sometimes I'm retrieving pibn
based on the word1
, and sometimes I'm retrieving word1
based on pibn
.
If I had to choose something to split the table on, it would probably be word1
because this is most searched for. But there are some queries that will be selecting word1
, or counting word1
, in which case would I be correct in assuming the partitioning would drastically slow those queries down? There are around 500,000 unique values for word1
.
My main questions are:
- Should I use partitioning (will it solve my indexing issue)?
- Should I partition on
word1
? - How many partitions should I have? (I'm thinking maybe 100?)
- How do I modify the
CREATE TABLE
to have 100 (or however many) partitions split onword1
(or another column)? - Since
word1
is not a primary key... do I have to change the primary key to an index, or something like that? (I didn't fully understand this part of the partitioning docs/tutorials.)
Please help if you can, thank you.