Lately, I've been reading up on the Common Language Infrastructure's data formats. One thing that came up is that some of the data is stored in a bit vector.
I hadn't ever used a bit vector before, so I did some reading. A bit vector is essentially an array of bits, which can greatly reduce the space needed to store something, especially when the individual elements require only a few bits each to represent.
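To make the idea concrete, here is a minimal sketch of the packing trick (this is not the posted `BitVector.cs`; the class and member names here are my own illustrative choices). Sixty-four boolean flags share one `ulong`, so an indexer just needs a word index and a bit index:

```csharp
using System;

class SimpleBitVector
{
    private readonly ulong[] words;

    public SimpleBitVector(int bitCount)
    {
        // Each 64-bit word holds 64 flags; round up to cover the last partial word.
        words = new ulong[(bitCount + 63) / 64];
    }

    public bool this[int index]
    {
        get => ((words[index / 64] >> (index % 64)) & 1UL) != 0;
        set
        {
            if (value)
                words[index / 64] |= 1UL << (index % 64);
            else
                words[index / 64] &= ~(1UL << (index % 64));
        }
    }
}
```

Storing 1,000 flags this way takes sixteen `ulong`s (128 bytes) instead of 1,000 `bool`s.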
After some banging on the keyboard (and reading), I had the following code:
BitVector.cs (As html)
Note to moderators: I had to post it to my own FTP account because it was too large for the submission; it exceeds the 30,000-character limit and is somewhere around 51,000 characters.
The question I have for you ladies and gents is: what kind of performance can I expect out of this code? I've previously written something similar for working with Unicode character sets, creating unions, intersections, exclusive disjunctions, and so on for non-deterministic and deterministic state machines (which use regex-like patterns that require large bit fields to cover the Unicode character set, et cetera). In that earlier work the focus was on the smallest possible footprint, which yielded an offset plus the smallest set of data possible.
The goal this time is to construct a bit field capable of reading existing (non-reduced) data, and of arbitrarily reading integers of varying sizes out of that data.
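The core of that operation can be sketched as follows. This is not taken from the posted code; `ReadBits` is an illustrative name, and the little-endian bit order within each byte is an assumption on my part:

```csharp
using System;

static class BitReader
{
    // Read bitCount bits starting at bitOffset from raw byte data,
    // returning them as an unsigned integer.
    public static ulong ReadBits(byte[] data, int bitOffset, int bitCount)
    {
        if (bitCount < 1 || bitCount > 64)
            throw new ArgumentOutOfRangeException(nameof(bitCount));

        ulong result = 0;
        for (int i = 0; i < bitCount; i++)
        {
            int bit = bitOffset + i;
            // Assumes little-endian bit order: bit 0 of each byte is the
            // least significant. A real implementation would read whole
            // words and mask/shift instead of looping bit by bit.
            if (((data[bit / 8] >> (bit % 8)) & 1) != 0)
                result |= 1UL << i;
        }
        return result;
    }
}
```

A word-at-a-time version with masking would be considerably faster, but the per-bit loop makes the addressing scheme easy to verify.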
The `TrueCount()` method uses a lookup table with a single byte for each possible value of a `ushort` (16-bit integer). The table itself is stored in a small file (873 bytes' worth) within the Resources of the program. If anyone wants the file I can post it, but it's pretty simple to calculate.
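For anyone who'd rather compute the table than load it, here is a rough sketch of a `TrueCount`-style lookup. It builds the table in memory at startup rather than reading it from a resource file as the original does, and `PopCount16`/`BuildTable` are names I've made up for illustration:

```csharp
static class BitCounting
{
    // One byte per possible ushort value, holding that value's set-bit count.
    static readonly byte[] PopCount16 = BuildTable();

    static byte[] BuildTable()
    {
        var table = new byte[65536];
        // Classic recurrence: popcount(i) = popcount(i >> 1) + (i & 1).
        for (int i = 1; i < 65536; i++)
            table[i] = (byte)(table[i >> 1] + (i & 1));
        return table;
    }

    // Count the set bits across an array of 16-bit words with one
    // table lookup per word.
    public static int TrueCount(ushort[] words)
    {
        int total = 0;
        foreach (ushort w in words)
            total += PopCount16[w];
        return total;
    }
}
```

The trade-off is a 64 KB table in exchange for one array read per 16 bits counted, instead of looping over individual bits.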
`BitArray`, `BitVector32` or `BinaryReader`? – svick Mar 14 '12 at 12:06