Are there any well-known libraries in Java for sparse bit vectors?
(And are there guidelines for how sparse is useful to use them vs. java.util.BitSet?)
Are there any well-known libraries in Java for sparse bit vectors? (And are there guidelines for how sparse is useful to use them vs. java.util.BitSet?) |
|||||||||||||
|
The colt library has sparse matrices (1D, 2D and 3D). It also has an efficient BitVector, with 1 bit per value, rather than 8-bits as However, the sparse matrices do not support bits directly - only doubles and objects. You could wrap the 1D sparse double matrix by maping bit index to long indices EDIT: More info. The Colt vectors/matrices require no memory initially for storage, assuming all bits (longs) are initially 0. Setting a value to non-zero consumes memory. Setting the value back to 0 continues to consume memory, although memory for zero values is reclaimed periodically. If the bits are truly sparse, such that each backing long value only has one bit set, then the storage overhead will be very poor, requiring 64-bits per actual bit stored. But as you mention typical case is 20-40% sparse, then the overhead will be much lower, with possibly no wasted storage if bits are clustered in ranges, e.g. bits from 0-100, then 1000-1100, and 2000-2200 (values in hex.) Overall, only 1/16 of the region is assigned to bits, but the clustering means that the bits are stored with no wasted space. | |||||||||
|
If its really sparse (e.g., less than 1% loading), then using a hash table indexed by bit index is probably pretty good; mere presence or absence of the index in the table is all you need to know if the bit is one or zero respectively. If the density is upwards of a few percent, you can use a hash table indexed by bit index divided by 64, and store long words in the hash table containing actual bits. Bit N is set if the hash table contains value V for int(N/64) and (V>>(N mod 64))&1 is true. Both of these answers assume you want to optimize random access to bits. If you want to optimize sequential (or other access) to bits by index, then you may want a sparse matrix structure, using the same kind of low-level bit vector representation depending on expected density. See Sparse Matrices | |||
|
You could try FastUtil's AVL Tree Map. | |||||
|
TL;DR go here Efficient Sparse BitSet implementation in Java I know this is an "old" question, but having the same question I stumbled across this post. While the answers are good, I was ultimately not satisfied. After further digging, I think I've come across the "definitive" answer to the question of sparse BitSets in Java. UPDATE: the above links are dead, but the code and presentations have been preserved here: https://github.com/brettwooldridge/SparseBitSet OLD LINKS: I cannot recommend reading this presentation more highly: Overview: Is it Computer Science, Software Engineering, or Hacking? Slides: Is it Computer Science, Software Engineering, or Hacking? In this presentation the author, Dr. Bruce Haddon, discusses the efforts of his researchers to create a highly memory-efficient and high-performance replacement for the standard Java BitSet. | |||||||||
|
CERN COLT is widely used for vector and matrix computation, and has sparse matrices, but isn't specifically used for bit vectors. http://acs.lbl.gov/software/colt/api/cern/colt/matrix/impl/SparseObjectMatrix1D.html | |||
|
You could try the JavaEWAH library. https://code.google.com/p/javaewah/ Depending on your problem it may be a good fit. (It is used by Apache Hive and others.) | |||
|