the borg files cache can be rather large because it keeps some information about every file that was processed recently.
lz4 is a very fast compression/decompression algorithm, so we could try using it to lower the in-memory footprint of the files cache entries.
before implementing this, we should check how big the savings typically are, to determine whether it is worthwhile.
the files cache dictionary maps H(fullpath) --> msgpack(fileinfo).
the msgpack format already lowers the storage requirements a bit, e.g. by encoding integers in only as many bytes as necessary.
it also serializes the python data structure into a byte string (not strictly needed for how we use the cache now, but a prerequisite for compressing it).
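a minimal sketch of the current scheme, assuming sha256 for H and an illustrative fileinfo tuple (the real files cache stores different fields):

```python
import hashlib
import os

import msgpack  # pip install msgpack

def make_entry(fullpath, st):
    # hypothetical fileinfo: (inode, size, mtime in ns, age counter)
    fileinfo = (st.st_ino, st.st_size, st.st_mtime_ns, 0)
    key = hashlib.sha256(fullpath.encode()).digest()  # H(fullpath)
    value = msgpack.packb(fileinfo)                   # msgpack(fileinfo)
    return key, value

key, value = make_entry("/etc/hosts", os.stat("/etc/hosts"))
```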
with compression, it could work like H(fullpath) --> compress(msgpack(fileinfo)).
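a sketch of that variant, using the lz4 PyPI package for illustration (borg ships its own lz4 bindings):

```python
import lz4.block  # pip install lz4

def compress_entry(value):
    # store compress(msgpack(fileinfo)) instead of the raw msgpack bytes
    return lz4.block.compress(value)

def decompress_entry(cvalue):
    # block mode prepends the uncompressed size as a 4-byte header by
    # default, so decompression needs no extra bookkeeping here
    return lz4.block.decompress(cvalue)
```

note that the 4-byte size header is per-entry overhead, which is not negligible for entries this small.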
but we first need some statistics about the overall size of the files cache entries with and without compression.
because msgpacking already removes some of the redundant information, it is unclear how much further compressing its output can reduce the size. also, we need to compress the cache entries individually, so the amount of data per compression call is quite small, which typically hurts the compression ratio.
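a rough sketch for gathering those statistics; iter_packed_entries is a hypothetical helper that would yield the raw msgpacked values from a real files cache:

```python
import lz4.block  # pip install lz4

def compression_stats(packed_entries):
    # total size of the raw msgpacked entries vs. the same entries
    # lz4-compressed one by one (as the cache would do)
    raw_total = compressed_total = 0
    for value in packed_entries:
        raw_total += len(value)
        compressed_total += len(lz4.block.compress(value))
    print(f"raw: {raw_total} B, lz4: {compressed_total} B, "
          f"ratio: {compressed_total / raw_total:.2f}")

# compression_stats(iter_packed_entries(files_cache))
```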
note: theoretically, we could also use other combinations of serialization and compression algorithms, if they give a better overall result (compressed size and decompression speed).