DivANS: Modular compression algorithm

For this experiment, to see how well DivANS performs on uncompressed data. We sampled 130,000 random chunks being uploaded to Dropbox. We measured the sizes and compression (encode) and decompression (decode) rates of DivANS, and several alternative algorithms (zlib, Brotli, Zstd, 7zip, bz2). For DivANS and Brotli, we compared 2 different compression levels (q9 and q11). For zlib, 7zip, bz2 and Zstd, we used maximum settings. All tests were performed on 2.6 GHz Intel Xeon E5 2650v2 servers in the Dropbox datacenters and no data chunks from the test were persisted. Since Dropbox uses Lepton to compress image files and some files are already compressed by users, we only benchmark files where zlib is able to obtain at least 1% savings on that files. One third of the sampled files, by space, were benchmarked.

Measuring the Dropbox dataset, DivANS q11 saves 2.5% over Brotli q11 and 12% over zlib.

At these settings, DivANS and Brotli have similar encoding speeds. However, DivANS decode times are still five times slower than Brotli. This is because Brotli decompresses bytes using just a pair of table lookups and some bookkeeping, whereas DivANS has to maintain probability models for each nibble decoded and uses several vector instructions to maintain the CDFs and the ANS state. We expect that a few straightforward optimizations to improve the bookkeeping will bring the speed up a bit, to more than 200Mbps.