DivANS is an open source generic data compression algorithm implemented in native
safe/no-stdlib Rust and compiled to WASM for this page.
It uses SIMD
computations, threading and allows for a pluggable memory allocator.
algorithm favors compression
ratio over performance, operating at line speeds of 150 Mbit/s.
The name originates from "divided-ANS" since the intermediate
representation is divided from the ANS
the blog post and analysis here.
For this experiment, to see how well DivANS performs on uncompressed data.
We sampled 130,000 random chunks being uploaded to Dropbox.
We measured the sizes and compression (encode) and decompression (decode) rates of DivANS, and several alternative algorithms (zlib, Brotli, Zstd, 7zip, bz2).
For DivANS and Brotli, we compared 2 different compression levels (q9 and q11). For zlib, 7zip, bz2 and Zstd, we used maximum settings.
All tests were performed on 2.6 GHz Intel Xeon E5 2650v2 servers in the Dropbox datacenters and no data chunks from the test were persisted.
Since Dropbox uses Lepton to compress image files and some files are already compressed by users, we only benchmark files where zlib is able to obtain
at least 1% savings on that files. One third of the sampled files, by space, were benchmarked.
Measuring the Dropbox dataset, DivANS q11 saves 2.5% over Brotli q11 and 12% over zlib.
At these settings, DivANS and Brotli have similar encoding speeds. However, DivANS decode times are still five times slower than Brotli. This is because Brotli decompresses bytes using just a pair of table lookups and some bookkeeping, whereas DivANS has to maintain probability models for each nibble decoded and uses several vector instructions to maintain the CDFs and the ANS state. We expect that a few straightforward optimizations to improve the bookkeeping will bring the speed up a bit, to more than 200Mbps.