Google is working on a new open source compression algorithm
Share on Twitter.
Get the most reliable SMTP service for your business. You wished you got it sooner!
September 23, 2015
Earlier this morning, Google said it is working on a new open source compression algorithm dubbed the Brotli algo,
with the hope that it can eventually mean the end of the venerable Deflate algorithm.
However, before trying to understand the Deflate system, it's important to understand the
other two compression strategies that make it up-- Huffman coding and LZ77 compression.
Huffman coding is a form of prefix coding. You've almost certainly used a prefix code when
using the phone, for example.
Starting from the dial tone, you press a sequence of what may be five, seven, eight, eleven,
twelve, or some other number of keys, and each sequence of keys reaches another specific phone
Now suppose you're in an office setting with an internal switchboard, as many large companies do.
All other phones only require five numbers to dial, instead of seven -- that's because it's expected
that you'll be calling those numbers more often. You may still need to call other numbers, though
so all of those numbers have a `9' added to the front.
For its part, LZ77 compression works by finding sequences of data that are repeated. The term "sliding
window" is used. All it really means is that at any given point in the data, there is a record of what
characters went before.
A 32K sliding window means that the compressor (and decompressor) have a record of what the last
32768 (32 * 1024) characters were. When the next sequence of characters to be compressed is identical
to one that can be found within the sliding window, the sequence of characters is replaced by two
numbers-- a distance, representing how far back into the window the sequence starts, and a length,
representing the number of characters for which the sequence is identical.
In a whitepaper, Google collaborators Jyrki Alakuijala, Evgenii Kliuchnikov, and Lode Vandevenne
also explain that the Brotli algorithm adds a static dictionary to the compression algorithm.
“It contains 13,504 words or syllables of English, Spanish, Chinese, Hindi, Russian, and Arabic, as
size of the static dictionary is 122,784 bytes.
The static dictionary is extended by a mechanism of transforms that slightly change the words in
the dictionary. A total of 1,633,984 sequences, although not all of them unique, can be constructed
by using the 121 transforms.”
In tests under Linux 3.13.0 on an Intel Xeon Eg-1650 v2 server running at 3.5 GHz, the researchers
claimed Brotli running at a 3.381:1 compression ratio could compress at 98.3 MB/s and decompress at
The fastest Deflate could run at a low-quality setting of 2.913:1 compression was 93.5 MB/s compression
and 323 MB/s decompression.
Google notes that better compression would particularly benefit mobile users, who in addition to
faster page loads would get savings on their data charges and hopefully lower battery consumption.
We need to note that Deflate's endurance demonstrates just how effective the original work was in 1996
and still is today. It's described in RFC 1951.
Get the most dependable SMTP server for your company.
Share on Twitter.