Get the lowest-cost and the best server colocation service in the business. Learn more.
Information Technology News.

Google is working on a new open source compression algorithm

Share on Twitter.

Get the most reliable SMTP service for your business. You wished you got it sooner!

Click here to order the best deal on a HP enterprise dedicated server and at a great price.

September 23, 2015

Earlier this morning, Google said it is working on a new open source compression algorithm dubbed the Brotli algo, with the hope that it can eventually mean the end of the venerable Deflate algorithm.

However, before trying to understand the Deflate system, it's important to understand the other two compression strategies that make it up-- Huffman coding and LZ77 compression.

Huffman coding is a form of prefix coding. You've almost certainly used a prefix code when using the phone, for example.

Starting from the dial tone, you press a sequence of what may be five, seven, eight, eleven, twelve, or some other number of keys, and each sequence of keys reaches another specific phone line.

Now suppose you're in an office setting with an internal switchboard, as many large companies do. All other phones only require five numbers to dial, instead of seven -- that's because it's expected that you'll be calling those numbers more often. You may still need to call other numbers, though so all of those numbers have a `9' added to the front.

For its part, LZ77 compression works by finding sequences of data that are repeated. The term "sliding window" is used. All it really means is that at any given point in the data, there is a record of what characters went before.

A 32K sliding window means that the compressor (and decompressor) have a record of what the last 32768 (32 * 1024) characters were. When the next sequence of characters to be compressed is identical to one that can be found within the sliding window, the sequence of characters is replaced by two numbers-- a distance, representing how far back into the window the sequence starts, and a length, representing the number of characters for which the sequence is identical.

In a whitepaper, Google collaborators Jyrki Alakuijala, Evgenii Kliuchnikov, and Lode Vandevenne also explain that the Brotli algorithm adds a static dictionary to the compression algorithm.

“It contains 13,504 words or syllables of English, Spanish, Chinese, Hindi, Russian, and Arabic, as well as common phrases used in machine readable languages, particularly HTML and JavaScript. The total size of the static dictionary is 122,784 bytes.

The static dictionary is extended by a mechanism of transforms that slightly change the words in the dictionary. A total of 1,633,984 sequences, although not all of them unique, can be constructed by using the 121 transforms.”

In tests under Linux 3.13.0 on an Intel Xeon Eg-1650 v2 server running at 3.5 GHz, the researchers claimed Brotli running at a 3.381:1 compression ratio could compress at 98.3 MB/s and decompress at 334 MB/s.

The fastest Deflate could run at a low-quality setting of 2.913:1 compression was 93.5 MB/s compression and 323 MB/s decompression.

Google notes that better compression would particularly benefit mobile users, who in addition to faster page loads would get savings on their data charges and hopefully lower battery consumption.

We need to note that Deflate's endurance demonstrates just how effective the original work was in 1996 and still is today. It's described in RFC 1951.

Source: Google.

Get the most dependable SMTP server for your company.

Share on Twitter.

IT News Archives | Site Search | Advertise on IT Direction | Contact | Home

All logos, trade marks or service marks on this site are the property of their respective owners.

Sponsored by Sure Mail™, Avantex and
by Montreal Server Colocation.

       © IT Direction. All rights reserved.