Data Compression


Criteria

Survey Formats

Basics

Compression Methods

Shannon-Fano

Huffman

Lempel-Ziv (LZ)

LZ77

LZSS

LZ78

LZW

Arithmetic Coding

Run Length Encoding

Burrows-Wheeler (BWT)

Implementations

Data Formats


Glossary

Index


Download


www.BinaryEssence.com

Lempel-Ziv-77 (LZ77)


Developement:


Jacob Ziv and Abraham Lempel had introduced a simple and efficient compression method published in their article "A Universal Algorithm for Sequential Data Compression". This algorithm is referred to as LZ77 in honour to the authors and the publishing date 1977.


Fundamentals:


LZ77 is a dictionary based algorithm that addresses byte sequences from former contents instead of the original data. In general only one coding scheme exists, all data will be coded in the same form:


  • Address to already coded contents
  • Sequence length
  • First deviating symbol

If no identical byte sequence is available from former contents, the address 0, the sequence length 0 and the new symbol will be coded.


Example "abracadabra":


              Addr.  Length  deviating Symbol
 abracadabra    0      0      'a'
a bracadabra    0      0      'b'
ab racadabra    0      0      'r'
abr acadabra    3      1      'c'
abrac adabra    2      1      'd'	
abracad abra    7      4      ''

Because each byte sequence is extended by the first symbol deviating from the former contents, the set of already used symbols will continuously grow. No additional coding scheme is necessary. This allows an easy implementation with minimum requirements to the encoder and decoder.


Restrictions:


To keep runtime and buffering capacity in an acceptable range, the addressing must be limited to a certain maximum. Contents exceeding this range will not be regarded for coding and will not be covered by the size of the addressing pointer.


Compression Efficiency:


The achievable compression rate is only depending on repeating sequences. Other types of redundancy like an unequal probability distribution of the set of symbols cannot be reduced. For that reason the compression of a pure LZ77 implementation is relatively low.


A significant better compression rate can be obtained by combining LZ77 with an additional entropy coding algorithm. An example would be Huffman or Shannon-Fano coding. The wide-spread Deflate compression method (e.g. for GZIP or ZIP) uses Huffman codes for instance.


 <   ^   > 

Deflate [Deflate]

Deflate64™ [Deflate64™]

Huffman Coding [Huffman Coding]

GZIP [GZIP]

ZIP [ZIP]

Lempel-Ziv Coding (LZ) Lempel-Ziv Coding (LZ) Lempel-Ziv-Storer-Szymanski (LZSS)