Data Compression


Survey Formats


Compression Methods



Lempel-Ziv (LZ)





Arithmetic Coding

Run Length Encoding

Burrows-Wheeler (BWT)


Data Formats




Lempel-Ziv-78 (LZ78)

One year after publishing LZ77 Jacob Ziv and Abraham Lempel hat introduced another compression method ("Compression of Individual Sequences via Variable-Rate Coding"). Accordingly this procdure will be called LZ78.

Fundamental algorithm:

LZ78 is based on a dictionary that will be created dynamically at runtime. Both the encoding and the decoding process use the same rules to ensure that an identical dictionary is available. This dictionary contains any sequence already used to build the former contents. The compressed data have the general form:

  • Index addressing an entry of the dictionary
  • First deviating symbol

In contrast to LZ77 no combination of address and sequence length is used. Instead only the index to the dictionary is stored. The mechanism to add the first deviating symbol remains from LZ77.

Beispiel "abracadabra":

                    deviating     new entry
              index  symbol       dictionary
 abracadabra    0      'a'        1  "a"
a bracadabra    0      'b'        2  "b"
ab racadabra    0      'r'        3  "r"
abr acadabra    1      'c'        4  "ac"
abrac adabra    1      'd'        5  "ad"
abracad abra    1      'b'        6  "ab"
abracadab ra    3      'a'        7  "ra"

A LZ78 dictionary is slowly growing. For a relevant compression a larger amount of data must be processed. Additionally the compression is mainly depending on the size of the dictionary. But a larger dictionary requires higher efforts for addressing and administration both at runtime.

In practice the dictionary would be implemented as a tree to minimize the efforts for searching. Starting with the current symbol the algorithm evaluates for every succeeding symbol whether it is available in the tree. If a leaf node is found, the corresponding index will be written to the compressed data. The decoder could be realized with a simple table, because the decoder does not need the search function.

The size of the dictionary is growing during the coding process, so that the size for addressing the table would increase continuously. In parallel the requirements for storing and searching would be also enlarged permanently. A limitation of the dictionary and corresponding update mechanisms are required.

LZ78 is the base for other compression methods like the wide-spread LZW used e.g. for GIF graphics.

 <   ^   > 

Lempel-Ziv Coding (LZ) Lempel-Ziv-Storer-Szymanski (LZSS) Lempel-Ziv-Welch (LZW)