Data Compression


Criteria

Survey Formats

Basics

Compression Methods

Data Formats


Arithmetic Coding (AC)

Principle of the AC

General Algorithm

Encoding

Decoding

Calculation of Intervals

Division

Rounding Errors

Shifting of Intervals

AC versus Huffman

Data with high Redundancy

Adaptive AC

Implementations


Glossary

Index


Download


www.BinaryEssence.com

Rounding Errors at the Interval Dividing


In mathematics numbers have an unlimited range of values, so that the lower and upper endpoint of any interval will never become identical. In practice the accuracy of real computers is naturally limited, so that this case may happen.


Symbols that are represented by such a zero sized interval cannot be encoded, because a unique identification is not possible. The lower endpoint would also belong to the next sub-interval. To avoid this type of malfunction special mechanisms has to be implemented guaranteeing that the size of any interval is greater than 0 in any case.


The determination becomes critically, if symbols rarely appear in large amounts of data. One single symbol in a file of 1 GByte for example results in a relative probability of
1/(1024*1024*1024) ~ 9.3132 E-10.
This value multiplied with the size of the sub-interval requires considerably more than 10 significant decimal numbers which exceeds the accuracy of common floating-point units.


On the other hand any provision to adjust the size of too small intervals affects the efficiency of the compression method, because it deviates from the real probability distribution.


 <   ^   > 

Division of the Intervals Division of the Intervals Shifting of the Intervals