
Rounding Errors at the Interval Dividing
In mathematics numbers have an unlimited range of values, so that the lower and upper endpoint of any interval will never become identical. In practice the accuracy of real computers is naturally limited, so that this case may happen.
Symbols that are represented by such a zero sized interval cannot be encoded, because a unique identification is not possible. The lower endpoint would also belong to the next sub-interval. To avoid this type of malfunction special mechanisms has to be implemented guaranteeing that the size of any interval is greater than 0 in any case.
The determination becomes critically, if symbols rarely appear in large amounts of data. One single symbol in a file of 1 GByte for example results in a relative probability of
1/(1024*1024*1024) ~ 9.3132 E-10.
This value multiplied with the size of the sub-interval requires considerably more than 10 significant decimal numbers which exceeds the accuracy of common floating-point units.
On the other hand any provision to adjust the size of too small intervals affects the efficiency of the compression method, because it deviates from the real probability distribution.
< ^ >
|