
Formats for New Symbols
Immediately after the control character (NYA) follows the new symbol which is not encoded by the Huffman code. In principle a large variety of formats is available to encode the new symbol. The simplest example would be linear encoding of the entire range of values.
With regard to the compression, it is advisable to select a more efficient format adapting the code length to the number of remaining symbols. Through this the code length decreases continuously until the last two symbols can be encoded with a single bit.
For realization an ordered list must be defined by both the encoder and decoder. The size of this list will be reduced step-by-step during the coding process.
Subsequently a method which adjusts the required code length to the number of remained symbols is described. The range of values is exploited better in contrast to linear coding with constant code length. Two different lengths which are assigned according to the following scheme are used:
Z: number of remaining symbols
b: required code length [in bit]
r: "remainder", that cannot be
encoded with code length b
n: code number (1 - Z)
conditions: Z = 2b + r
and
0 <= r < 2b
code length range of values coding
b+1 bit 1 to 2r n-1
b bit 2r+1 to Z n-r-1
Example with 5 symbols:
Z = 5
b = 2 bit
r = 1
conditions:
Z = 2b + r 5 = 22 + 1
0 <= r < 2b 0 <= 1 < 4
code length range of values coding
3 bit 1 - 2 n-1
2 bit 3 - 5 n-1-1
symbol code word
1 000
2 001
3 01
4 10
5 11
Example with 66 symbols:
Z = 66
b = 6 bit
r = 2
conditions:
Z = 2b + r 66 = 26 + 2
0 <= r < 2b 0 <= 2 < 64
code length range of values coding
7 bit 1 - 4 n-1
6 bit 5 - 66 n-2-1
symbol code word
1 0d 0000 000b
2 1d 0000 001b
3 2d 0000 010b
4 3d 0000 011b
5 2d 0000 10b
6 3d 0000 11b
7 4d 0001 00b
.. ... ...
64 61d 1111 01b
65 62d 1111 10b
66 63d 1111 11b
< ^ >
|