
BWT: Characteristics of Transformed Data
Data converted by BWT consist of identical symbols, but their order is determined by the characteristics of the original data. The following examples shall demonstrate this.
The transformed symbols represent the predecessors of the data in the first column because of the rotation of the original data.
a b r a c a d a b r a
a a b r a c a d a b r ra
a b r a a b r a c a d da
a b r a c a d a b r a aa
a d a b r a a b r a c ca
a c a d a b r a a b r ra
b r a a b r a c a d a ab
b r a c a d a b r a a ab
d a b r a a b r a c a ad
c a d a b r a a b r a ac
r a a b r a c a d a b br
r a c a d a b r a a b br
Thus the context is represented, in which certain symbols appear. The exemplary data set starts with a collection of all symbols {'r', 'd', 'a', 'k', 'r'} preceding the symbol 'a'. This characteristic is emphasized particularly in the next example.
On the left side the original data consist of three identical, consecutive sequences ("abcd"). Therefore always a 'd' precedes an 'a', an 'a' always a 'b', etc. Unlike this the original data for the right example does not provide redundant structures.
a b c d a b c d a b c d a b c d a a c c b d d b
a b c d a b c d a b c d a a c c b d d b a b c d
a b c d a b c d a b c d a b c d a a c c b d d b
a b c d a b c d a b c d a c c b d d b a b c d a
b c d a b c d a b c d a b a b c d a a c c b d d
b c d a b c d a b c d a b c d a a c c b d d b a
b c d a b c d a b c d a b d d b a b c d a a c c
c d a b c d a b c d a b c b d d b a b c d a a c
c d a b c d a b c d a b c c b d d b a b c d a a
c d a b c d a b c d a b c d a a c c b d d b a b
d a b c d a b c d a b c d a a c c b d d b a b c
d a b c d a b c d a b c d b a b c d a a c c b d
d a b c d a b c d a b c d d b a b c d a a c c b
After the Burrows-Wheeler-Transformation uniform symbol combinations lead to repetitions within transformed data. The arising redundancy may be exploited by simple statistic codes, which react in high measures to such changes in symbol distribution, e.g. the adaptive Huffman coding.
|
< ^ >
|
Adaptive Huffman Code [ ]
|
|