Encoding the indices & performance evaluation

The ASCII-characters in the original data are usually encoded in the form of 8-bits representation. The number of bit needed for representing the source data could be calculated by

$$\#bits=8*\#of\:Characters\;in\;the\;input\;data$$

In our previous example this would be

$$18\;chars*8\;bits=144\;bits$$

In the encoded stream $(0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)$ we need to encode the indices as well as the characters. The characters are encoded with a simple 8-bit binary representation. The indices would be encoded in binary representation also, however, the number of bit required to represent a single index would depend on the position of the codeword. For example, the codeword $(3,A)$ is the 4th codeword, thus we will represent the dictionary index with 2 bits, which mean that the dictionary index "3" would be binary represented as $10$.

The number of bits required to represent the dicitonary index of a codeword can be calculated from the following equation as shown in [2] $$\#\;of\;bits=\begin{cases} 1 & if\;n=1\\ \lceil log_{2}n\rceil & if\;n>1 \end{cases}$$ where n is the position of the codeword.

In our example the indices would be encoded as following: $$\underline{0}\boldsymbol{A}\underline{0}\boldsymbol{B}\underline{10}\boldsymbol{C}\underline{11}\boldsymbol{A}\underline{010}\boldsymbol{A}\underline{100}\boldsymbol{A}\underline{110}\boldsymbol{B}$$ and in binary would look like this: $$\underline{0}\boldsymbol{01000001}\underline{0}\boldsymbol{01000010}\underline{10}\boldsymbol{01000011}\underline{11}\boldsymbol{01000001}\underline{010}\boldsymbol{01000001}\underline{100}\boldsymbol{01000001}\underline{110}\boldsymbol{01000010}$$

Using this result, the number of bits of the compressed data can be evaluated as following: $$(1+8)+(1+8)+(2+8)+(2+8)+(3+8)+(3+8)+(3+8)=71\;bits$$

A measurement of how good the compression algorithm is can be defined as: $$CR=\frac{Total\;number\;of\;bits\;of\;\boldsymbol{uncompressed}\;data}{Total\;number\;of\;bits\;of\;\boldsymbol{compressed}\;dat}$$ This quantity is know as the Compression Ratio. In our example the compression ratio is $$144/71=2.0282$$