Logo der Uni Stuttgart
LZ78 description: "The Algorithm"

The encoding algorithm is to be illustrated using an example:

• Suppose the string $\boldsymbol{ABBCBCABABCAABCAAB}$ is to be compressed using the LZ78.

Steps:

• Initialy an emtpy dictionary is to be constructed with two columns, one for the dictionary entry and a the other of the its index.

• The first input in our example is "A" which is to be encoded. The algorithm checks if "A" already exist in the dictionary, if not, as in our case, "A" is inserted into the dictionary and an encoded codeword (output) of the format $\left(i,S\right)$ is to represent "A" in which $S$ is the string to be inserted into the dictionary, and $i$ is the dictionary index of the prefix of the input to be encoded. In our case with input "A" there is not prefix to it, therefore $i$ would be "0" and the output codeword would be $(0,A)$.

• For the second input "B", the same would happen as in the first input "A" and an output codeword of $(0,B)$ would be added to the output stream.

• For the third input "B", the algorithm will check its existence in the dictionary to find it at index 2.

• The algorithm would keep looking at the next input until a string is found which is NOT in the dictionary.

• In our case here, the string "BC" is not in the dictionary and thus would be added to it, and the output would be $(2,C)$, where 2 reffers to the location of the string $B$ in the dictionary

• The algorithm continues the same as before until all the inputs are encoded and a stream is displayed representing the compression/encoding of the input data as shown below:

Italian Trulli

The encoded/compressed data:

$$\boldsymbol{(0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)}$$