Introduction to variable-length codes

To communicate a message through a channel or to store it on a digital medium, the symbols of the discrete alphabet (the source) used to compose the message first have to be encoded into binary words. This is the subject of source coding, a field of information theory.

A bit is a binary digit, taking the value "0" or "1" in a codeword, and it can carry only a limited amount of information. The bit is therefore also used as the unit of information content.

The information content of a symbol taken from a discrete source, on the other hand, depends on its probability of occurrence: the lower the probability, the higher the information content.

If a symbol $x_k$, one of $N$ symbols, is randomly drawn from a discrete source $X$ with probability $P[x_k]=P_k$, its information content is given as
$$
I(x_k) = \text{log}_2\left(\frac{1}{P_k}\right).
$$
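As a quick numerical illustration (a minimal Python sketch; the example probabilities are chosen arbitrarily), the formula confirms that less probable symbols carry more information:

```python
import math

def information_content(p):
    """Information content in bit of a symbol with probability p."""
    return math.log2(1 / p)

# The lower the probability, the higher the information content:
for p in (0.5, 0.25, 0.125):
    print(f"P = {p}: I = {information_content(p)} bit")
# P = 0.5: I = 1.0 bit
# P = 0.25: I = 2.0 bit
# P = 0.125: I = 3.0 bit
```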
The average information content of all symbols from a given source $X$ is called the entropy:
$$
H(X)= \sum_{k=1}^N{P_k \cdot \text{log}_2\left( \frac{1}{P_k}\right)}.
$$
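The entropy formula can be sketched directly in Python (the four-symbol source below is an arbitrary example):

```python
import math

def entropy(probs):
    """Entropy H(X) in bit/symbol of a discrete source with symbol probabilities probs."""
    return sum(p * math.log2(1 / p) for p in probs)

# A biased four-symbol source (example probabilities):
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bit/symbol
```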

It has the property of being less than or equal to the entropy of a source of the same size with a uniform symbol distribution:
$$
P_k=\frac{1}{N}\quad \forall\quad k=1 \ldots N \\
\Rightarrow H(X) = H_0(X) = \text{log}_2\left(N\right).
$$
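This bound can be checked numerically (a small sketch; the random distributions serve only as examples): the uniform distribution attains $\text{log}_2(N)$, while non-uniform distributions stay below it.

```python
import math
import random

def entropy(probs):
    """Entropy in bit/symbol; zero-probability symbols contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

N = 4
uniform = [1 / N] * N
print(entropy(uniform), math.log2(N))  # both 2.0

# Randomly drawn non-uniform distributions never exceed log2(N):
random.seed(0)
for _ in range(3):
    raw = [random.random() for _ in range(N)]
    probs = [r / sum(raw) for r in raw]
    assert entropy(probs) <= math.log2(N)
```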
The webdemo "Entropy of a discrete source" gives an insight into the relation between the probability distribution, the information content, and the entropy.

When assigning codewords of equal length to the symbols of the source, each codeword has at least the length $ m_k = \left\lceil \text{log}_2\left(N\right) \right\rceil \frac{\text{bit}}{\text{symbol}} $. Since this codeword length is always an integer, and since the entropy is generally less than $\text{log}_2\left(N\right)\frac{\text{bit}}{\text{symbol}}$, on average a codeword uses more binary digits than the information content it carries. To overcome this and push the average codeword length down toward the entropy of the source, variable-length codes can be used, which achieve non-integer average codeword lengths. Constructing such codes is the main task of source coding theory.
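To illustrate the gap between fixed-length and variable-length coding, the sketch below uses the well-known Huffman construction (a standard variable-length code, not introduced in this text) on an example four-symbol source. Only the codeword lengths are computed, which suffice to compare the average codeword length against the entropy and the fixed-length bound:

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    # Each heap entry: (probability, tie-breaker, list of symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # merging two subtrees adds one bit to every member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]            # example source, N = 4
lengths = huffman_lengths(probs)             # [1, 2, 3, 3]
avg = sum(p * l for p, l in zip(probs, lengths))
H = sum(p * math.log2(1 / p) for p in probs)
fixed = math.ceil(math.log2(len(probs)))
print(avg, H, fixed)                         # 1.75 1.75 2
```

For this particular source the Huffman code reaches the entropy exactly (1.75 bit/symbol), beating the 2 bit/symbol of a fixed-length code; for general probabilities the average length lies between the entropy and the entropy plus one bit.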