Definition of entropy
The entropy of a source $X$ is defined as follows:
$$ H(X) = \sum\limits_{i=1}^{N} P_{i}\cdot\log_{2}\left(\frac{1}{P_{i}}\right) $$
where $P_i$ is the probability of the symbol $x_i$.
Files on a PC are organized in bytes. One byte consists of 8 bits, so there are $N = 2^8 = 256$ possible byte values. The source therefore has an alphabet of 256 different symbols, which can be described as integers in the range from 0 to 255. The following slide shows the distributions of the symbols for different files.
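As a minimal sketch of this idea in Python: the entropy of a file can be estimated from the relative frequencies of its 256 possible byte values. The file name `example.bin` is a placeholder, not from the slides.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """H(X) in bits per symbol, following H(X) = sum_i P_i * log2(1 / P_i)."""
    total = len(data)
    if total == 0:
        return 0.0
    h = 0.0
    for count in Counter(data).values():
        p = count / total             # P_i: relative frequency of byte value i
        h += p * math.log2(1.0 / p)   # byte values that never occur add nothing
    return h

# Example: entropy of a file's byte distribution ("example.bin" is a placeholder)
with open("example.bin", "rb") as f:
    data = f.read()
print(f"H(X) = {entropy_bits_per_byte(data):.3f} bits/byte")
```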
The maximum entropy of a source is:
$$ H_{max}(X)=H_0=\log_2(N) $$
$N$ is the number of different symbols of the source. The maximum is reached when all symbols are equally probable, i.e. $P_i = 1/N$ for all $i$.
The redundancy is defined as follows:
$$ R(X) = H_{0} - H(X) = \log_2(N) - H(X) $$
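Continuing the sketch above, the redundancy follows directly from these two quantities; for byte-organized files $N = 256$, so $H_0 = 8$ bits per byte:

```python
import math

# Redundancy R(X) = H0 - H(X) for a byte-organized file (N = 256),
# reusing entropy_bits_per_byte and data from the sketch above.
N = 256
H0 = math.log2(N)                    # H0 = log2(256) = 8 bits per byte
H = entropy_bits_per_byte(data)      # H(X) of the file's byte distribution
R = H0 - H                           # redundancy in bits per byte
print(f"H0 = {H0:.0f} bits/byte, H(X) = {H:.3f}, R(X) = {R:.3f}")
```

A highly compressed or encrypted file yields $H(X)$ close to 8 bits per byte and thus a redundancy near zero, whereas plain text leaves considerable redundancy that compression can exploit.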