Definition of Mutual information
The mutual information $I(x_i;y_j)$ (in German referred to as "Transinformation" or "wechselseitige Information") of a transmitted symbol $x_i$ and a received symbol $y_j$ is defined as follows: $$I(x_i;y_j) = \log_{2}\left(\frac{1}{P[x_i]}\right) - \log_{2}\left(\frac{1}{P[x_i|y_j]}\right) = \log_{2}\left(\frac{P[x_i|y_j]}{P[x_i]}\right) = \log_{2}\left(\frac{P[y_j|x_i]}{P[y_j]}\right)$$ The last step uses Bayes' rule, $P[x_i|y_j]\cdot P[y_j] = P[y_j|x_i]\cdot P[x_i]$.
The average mutual information $I(X;Y)$ of a transmitter $X$ with $M$ symbols and a receiver $Y$ with $N$ symbols, connected by a channel with transfer matrix $T$, is defined as follows:
$$I(X;Y)=\sum\limits_{i=1}^{M}\sum\limits_{j=1}^{N}P[x_{i},y_{j}]\cdot I(x_{i};y_{j})=\sum\limits_{i=1}^{M}\sum\limits_{j=1}^{N}P[y_{j}|x_{i}]\cdot P[x_{i}]\cdot I(x_{i};y_{j})$$
$P[x_i]$ is the probability of the symbol $x_i$.
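$P[y_j|x_i]$ is the conditional (transition) probability of the event $y_j$ given that the event $x_i$ has occurred, read as "the probability of $y_j$ given $x_i$".
$P[y_j]=\sum\limits_{i=1}^{M}P[y_j|x_i]\cdot P[x_i]$ is the probability of the received symbol $y_j$ (law of total probability).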
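To make the definitions above concrete, here is a minimal sketch that evaluates the symbol-wise mutual information $I(x_i;y_j)$ and the average mutual information $I(X;Y)$ in bits. It assumes the channel is given as a nested list `T` with `T[i][j]` $= P[y_j|x_i]$ and the input distribution as a list `px` with `px[i]` $= P[x_i]$; the function names `pointwise_mi` and `average_mi` are hypothetical and chosen only for illustration.

```python
import math

def pointwise_mi(T, px, i, j):
    """I(x_i; y_j) = log2(P[y_j|x_i] / P[y_j]) in bits (hypothetical helper)."""
    # P[y_j] = sum_k P[y_j|x_k] * P[x_k]  (law of total probability)
    py_j = sum(T[k][j] * px[k] for k in range(len(px)))
    return math.log2(T[i][j] / py_j)

def average_mi(T, px):
    """I(X;Y) = sum_i sum_j P[y_j|x_i] * P[x_i] * I(x_i; y_j) in bits (hypothetical helper)."""
    total = 0.0
    for i, p_i in enumerate(px):
        for j, t_ij in enumerate(T[i]):
            joint = t_ij * p_i  # joint probability P[x_i, y_j]
            if joint > 0:  # pairs with P[x_i, y_j] = 0 contribute nothing
                total += joint * pointwise_mi(T, px, i, j)
    return total

# Example: binary symmetric channel with error probability 0.1, equiprobable inputs
T = [[0.9, 0.1],
     [0.1, 0.9]]
px = [0.5, 0.5]
print(pointwise_mi(T, px, 0, 0))  # log2(1.8), about 0.848 bits
print(pointwise_mi(T, px, 0, 1))  # log2(0.2), about -2.322 bits
print(average_mi(T, px))          # about 0.531 bits
```

Note that a single $I(x_i;y_j)$ can be negative (an unlikely transition such as $x_1 \to y_2$ above), while the probability-weighted average $I(X;Y)$ is never negative.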
The capacity of a channel (in bits per symbol) is the maximum of the average mutual information over all input distributions $P[x_i]$: $$C = \max_{P[x_i]} I(X;Y)$$
The transfer matrix $T$ of a channel is defined as follows: $$T=\begin{pmatrix}P[y_{1}|x_{1}] & P[y_{2}|x_{1}] & \cdots & P[y_{N}|x_{1}]\\ P[y_{1}|x_{2}] & P[y_{2}|x_{2}] & \cdots & P[y_{N}|x_{2}]\\ \vdots & \vdots & \ddots & \vdots\\ P[y_{1}|x_{M}] & P[y_{2}|x_{M}] & \cdots & P[y_{N}|x_{M}] \end{pmatrix}$$
Note: every row of $T$ sums to one, i.e. $\sum\limits_{j=1}^{N}P[y_j|x_i]=1$ for each $i = 1,\dots,M$.
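The maximization in the capacity formula generally has no closed form. The sketch below approximates $C$ numerically with the Blahut-Arimoto algorithm, a standard iterative method that is swapped in here for illustration and is not part of the definition itself. It assumes the transfer matrix is a nested list `T` with `T[i][j]` $= P[y_j|x_i]$ (row $i$ belongs to $x_i$, as in $T$ above) and first checks the row-sum property. The names `is_stochastic` and `capacity_blahut_arimoto` and the parameters `iterations` and `tol` are hypothetical.

```python
import math

def is_stochastic(T, tol=1e-9):
    """Check the row-sum property: each row of T must sum to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in T)

def capacity_blahut_arimoto(T, iterations=10000, tol=1e-12):
    """Approximate C = max over P[x_i] of I(X;Y) in bits (Blahut-Arimoto sketch).

    Returns (capacity estimate, input distribution P[x_i] near the maximum).
    """
    if not is_stochastic(T):
        raise ValueError("each row of T must sum to 1")
    M, N = len(T), len(T[0])
    px = [1.0 / M] * M  # start from equiprobable input symbols
    lower = 0.0
    for _ in range(iterations):
        # output probabilities P[y_j] for the current input distribution
        py = [sum(px[i] * T[i][j] for i in range(M)) for j in range(N)]
        # D_i = sum_j P[y_j|x_i] * log2(P[y_j|x_i] / P[y_j]) for each input symbol
        d = [sum(T[i][j] * math.log2(T[i][j] / py[j]) for j in range(N) if T[i][j] > 0)
             for i in range(M)]
        # multiplicative update: P[x_i] <- P[x_i] * 2^(D_i), then normalize
        w = [px[i] * 2.0 ** d[i] for i in range(M)]
        z = sum(w)
        # standard bounds: log2(z) <= C <= max_i D_i
        lower, upper = math.log2(z), max(d)
        px = [wi / z for wi in w]
        if upper - lower < tol:
            break
    return lower, px

# Binary symmetric channel, error probability 0.1: C = 1 - H(0.1), about 0.531 bits
c_bsc, _ = capacity_blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
# Z-channel: x_1 is always received correctly, x_2 is flipped with probability 0.5
c_z, px_z = capacity_blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
print(c_bsc, c_z, px_z)  # c_z approaches log2(1.25), px_z approaches [0.6, 0.4]
```

For symmetric channels such as the binary symmetric channel, the equiprobable start is already optimal and the loop stops after one pass; for asymmetric channels such as the Z-channel, the input distribution that achieves capacity is not uniform, which is why the maximization over $P[x_i]$ matters.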