Definition of Mutual Information
Mutual information $I(x_i;y_j)$ (in German referred to as "Transinformation" or "wechselseitige Information") of a transmitted symbol $x_i$ and a received symbol $y_j$ is defined as follows:
$$
I(x_i;y_j) = \log_{2}\left(\frac{1}{P[x_i]}\right) - \log_{2}\left(\frac{1}{P[x_i|y_j]}\right) = \log_{2}\left(\frac{P[x_i|y_j]}{P[x_i]}\right) = \log_{2}\left(\frac{P[y_j|x_i]}{P[y_j]}\right)
$$
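As a numerical illustration, here is a minimal Python sketch that evaluates $I(x_i;y_j)$ for a binary symmetric channel; the crossover probability $\varepsilon = 0.1$, the uniform input distribution, and all names in the code are assumptions for this example, not part of the definition above.

```python
import math

# Assumed example: binary symmetric channel (BSC) with crossover
# probability eps = 0.1 and uniform input P[x_1] = P[x_2] = 0.5.
eps = 0.1
p_x = [0.5, 0.5]                       # P[x_i]
T = [[1 - eps, eps],                   # T[i][j] = P[y_j | x_i]
     [eps, 1 - eps]]

# P[y_j] = sum_i P[x_i] * P[y_j | x_i]
p_y = [sum(p_x[i] * T[i][j] for i in range(2)) for j in range(2)]

# I(x_i; y_j) = log2( P[y_j | x_i] / P[y_j] )
def pointwise_mi(i, j):
    return math.log2(T[i][j] / p_y[j])

print(pointwise_mi(0, 0))   # ~ +0.848 bit: receiving y_1 supports x_1
print(pointwise_mi(0, 1))   # ~ -2.32 bit: receiving y_2 is evidence against x_1
```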
Average mutual information $I(X;Y)$ of a transmitter $X$ and a receiver $Y$ over a channel with a transfer matrix $T$ is defined as follows:
$$
I(X;Y)=\sum\limits _{i=1}^{M}\sum\limits _{j=1}^{N}P[x_{i},y_{j}]\cdot I(x_{i};y_{j})=\sum\limits _{i=1}^{M}\sum\limits _{j=1}^{N}P[y_{j}|x_{i}]\cdot P[x_{i}]\cdot I(x_{i};y_{j})
$$
$P[x_i]$ is the probability of the symbol $x_i$.
$P[y_j|x_i]$ is the conditional (transition) probability of the event $y_j$ given that the event $x_i$ has occurred; this is read as "the probability of $y_j$ given $x_i$".
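The double sum above maps directly onto two nested loops. The following Python sketch computes $I(X;Y)$ from an input distribution and a transfer matrix given as plain lists; the function name average_mutual_information and the BSC test values are illustrative assumptions, not from the source.

```python
import math

def average_mutual_information(p_x, T):
    # I(X;Y) = sum_i sum_j P[x_i, y_j] * I(x_i; y_j),
    # with P[x_i, y_j] = P[y_j | x_i] * P[x_i].
    M, N = len(p_x), len(T[0])
    p_y = [sum(p_x[i] * T[i][j] for i in range(M)) for j in range(N)]
    I = 0.0
    for i in range(M):
        for j in range(N):
            p_xy = p_x[i] * T[i][j]
            if p_xy > 0:                 # skip zero-probability terms
                I += p_xy * math.log2(T[i][j] / p_y[j])
    return I

# BSC with eps = 0.1 and uniform input: I(X;Y) = 1 - H(0.1) ≈ 0.531 bit
print(average_mutual_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))
```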
The capacity of a channel is defined as the maximum of $I(X;Y)$ over all input distributions $P[x_i]$:
$$
C = \max_{P[x_i]}(I(X;Y))
$$
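For a binary-input channel, a simple way to approximate $C$ is a brute-force grid search over $p = P[x_2]$, as in the sketch below. This is only an illustration (the standard exact method is the iterative Blahut-Arimoto algorithm); the helper mirrors the average mutual information sketch above, and all names and numbers are assumptions.

```python
import math

def mutual_information(p_x, T):
    # same double sum as above: I(X;Y) in bits
    N = len(T[0])
    p_y = [sum(px * row[j] for px, row in zip(p_x, T)) for j in range(N)]
    return sum(px * row[j] * math.log2(row[j] / p_y[j])
               for px, row in zip(p_x, T) for j in range(N) if px * row[j] > 0)

def capacity_binary_input(T, steps=1000):
    # evaluate I(X;Y) on a grid of input distributions and take the maximum
    return max(mutual_information([1 - k / steps, k / steps], T)
               for k in range(1, steps))

# BSC with eps = 0.1: the maximum sits at p = 0.5,
# so C = 1 - H(0.1) ≈ 0.531 bit per channel use.
print(capacity_binary_input([[0.9, 0.1], [0.1, 0.9]]))
```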
The transfer matrix $T$ of a channel is defined as follows:
$$T=\begin{pmatrix}P[y_{1}|x_{1}] & P[y_{2}|x_{1}] & \cdots & P[y_{N}|x_{1}]\\
P[y_{1}|x_{2}] & P[y_{2}|x_{2}] & \cdots & P[y_{N}|x_{2}]\\
\vdots & \vdots & \ddots & \vdots\\
P[y_{1}|x_{M}] & P[y_{2}|x_{M}] & \cdots & P[y_{N}|x_{M}]
\end{pmatrix}$$
Note: each row of $T$ sums to one, i.e. for every fixed $i$ it holds that $\sum\limits_{j=1}^{N}P[y_j|x_i]=1$.
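This row-sum property is easy to verify programmatically; a minimal sketch, assuming the same BSC transfer matrix as in the examples above:

```python
# Check that every row i of T satisfies sum_j P[y_j | x_i] = 1.
T = [[0.9, 0.1],
     [0.1, 0.9]]
for i, row in enumerate(T):
    assert abs(sum(row) - 1.0) < 1e-12, f"row {i} does not sum to 1"
print("all rows sum to 1")
```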