The algorithm processes discretized and amplitude limited versions of $X$ and $Y$. We assume that $X$ and $Y$ are discretized in $M$ and $N$ elements $x_\mu$ and $y_\nu$, respectively. For each element $x_\mu$, the conditional channel transition probability $P\left[\left.y_\nu \right| x_\mu\right]$ is computed which gives the probability that $y_\nu$ is received given that $x_\mu$ was sent. In particular, $P\left[\left.y_\nu \right| x_\mu\right]$ is given by $$P\left[\left.y_\nu \right| x_\mu \right] = Q\left(\frac{\frac{y_\nu + y_{\nu+1}}{2}-x_\mu}{\sigma_n}\right) - Q\left(\frac{\frac{y_{\nu-1} + y_\nu}{2}-x_\mu}{\sigma_n}\right)$$ on an AWGN channel with zero mean and noise variance $\sigma_n^2$. Thereby, $Q(\cdot)$ denotes the Gaussian Q-function defined as $$Q(z)=\frac{1}{\sqrt{2\pi}}\int_{z}^{\infty}\exp\left(-\frac{1}{2}u^2\right)\text{d}u \text{ .}$$
The following derivations are derived in [1,2,3]:
The channel transition matrix $\mathbf{P}$ of size NxM holds the elements $P\left[\left.y_\nu \right| x_\mu\right]$ in the $\mu$-th column and $\nu$-th row. Thus, the mutual information $I(X;Y)$ is given by $$I(X;Y) = \mathbf{1}_{1\times N}\cdot\left(\tilde{\mathbf{P}}\mathbf{p}_x\right) - \left(\mathbf{P}\mathbf{p}_x\right)^T\log_2\left(\mathbf{Pp}_x\right)$$ with vector $\mathbf{p}_x$ holding the a priori probability of the elements $x_\mu$, i.e. $P\left[x_\mu\right]$. Moreover, short-hand notation $\tilde{\mathbf{P}}$ is defined as $\tilde{\mathbf{P}} = \mathbf{P} \odot \log_2\mathbf{P}$, where $\odot$ performs a elementwise multiplication.
As $I(X;Y)$ is a concave function in $\mathbf{p}_x$, a unique maximum exists which can be retrieved by convex programming with constraints on $x$ and $\mathbf{p}_x$. For example \begin{align*} \textbf{ maximize} \quad &\mathbf{1}_{1\times N}\cdot\left(\tilde{\mathbf{P}}\mathbf{p}_x\right) - \left(\mathbf{P}\mathbf{p}_x\right)^T\log_2\left(\mathbf{Pp}_x\right)\\ \textbf{subjected to} \quad &P\left[x_\mu\right]\geq 0 \ \textbf{ and }\sum_{\mu=1}^M P\left[x_\mu\right] = 1 \ \textbf{ and }\sum_{\mu=1}^M P\left[x_\mu\right]x_\mu^2 = 1 \end{align*} yields the capacity as well as the capacity achieving input distribution $\mathbf{p}_x$ for a power-limited channel.
The algorithm implementation is straight forward with MATLAB enhanced by the CVX modeling system. The files can be downloaded here.
For referencing this webdemo, use:
T. Handte, "AWGN Channel Capacity with various constraints," webdemo, Institute of
Telecommunications, University of Stuttgart, Germany, Jan. 2015. [Online] Available: http://webdemo.inue.uni-stuttgart.de