Introduction

Practical transmission channels are subject to noise. Therefore, they cannot transmit an arbitrary high amount of information reliable. As most applications however require error-free transmissions, *channel codes* must be used to enable error-detection and -correction in such noisy scenarios. The basic idea of a channel code is to add *redundancy* to the information that is transmitted. In other words, instead of transmitting $K$ message symbols in $K$ channel symbols, we spread the information over $N$ channel symbols, with $N>K$. If the symbols are *bits*, we call such a code a *binary code*. $N$ is called the *block length* of the code, and $K$ the *message length*. The *code rate* of the code is given by
$$
R_c = \frac{K}{N}.
$$

**Example ($N=6, K=3$)**

**Minimum Distance and Notation**

**Nonlinear Codes**

A binary code is defined by a one-to-one (i.e. invertible) *encoding rule* between $K$-bit message vectors $\mathbf{u}$ to $N$-bit codewords $\mathbf{c}$. The *codebook* or *codeword table (CWT)* is given by listing all $2^K$ possible codewords. We call a code *linear*, if any linear combination of codewords (under modulo-2 addition, XOR) is again a valid codeword (i.e. found in the codebook). For linear codes, the encoding rule is given for each codeword bit $c_i$ as a XOR-combination of several message bits $u_i$.

Let the encoding rule be $\mathbf{c}=(u_1, u_2, u_3, u_1 \oplus u_2, u_1 \oplus u_3, u_2 \oplus u_3$). Thus the code rate is $R_c = \frac{3}{6} = \frac{1}{2}$.
By listing all $2^3=8$ codewords, we have the codeword table
\begin{align*}
CWT_{6,3}=
\left(\begin{array}{ccc|ccc}
0&0&0&0&0&0\\
0&0&1&0&1&1\\
0&1&0&1&0&1\\
0&1&1&1&1&0\\
1&0&0&1&1&0\\
1&0&1&1&0&1\\
1&1&0&0&1&1\\
1&1&1&0&0&0
\end{array}\right).
\end{align*}
Furthermore, linear codes can be represented by a $K \times N$ *generator matrix* $\mathbf{G}$ with the encoding rule as the vector-matrix multiplication $\mathbf{c}=\mathbf{u}\cdot\mathbf{G}$ (modulo 2). We find the rows of the generator matrix as the codewords corresponding to unit vector messages, i.e. rows 2, 3 and 5 in the example:
$$
\mathbf{G}_{6,3}=
\left[\begin{array}{cccccc}
1&0&0&1&1&0\\
0&1&0&1&0&1\\
0&0&1&0&1&1\\
\end{array}\right]
$$
You can verify that $\mathbf{G}$ indeed generates all codewords as linear combinations of its rows.

An important property of an error correcting code is its *minimum distance* $d_\mathrm{min}$. It is the minimum number of bits one must flip to get from one codeword to another. The higher the minimum distance, the more errors the code can detect and correct (under optimal decoding).
In the following, we specify codes in $(N,K)$ or $(N,K,d_\mathrm{min})$ notation.

Linearity is a very useful property to encode, decode and to analyze error-correcting codes. However, by restricting that a set of vectors must form a linear subspace, we may not be able to achieve the greatest minumum distance. A prominent example is the (extended) NordstrÃ¶m-Robinson (NR) code, that is a (16,8,6) nonlinear binary code. The best possible linear code with the same length and dimension has, however, only a minimum distance of $d_\mathrm{min}=5$. Therefore, we expect the NR code to perform slightly better under optimal decoding than the (16,8,5) code. In practice however, the cost of giving up linearity is typically much higher than using a longer, linear code and thus, nonlinear codes are currently restricted to niche applications.