As you may have observed, channel codes usually become better the longer they are. At the same time, ML decoding becomes more elaborate, and therefore it is computationally expensive to evaluate the actual ML performance of a long code. If, however, one knows the distance spectrum of a code (which is equivalent to the weight spectrum for linear codes), one can compute a so-called
Union Bound (UB) as an upper bound on the frame-error rate (FER, or block-error rate, BLER) of an ML decoder.
The main idea is to look at each possible error event independently, as expressed by the
pairwise error probability (PEP). This is the probability that the ML decoder favours a (specific) different mapped codeword $\mathbf{x}'$ over the actually transmitted sequence $\mathbf{x}$. In other words, the received vector $\mathbf{y}$ is closer to the codeword $\mathbf{x}'$ than to $\mathbf{x}$ in terms of the Euclidean distance, i.e.
$$
PEP(\mathbf{x},\mathbf{x}') = P[\hat{\mathbf{x}}= \mathbf{x}' | \mathbf{x} \text{ transmitted}] = P[d'_E < d_E]
$$
where $d'_E = \Vert \mathbf{y}-\mathbf{x}'\Vert$ and $d_E = \Vert \mathbf{y}-\mathbf{x}\Vert$ denote the respective Euclidean distances. Writing $\mathbf{n}=\mathbf{y}-\mathbf{x}$ for the Gaussian noise vector with variance $\sigma^2$ (per component), this condition is equivalent to looking at the projection of $\mathbf{n}$ onto the line connecting $\mathbf{x}$ and $\mathbf{x}'$. As the projection of the noise vector $\mathbf{n}$ is again a (scalar) Gaussian-distributed random variable $n_\mathrm{proj}$ with variance $\sigma^2$, the expression for the PEP simplifies to
$$
PEP(\mathbf{x},\mathbf{x}') = P\left[n_\mathrm{proj} > \frac{\Vert \mathbf{x}-\mathbf{x}' \Vert}{2} \right] = Q\left(\frac{\Vert \mathbf{x}-\mathbf{x}' \Vert}{2\sigma} \right) = Q\left(\frac{\sqrt{ d_H(\mathbf{x},\mathbf{x}')} }{\sigma} \right).
$$
In the last step we used the relationship between the Euclidean distance and the Hamming distance, $d_E(\mathbf{x},\mathbf{x}') = 2 \sqrt{ d_H(\mathbf{x},\mathbf{x}') }$, which holds for BPSK mapping onto $\pm 1$. Each PEP event can be thought of as dissecting $\mathbb{R}^N$ into two halfspaces (separated by a hyperplane halfway between $\mathbf{x}$ and $\mathbf{x}'$).
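As a minimal sketch (not part of the original text), the following snippet cross-checks the PEP formula against a Monte-Carlo estimate. It assumes BPSK mapping onto $\pm 1$, an AWGN channel with noise variance $\sigma^2$ per dimension, and example values for the code length and Hamming distance:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

sigma = 0.8   # noise standard deviation per dimension (assumed example value)
d_H = 3       # Hamming distance between the two codewords (assumed example value)
N = 7         # code length (assumed example value)

# two BPSK-mapped codewords differing in d_H positions
x = np.ones(N)
x_prime = np.ones(N)
x_prime[:d_H] = -1.0

# analytical PEP: Q(sqrt(d_H)/sigma), with Q(.) = norm.sf(.)
pep_analytical = norm.sf(np.sqrt(d_H) / sigma)

# Monte-Carlo estimate: transmit x and count how often y is closer to x_prime
num_trials = 200_000
y = x + sigma * rng.standard_normal((num_trials, N))
closer_to_x_prime = np.sum((y - x_prime) ** 2, axis=1) < np.sum((y - x) ** 2, axis=1)
pep_empirical = np.mean(closer_to_x_prime)

print(f"PEP analytical: {pep_analytical:.4f}, empirical: {pep_empirical:.4f}")
```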
Considering now all codewords of the code other than the transmitted one, we obtain the frame-error probability and an upper bound on it (by adding up all PEPs)
$$
P[\hat{\mathbf{x}} \ne \mathbf{x}] = P\left[ \bigcup_{\mathbf{x}'\in C, \mathbf{x}'\ne\mathbf{x}} \hat{\mathbf{x}} = \mathbf{x'} \right] \le \sum_{\mathbf{x}'\in C, \mathbf{x}'\ne\mathbf{x}} PEP(\mathbf{x}, \mathbf{x}') = \sum_{j=d_\mathrm{min}}^{N} A_j Q\left(\frac{\sqrt{ j } }{\sigma} \right) = UB(\sigma),
$$
where $A_j$ denotes the number of codewords with Hamming weight $j$. The linearity of the code ensures that the neighborhood of every codeword looks exactly the same. Thus, we can restrict our attention to the all-zero codeword, and distances turn into codeword weights. The sum therefore contains the PEP of all codewords except the all-zero codeword. As the halfspaces are clearly not disjoint, overlapping regions are counted multiple times, so we only obtain an upper bound (this relation is also known as
Boole's inequality). Nonetheless, we know that optimal decoding of the code will perform at least as well as the UB.
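As a minimal sketch (not part of the original text), the UB can be evaluated directly from a known weight spectrum. As an assumed example we use the (7,4) Hamming code, whose weight enumerator is $A_0=1$, $A_3=7$, $A_4=7$, $A_7=1$:

```python
import numpy as np
from scipy.stats import norm

def union_bound(sigma, weight_spectrum):
    """Union Bound on the ML frame-error rate for BPSK over an AWGN channel.

    weight_spectrum: dict mapping Hamming weight j -> number of codewords A_j
                     (the all-zero codeword, j=0, is skipped automatically).
    """
    return sum(A_j * norm.sf(np.sqrt(j) / sigma)          # Q(sqrt(j)/sigma)
               for j, A_j in weight_spectrum.items() if j > 0)

# weight spectrum of the (7,4) Hamming code
A = {0: 1, 3: 7, 4: 7, 7: 1}

for sigma in (1.0, 0.7, 0.5):
    print(f"sigma = {sigma:.1f}:  UB = {union_bound(sigma, A):.3e}")
```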
For many algebraic codes the full distance spectrum (i.e., all $A_j$) is known, but this is not always the case. Additionally, we see from the properties of the
Q-function that it decays exponentially with the weight $j$, and thus the terms corresponding to large weights $j$ have less impact. This is why, in practice, one often uses the
Union Bound Estimate (UBE), which consists of only the dominating term, rather than the full UB. Computing the UBE only requires knowing the minimum distance and the number of minimum-weight codewords of the code.
$$
UBE(\sigma) = A_{d_\mathrm{min}} Q\left(\frac{\sqrt{ d_\mathrm{min} } }{\sigma} \right) \le UB(\sigma)
$$
As all other terms are neglected, the UBE is smaller than the UB and thus no longer a guaranteed upper bound on the FER, which weakens its interpretability.
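Continuing the assumed (7,4) Hamming code example from the previous sketch (with $d_\mathrm{min}=3$ and $A_{d_\mathrm{min}}=7$), the UBE keeps only the dominating term:

```python
import numpy as np
from scipy.stats import norm

# minimum distance and number of minimum-weight codewords of the assumed (7,4) Hamming code
d_min, A_dmin = 3, 7

for sigma in (1.0, 0.7, 0.5):
    ube = A_dmin * norm.sf(np.sqrt(d_min) / sigma)   # A_dmin * Q(sqrt(d_min)/sigma)
    print(f"sigma = {sigma:.1f}:  UBE = {ube:.3e}")
```

Comparing these values with the UB from the previous sketch shows that the gap closes as $\sigma$ decreases, since the higher-weight terms decay faster.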
For nonlinear codes, the UB cannot be computed in this form, as the simplification relies on the fact that the neighborhood of every codeword looks identical, which is only the case for linear codes.