Channel Capacity
Information theory [1] [2] provides a fundamental understanding of the limits of communication. Consider the following block diagram.
flowchart LR
Source -- "$$W_n$$" --> Encoder
Encoder -- "$$X^n$$" --> Channel
Channel -- "$$Y^n$$" --> Decoder
Decoder -- "$$\hat{W}_n$$" --> Sink
Here a message \(W_n\) consisting of \(nR\) bits is encoded into a codeword \(X^n\) spanning \(n\) channel uses; the rate \(R\) is therefore measured in bits per channel use. The channel is modeled by a probability law \(P_{Y^n|X^n}(y^n|x^n)\), so that the output \(Y^n\) is a potentially noisy version of the input \(X^n\). The decoder extracts the message estimate \(\hat{W}_n\) from the channel output \(Y^n\).
One of the fundamental results in information theory is that it is possible to communicate reliably over a noisy channel, in the sense that the average probability of error can be made arbitrarily small by allowing longer and longer blocks for encoding and decoding. Mathematically, we say that a communication rate \(R\) is achievable if there exists a sequence of encoders and decoders with \(\mathbb{P}[W_n \neq \hat{W}_n] \rightarrow 0\) as \(n \rightarrow \infty\). The channel capacity \(C\) is defined as the least upper bound, or supremum, of the achievable rates.
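To make the role of the block length concrete, here is a small Python sketch, assuming for illustration a binary symmetric channel with crossover probability \(p = 0.1\) and a simple length-\(n\) repetition code decoded by majority vote. The error probability vanishes as \(n\) grows, but so does the rate \(R = 1/n\); the remarkable content of the channel coding theorem is that vanishing error is possible at any fixed rate \(R < C\).

```python
# Illustrative sketch (assumed setup): a length-n repetition code over a binary
# symmetric channel BSC(p). The decoder takes a majority vote over the n received
# bits, so it errs exactly when more than half of them are flipped by the channel.
from math import comb

def repetition_error_prob(n: int, p: float) -> float:
    """P[majority-vote decoding error] for an n-fold repetition code over a BSC(p)."""
    # Use odd n so that a majority vote can never tie.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for n in [1, 3, 11, 51, 101]:
    print(f"n = {n:3d}  rate = {1/n:.3f}  P[error] = {repetition_error_prob(n, 0.1):.2e}")
```

The error probability drops rapidly with \(n\), but only because the rate is being sacrificed; capacity tells us how far the rate can be pushed without giving up reliability.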
In addition, we can in principle determine the value of the channel capacity for a given communication channel. For a fairly wide class of channels, we have
\[C = \lim_{n\rightarrow\infty} \sup_{P_{X^n}(x^n)} \frac{1}{n}\mathbb{I}(X^n;Y^n),\]
where the mutual information is defined as
\[\mathbb{I}(X^n;Y^n) := \sum_{x^n,y^n} P_{X^n,Y^n}(x^n,y^n) \log \frac{P_{X^n,Y^n}(x^n,y^n)}{P_{X^n}(x^n)P_{Y^n}(y^n)}.\]
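For small alphabets the mutual information can be computed directly from this definition. Below is a minimal Python sketch, assuming as an example a binary symmetric channel with crossover probability \(0.1\) driven by a uniform input.

```python
# Minimal sketch (assumed example): evaluate I(X;Y) directly from a joint pmf.
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) in bits, computed from the joint pmf p_xy[x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal P_X
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal P_Y
    mask = p_xy > 0                          # use the convention 0 * log 0 = 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

# Binary symmetric channel with crossover probability 0.1 and a uniform input.
p = 0.1
p_y_given_x = np.array([[1 - p, p], [p, 1 - p]])
p_xy = 0.5 * p_y_given_x                     # P_{X,Y}(x, y) = P_X(x) * P_{Y|X}(y|x)
print(mutual_information(p_xy))              # ~0.531 bits, i.e. 1 - H_b(0.1)
```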
Two special cases:
The memoryless case, in which the probability law factors as \[P_{Y^n|X^n}(y^n|x^n) = \prod_{k=1}^{n} P_{Y|X}(y_k|x_k),\] and the channel capacity simplifies to the single-letter expression \[C = \sup_{P_X(x)} \mathbb{I}(X;Y).\]
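As a rough numerical sketch of this single-letter formula (again assuming a binary symmetric channel with \(p = 0.1\)), the supremum can be approximated by a brute-force search over input distributions; for the BSC the maximum is attained at the uniform input and equals \(1 - H_b(p) \approx 0.531\) bits per channel use, where \(H_b\) is the binary entropy function.

```python
# Rough sketch (assumed example): approximate C = sup_{P_X} I(X;Y) for a BSC(0.1)
# by a grid search over binary input distributions parameterized by q = P_X(1).
import numpy as np

def mutual_information_bits(p_x: np.ndarray, p_y_given_x: np.ndarray) -> float:
    """I(X;Y) in bits for input pmf p_x and channel matrix p_y_given_x[x, y]."""
    p_xy = p_x[:, None] * p_y_given_x
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2((p_xy / (p_x[:, None] * p_y))[mask])))

p = 0.1
channel = np.array([[1 - p, p], [p, 1 - p]])           # BSC(p) as P_{Y|X}
grid = np.linspace(0.001, 0.999, 999)                  # candidate values of q = P_X(1)
rates = [mutual_information_bits(np.array([1 - q, q]), channel) for q in grid]
best = int(np.argmax(rates))
print(f"maximizing P_X(1) ~ {grid[best]:.3f}, C ~ {rates[best]:.4f} bits per use")
```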
The Gaussian memoryless case, in which \(Y = X + Z\) with additive noise \(Z\) that is zero-mean Gaussian with variance \(\sigma^2\) (equivalently, \(P_{Y|X}(y|x)\) is Gaussian with mean \(x\) and variance \(\sigma^2\)), the input satisfies the average power constraint \(\sum_{k=1}^{n}X_k^2 \le nP\), and the channel capacity becomes \[C = \frac{1}{2} \log_2\left(1 + \frac{P}{\sigma^2} \right).\]
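A quick consistency check, sketched in Python under the stated Gaussian model: with a Gaussian input of power \(P\), the mutual information equals \(h(Y) - h(Z)\), where \(h\) denotes differential entropy and a Gaussian with variance \(v\) has \(h = \tfrac{1}{2}\log_2(2\pi e v)\) bits; this difference reproduces the capacity formula above.

```python
# Sketch (assumed parameter values): the AWGN capacity formula agrees with
# h(Y) - h(Z) when the input X is Gaussian with power P, since then Y = X + Z
# is Gaussian with variance P + sigma2 and Z is Gaussian with variance sigma2.
import numpy as np

def gaussian_entropy_bits(variance: float) -> float:
    """Differential entropy (bits) of a zero-mean Gaussian with the given variance."""
    return 0.5 * np.log2(2 * np.pi * np.e * variance)

def awgn_capacity(P: float, sigma2: float) -> float:
    """Capacity (bits per channel use) of the AWGN channel with power constraint P."""
    return 0.5 * np.log2(1 + P / sigma2)

P, sigma2 = 1.0, 0.25                               # an assumed 6 dB signal-to-noise ratio
print(awgn_capacity(P, sigma2))                     # ~1.161 bits per channel use
print(gaussian_entropy_bits(P + sigma2) - gaussian_entropy_bits(sigma2))   # same value
```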