Prewhitening in HAC estimation

May 19, 2026 by Ednaldo Silva

The term prewhitening sounds arcane and, to some ears, possibly worse. The word belongs to a long-standing optical metaphor in spectral analysis that predates its econometric usage. This note sets out the etymology, the formal statistical content, and the specific procedure introduced by Andrews and Monahan (1992) for improving heteroskedasticity- and autocorrelation-consistent (HAC) covariance estimation in small samples.

1. The optical metaphor

In physics and electrical engineering, white denotes a signal whose power spectral density is flat across all frequencies, in direct analogy with white light, which contains all visible wavelengths in equal intensity. A white noise process is the stochastic counterpart: its variance is distributed evenly across the frequency band, so no oscillation is preferentially represented. The convention extends to pink noise (low-frequency emphasis) and blue noise (high-frequency emphasis); both names ride on the same prism. The terminology entered statistics from harmonic analysis and signal processing in the early twentieth century and is universal across physics, engineering, statistics, and econometrics. It carries no extra-technical content.

2. White noise, formally

A discrete-time stochastic process {ε(t)} is white noise if

(1)\quad 
\mathbb{E}[\varepsilon(t)] = 0,
\qquad
\mathrm{Var}[\varepsilon(t)] = \sigma^2 < \infty,
\qquad
\mathrm{Cov}[\varepsilon(t),\varepsilon(s)] = 0
\quad \text{for } t \neq s

Equivalently, in the frequency domain, the spectral density of ε(t) is constant:

(2)\quad 
f(\omega)
=
\frac{\sigma^2}{2\pi},
\qquad
\omega \in [-\pi,\pi]

The two definitions are equivalent for covariance-stationary processes. They describe the same second-order object in the time domain and in the frequency domain. The spectral density is the Fourier transform of the autocovariance function, f(ω) = (1/2π) Σ_j γ(j) e^{−iωj} with γ(j) = Cov[ε(t), ε(t−j)]. Zero autocovariances at all nonzero lags therefore leave only the constant term γ(0)/2π = σ²/(2π). This equivalence does not require the stronger assumption that the observations are statistically independent. The optical metaphor is the bridge.
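A quick numerical check can show both descriptions at once. The sketch below assumes NumPy and simulated Gaussian white noise; the helper names `sample_autocorr` and `periodogram` are illustrative and do not come from any library. If the series is white, the sample autocorrelations at nonzero lags should sit near zero. The periodogram is a noisy estimate of f(ω), so it should average to about σ²/(2π), with no systematic tilt toward low or high frequencies.

```python
import numpy as np

def sample_autocorr(x, max_lag):
    """Sample autocorrelations r(1), ..., r(max_lag) of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    gamma0 = np.dot(x, x) / x.size
    return np.array([np.dot(x[j:], x[:-j]) / x.size / gamma0
                     for j in range(1, max_lag + 1)])

def periodogram(x):
    """Periodogram |sum_t x(t) e^{-i w t}|^2 / (2 pi T) at the Fourier
    frequencies w_k = 2 pi k / T, k = 1, ..., T // 2 (zero frequency dropped)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    dft = np.fft.rfft(x)
    return np.abs(dft[1:]) ** 2 / (2.0 * np.pi * T)

rng = np.random.default_rng(0)
sigma = 2.0
eps = rng.normal(0.0, sigma, size=20_000)   # simulated white noise, Var = sigma^2

r = sample_autocorr(eps, max_lag=5)
pgram = periodogram(eps)
half = pgram.size // 2
print("autocorrelations, lags 1-5:", np.round(r, 3))
print("mean periodogram:", pgram.mean(), "vs sigma^2/(2 pi):", sigma**2 / (2 * np.pi))
print("low/high frequency ratio:", pgram[:half].mean() / pgram[half:].mean())
```

The ratio of mean periodogram values over the lower and upper halves of the frequency band is a crude flatness check. Pink noise would push it well above one, and blue noise would push it well below one.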

3. The HAC problem

In time-series regression, OLS residuals are typically not white. They carry autocorrelation and frequently heteroskedasticity, and the asymptotic variance of the OLS coefficient vector involves the long-run variance of the score:

(3)\quad 
S
=
\sum_{j=-\infty}^{\infty}
\Gamma(j)

where Γ(j) = E[ε(t)X(t) ε(t−j)X(t−j)′] is the autocovariance of the score at lag j. Newey and West (1987) proposed a positive semi-definite estimator of S by weighted truncation:

(4)\quad 
\hat{S}
=
\hat{\Gamma}(0)
+
\sum_{j=1}^{L}
w(j,L)
\left[
\hat{\Gamma}(j)
+
\hat{\Gamma}(j)^{\prime}
\right]

with Bartlett weights w(j, L) = 1 − j / (L + 1). The bandwidth L governs a bias–variance trade-off. Too small a bandwidth omits persistent autocorrelation and, when the autocovariances are positive, biases Ŝ downward. Too large a bandwidth introduces noise from estimating distant autocovariances with few effective observations. Andrews (1991) derived data-dependent bandwidth rules for several kernels, the Bartlett kernel among them, and showed the quadratic spectral kernel to be optimal within a broad class. Newey and West (1994) proposed a nonparametric method for selecting the lag automatically.
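Equation (4) with Bartlett weights translates almost line for line into array code. This is a minimal sketch under a few assumptions. It uses NumPy and takes a T × k array `u` whose rows are the scores u(t). It applies no small-sample degrees-of-freedom correction. The name `newey_west` is illustrative and is not a library function. The scores are not demeaned, which suits OLS scores because the normal equations force them to sum to zero. The usage example feeds in a simulated AR(1) series as a stand-in for a regression score.

```python
import numpy as np

def newey_west(u, L):
    """Bartlett-kernel estimate of S = sum_j Gamma(j), equation (4).

    u : (T, k) array whose rows are the scores u(t); a 1-D array is treated as k = 1.
    L : bandwidth (number of lags receiving positive weight).
    """
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    T = u.shape[0]
    S = u.T @ u / T                          # Gamma_hat(0)
    for j in range(1, L + 1):
        gamma_j = u[j:].T @ u[:-j] / T       # Gamma_hat(j) = (1/T) sum_t u(t) u(t-j)'
        w = 1.0 - j / (L + 1.0)              # Bartlett weight w(j, L)
        S += w * (gamma_j + gamma_j.T)
    return S

# Usage: scalar AR(1) "score" u(t) = 0.5 u(t-1) + v(t); its true S is 1 / (1 - 0.5)^2 = 4.
rng = np.random.default_rng(42)
T, rho = 100_000, 0.5
v = rng.normal(size=T)
u = np.empty(T)
u[0] = v[0] / np.sqrt(1.0 - rho**2)          # start in the stationary distribution
for t in range(1, T):
    u[t] = rho * u[t - 1] + v[t]

print(newey_west(u, L=50))                   # a 1 x 1 matrix; compare with 4
```

Adding each lag's contribution together with its transpose keeps the estimate symmetric. The tapering Bartlett weights are what guarantee positive semi-definiteness in the multivariate case.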

The remaining problem is finite-sample bias. When the score process is strongly persistent, the kernel must reach far into the lag structure to recover the long-run variance, and the variance penalty for doing so is steep when the sample size T is modest. Den Haan and Levin (1997) document this bias systematically across kernel families and persistence levels.
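A population calculation shows the size of the problem. Take a scalar AR(1) score u(t) = ρu(t−1) + v(t) with Var v(t) = σ_v². Its autocovariances are Γ(j) = σ_v² ρ^|j| / (1 − ρ²), and its long-run variance is S = σ_v² / (1 − ρ)². The sketch below assumes NumPy, and the AR(1) score and helper names are illustrative assumptions. It plugs the true Γ(j) into the Bartlett sum (4), which is what the estimator targets for a fixed L before any sampling noise enters, and reports that target as a fraction of S.

```python
import numpy as np

def ar1_autocov(j, rho, sigma_v=1.0):
    """Autocovariance Gamma(j) of u(t) = rho u(t-1) + v(t), Var v = sigma_v^2."""
    return sigma_v**2 * rho ** np.abs(j) / (1.0 - rho**2)

def bartlett_target(rho, L, sigma_v=1.0):
    """Bartlett-weighted sum (4) evaluated at the true autocovariances,
    i.e. the probability limit of the estimator for a fixed bandwidth L."""
    j = np.arange(1, L + 1)
    w = 1.0 - j / (L + 1.0)
    return ar1_autocov(0, rho, sigma_v) + 2.0 * np.sum(w * ar1_autocov(j, rho, sigma_v))

def long_run_variance(rho, sigma_v=1.0):
    """Exact S = sum over all j of Gamma(j) = sigma_v^2 / (1 - rho)^2."""
    return sigma_v**2 / (1.0 - rho) ** 2

for rho in (0.3, 0.9):
    ratios = [bartlett_target(rho, L) / long_run_variance(rho) for L in (4, 12, 50, 500)]
    print(f"rho = {rho}: target / S for L = 4, 12, 50, 500 ->", np.round(ratios, 3))
```

With ρ = 0.3 a bandwidth of 4 already recovers most of S. With ρ = 0.9 the same bandwidth recovers under a quarter of it, about 0.22. Ignoring the negligible truncation tail, the ratio at ρ = 0.9 is 1 − 9.47/(L + 1), so getting within 5 percent of S takes roughly 190 lags. That is the region where sampling noise in the distant Γ̂(j) takes over.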

4. The Andrews–Monahan procedure

Andrews and Monahan (1992) proposed a refinement that exploits the optical metaphor directly. The score is filtered through a parametric model that absorbs most of the autocorrelation, the kernel is applied to the resulting approximately white residuals, and the estimate is then transformed back. The procedure has three steps.

1. Whiten. Fit a low-order vector autoregression to the score u(t) = ε(t) X(t), typically a VAR(1):

(5)\quad 
u(t)
=
A \cdot u(t-1)
+
v(t)

2. Kernel-estimate. Apply the Newey–West kernel to the VAR residuals v̂(t), which carry attenuated autocorrelation. Denote the resulting estimate Ŝv.

3. Recolor. Transform back to the scale of u(t) using the estimated VAR coefficients:

(6)\quad 
\hat{S}
=
(I-\hat{A})^{-1}
\hat{S}_v
(I-\hat{A}^{\prime})^{-1}
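Equation (6) follows from the moving-average form of (5). Assume the eigenvalues of A lie strictly inside the unit circle, so that the VAR(1) is stationary. Repeated substitution then writes u(t) as a weighted sum of current and past v, and the sum of its autocovariances factors into three pieces:

u(t)
=
\sum_{k=0}^{\infty} A^{k} v(t-k)

\Gamma_u(j)
=
\sum_{k=0}^{\infty}\sum_{m=0}^{\infty}
A^{k}\,\Gamma_v(j+m-k)\,\big(A^{m}\big)^{\prime}

\sum_{j=-\infty}^{\infty}\Gamma_u(j)
=
\Big(\sum_{k=0}^{\infty} A^{k}\Big)
\Big(\sum_{j=-\infty}^{\infty}\Gamma_v(j)\Big)
\Big(\sum_{m=0}^{\infty} A^{m}\Big)^{\prime}
=
(I-A)^{-1} S_v (I-A^{\prime})^{-1}

Here Γ_u and Γ_v are the autocovariances of u(t) and v(t), and S_v is the long-run variance of v(t). If the VAR removed all the dependence, Γ_v(j) would vanish for j ≠ 0 and S_v would reduce to Γ_v(0). The kernel in step 2 handles whatever dependence remains.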

The intuition is that the parametric VAR captures the bulk of the persistence with a small number of estimated coefficients, leaving the kernel responsible only for whatever short-range dependence the VAR could not reach. The Monte Carlo evidence in Andrews and Monahan (1992) shows substantial finite-sample bias reduction when the score is genuinely persistent, at modest variance cost.
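Combining the three steps gives a compact prewhitened estimator. This is a minimal sketch under stated assumptions. It uses NumPy, fits the VAR(1) by least squares without an intercept (reasonable for OLS scores, which average to zero), and uses the Bartlett kernel in step 2. It omits any safeguard against Â having eigenvalues near one, and it uses a fixed rather than data-dependent bandwidth. It is therefore not a replica of any published or packaged implementation, and the function names are hypothetical.

```python
import numpy as np

def newey_west(u, L):
    """Bartlett-kernel estimate of the long-run variance, as in equation (4)."""
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    T = u.shape[0]
    S = u.T @ u / T                                  # Gamma_hat(0)
    for j in range(1, L + 1):
        gamma_j = u[j:].T @ u[:-j] / T               # Gamma_hat(j)
        S += (1.0 - j / (L + 1.0)) * (gamma_j + gamma_j.T)
    return S

def prewhitened_hac(u, L):
    """VAR(1)-prewhitened Bartlett HAC estimate: steps 1-3 and equation (6)."""
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    y, x = u[1:], u[:-1]                             # u(t) and u(t-1)
    # Step 1 (whiten): least-squares fit of u(t) = A u(t-1) + v(t), no intercept.
    # lstsq solves x @ B ~ y row by row, so B is the transpose of A.
    B, *_ = np.linalg.lstsq(x, y, rcond=None)
    A_hat = B.T
    v_hat = y - x @ B                                # VAR residuals v_hat(t)
    # Step 2 (kernel-estimate): Bartlett kernel on the residuals gives S_v.
    S_v = newey_west(v_hat, L)
    # Step 3 (recolor): (I - A_hat)^{-1} S_v (I - A_hat')^{-1}.
    M = np.linalg.inv(np.eye(u.shape[1]) - A_hat)
    return M @ S_v @ M.T

# Usage: a scalar AR(1) score with rho = 0.9, so the true S is 1 / (1 - 0.9)^2 = 100.
rng = np.random.default_rng(3)
T, rho = 50_000, 0.9
v = rng.normal(size=T)
u = np.empty(T)
u[0] = v[0] / np.sqrt(1.0 - rho**2)
for t in range(1, T):
    u[t] = rho * u[t - 1] + v[t]

S_nw = newey_west(u, L=5)[0, 0]
S_pw = prewhitened_hac(u, L=5)[0, 0]
print(f"plain Bartlett, L = 5: {S_nw:.1f}")
print(f"prewhitened,    L = 5: {S_pw:.1f}")
```

This example is the favorable case, because the simulated score is exactly a VAR(1). At a short bandwidth the plain Bartlett estimate recovers only a fraction of the true value of 100. The prewhitened estimate lands near it because the VAR absorbs the persistence and the recoloring factor restores it.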

5. When prewhitening helps, and when it does not

Prewhitening is most useful when the score exhibits strong AR(1)-like persistence and the sample is large enough to identify the prewhitening filter reliably. It is least useful, and can be actively harmful, in the first two circumstances below. The third is a practical reason to switch it off:

• When the regression specification already absorbs the persistence. If a lagged dependent variable enters the model, much of the autocorrelation in the score is captured by the lag coefficient rather than left in the residuals. Prewhitening then attacks a problem that has largely been solved upstream.

• When the prewhitening filter is poorly identified. At small T, the estimated VAR coefficients carry their own sampling variation. The recoloring step can then amplify rather than dampen finite-sample distortion, because (I − Â)⁻¹ grows without bound as an eigenvalue of Â approaches one.

• When comparing implementations across software. R’s sandwich::NeweyWest defaults to prewhite = TRUE; statsmodels does not prewhiten by default. Numerical parity across implementations is usually easier to establish with prewhitening disabled.
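The sensitivity of the recoloring step can be made concrete in the scalar case, where (6) reduces to Ŝ = Ŝv / (1 − â)². The sketch below uses a hypothetical helper `recolor_factor` and evaluates that multiplier for estimates within ±0.02 of a true coefficient of 0.95. The spread is an assumption chosen for illustration. It is smaller than the large-sample standard error of a least-squares AR(1) coefficient, √((1 − ρ²)/T), which is about 0.03 at T = 100.

```python
def recolor_factor(a_hat):
    """Scalar form of the recoloring in (6): the multiplier 1 / (1 - a_hat)^2 on S_v."""
    return 1.0 / (1.0 - a_hat) ** 2

for a_hat in (0.93, 0.95, 0.97):
    print(f"a_hat = {a_hat:.2f}: S_hat = {recolor_factor(a_hat):7.1f} x S_v")
```

An error of ±0.02 in â moves the multiplier from about 204 to about 1111, a more than fivefold swing in Ŝ driven entirely by the filter estimate.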

The decision is not aesthetic. Prewhitening does not disturb the large-sample justification of the HAC estimator. It does make the finite-sample correction depend on the unknown persistence structure being well approximated by a low-order VAR. Where that approximation is good, the gains are real; where it is not, the procedure can introduce new distortion in the name of removing the old.

References

Andrews, D. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3), 817–858.

Andrews, D. & Monahan, J. (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica, 60(4), 953–966.

Den Haan, W. & Levin, A. (1997). A practical guide to robust covariance matrix estimation. In G. S. Maddala & C. R. Rao (Eds.), Handbook of Statistics, Vol. 15: Robust Inference (pp. 299–342). Amsterdam: Elsevier.

Newey, W. & West, K. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703–708.

Newey, W. & West, K. (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61(4), 631–653.