Weighted Least Squares (WLS) in EdgarStat
April 30, 2026 by Joanna Jeziorek

Ordinary Least Squares (OLS) assumes the variance of the error term is constant across all observations.

In practice, financial data rarely behaves this way. Larger firms, higher-revenue years, or certain industries tend to produce more dispersed residuals than others. When this heteroskedasticity (unequal variance) is present, OLS estimates remain unbiased but are no longer efficient, and standard errors become unreliable. Weighted Least Squares (WLS) corrects for this by reducing the weight of observations with high variance, recovering efficiency and producing valid inference. 

Step 1. OLS residuals

An ordinary least squares regression is run first: 

Y_i = \beta_0 + \beta_1 X_i + \epsilon_i

The squared residuals \hat{\epsilon}_i^2 serve as a proxy for the unknown per-observation variances \sigma_i^2.

Step 2. Variance estimation 

An auxiliary regression is run on the log of the squared residuals, using ln(X_i) as the regressor. A small constant is added for numerical stability:

\ln(\hat{\epsilon}_i^2 + 10^{-8}) = \alpha_0 + \alpha_1 \ln(X_i) + u_i

The fitted values are then exponentiated to recover the estimated variance for each observation: 

\hat{\sigma}_i^2 = \exp(\hat{\alpha}_0 + \hat{\alpha}_1 \ln(X_i))

This implicitly assumes a log-linear relationship between the variance and X_i, which is a modeling choice rather than a general property of WLS.

Note: due to the log transformation, observations where X_i \le 0 cannot be assigned a valid variance estimate and are therefore excluded from the WLS sample.

Step 3. Weighted Least Squares 

Let W = \text{diag} \left( \frac{1}{\hat{\sigma}_1^2}, \dots, \frac{1}{\hat{\sigma}_n^2} \right) be the diagonal weight matrix. The main regression is then re-estimated using the inverse of the estimated variances as weights: 

\hat{\beta}_{WLS} = (X^T W X)^{-1} X^T W Y

Observations with higher estimated variance receive lower weight, improving efficiency relative to OLS. 
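In matrix form, the estimator above reduces to a single linear solve. The numpy sketch below substitutes known-form variances for the Step 2 estimates, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)
X = np.column_stack([np.ones(n), x])

sigma2_hat = (0.3 * x) ** 2          # stand-in for the Step 2 estimates
W = np.diag(1.0 / sigma2_hat)        # inverse-variance weights

# beta_WLS = (X^T W X)^{-1} X^T W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Observations with large estimated variance contribute little to the normal equations, which is exactly the down-weighting described above.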

Standard errors

Standard errors are computed using the HC3 heteroskedasticity-robust covariance estimator. The HC3 correction relies on the hat matrix, defined in the WLS setting as:

H = X(X^T W X)^{-1} X^T W

where X is the n × k matrix of regressors and W is the diagonal weight matrix from Step 3. The scalar h_{ii} denotes the i-th diagonal entry of H, known as the leverage value of observation i. The HC3 estimator is then defined as:

\Omega_{HC3} = \text{diag} \left( \frac{\hat{\epsilon}_i^2}{(1 - h_{ii})^2} \right)

where \hat{\epsilon}_i^2 and h_{ii} are taken from the WLS (weighted) model. Even after WLS, residual heteroskedasticity may remain, so HC3 is applied as an additional robustness correction. 
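The leverage and HC3 pieces can be assembled by hand from the WLS quantities defined above. The sandwich form below is one common way to turn \Omega_{HC3} into a covariance matrix; statsmodels' internals may differ in detail, and the variances are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)
X = np.column_stack([np.ones(n), x])

sigma2_hat = (0.3 * x) ** 2          # illustrative Step 2 variances
W = np.diag(1.0 / sigma2_hat)

A = np.linalg.inv(X.T @ W @ X)       # (X^T W X)^{-1}
beta = A @ X.T @ W @ y
resid = y - X @ beta                 # WLS residuals

H = X @ A @ X.T @ W                  # WLS hat matrix
h = np.diag(H)                       # leverages h_ii

omega = resid ** 2 / (1.0 - h) ** 2  # diagonal of Omega_HC3
cov_hc3 = A @ X.T @ W @ np.diag(omega) @ W @ X @ A
se_hc3 = np.sqrt(np.diag(cov_hc3))   # HC3 standard errors
```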

Sample exclusions 

Observations with X_i \le 0 are excluded from the WLS sample, since the log-linear variance model in Step 2 cannot assign them a variance estimate.

Implementation 

In EdgarStat, the estimator is implemented with the Python statsmodels library. Step 1 and the auxiliary regression in Step 2 use statsmodels OLS; Step 3 uses statsmodels GLS with the sigma parameter set to the vector of estimated variances (\hat{\sigma}_1^2, \dots, \hat{\sigma}_n^2), which GLS treats as a diagonal covariance matrix. Throughout, W is the diagonal weight matrix, H is the WLS hat matrix, and \Omega_{HC3} is the HC3 covariance matrix.

Standard errors are obtained via get_robustcov_results(cov_type='HC3').
