Autoregression with Exogenous Variable (ARX) is a time-series regression model that extends OLS by including a lagged value of the dependent variable Y_{t-1} as an additional regressor. This captures persistence in the dependent variable over time – the tendency of firms’ financial outcomes to be correlated with their own past values.
Model. The ARX(1) specification estimated here takes the form:
Y_t = c + \varphi \, Y_{t-1} + \beta \, X_t + \varepsilon_t
where c is the intercept, \varphi is the autoregressive coefficient on the lagged dependent variable Y_{t-1}, \beta is the coefficient on the exogenous variable X_t, and \varepsilon_t is the error term. The coefficient \varphi measures the degree of persistence: a value close to 1 indicates strong inertia, while a value close to 0 indicates that past values have little predictive power.
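To see why \varphi can be read as a persistence measure, the model equation can be substituted into itself. The derivation below is an illustrative sketch that uses only the symbols defined above; the substitution depth k is a hypothetical index introduced for the example.

```latex
\begin{aligned}
Y_t &= c + \varphi \, Y_{t-1} + \beta \, X_t + \varepsilon_t \\
    &= c + \varphi \left( c + \varphi \, Y_{t-2} + \beta \, X_{t-1} + \varepsilon_{t-1} \right) + \beta \, X_t + \varepsilon_t \\
    &= c \, (1 + \varphi) + \varphi^{2} \, Y_{t-2} + \beta \left( X_t + \varphi \, X_{t-1} \right) + \left( \varepsilon_t + \varphi \, \varepsilon_{t-1} \right) \\
    &\;\;\vdots \\
    &= c \sum_{j=0}^{k-1} \varphi^{j} + \varphi^{k} \, Y_{t-k}
       + \beta \sum_{j=0}^{k-1} \varphi^{j} X_{t-j}
       + \sum_{j=0}^{k-1} \varphi^{j} \varepsilon_{t-j}
\end{aligned}
```

The value observed k periods earlier enters with weight \varphi^{k}. With \varphi = 0.9 the weight after five periods is still about 0.59, whereas with \varphi = 0.2 it is 0.00032.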
Estimation. The model is estimated by OLS. Because one lag is required, the first observation for each company is dropped, reducing each company's effective sample size by one.
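The estimation step can be sketched in a few lines of standard-library Python. This is a minimal illustration for a single company's series, not the R implementation used for the reported results. The function names fit_arx1 and solve_linear_system and the example data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_arx1(y, x):
    """OLS estimates (c, phi, beta) for Y_t = c + phi*Y_{t-1} + beta*X_t + e_t."""
    # The first observation has no lagged value, so the sample starts at t = 1.
    rows = [[1.0, y[t - 1], x[t]] for t in range(1, len(y))]
    target = [y[t] for t in range(1, len(y))]
    k = 3
    # Normal equations: (X'X) theta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, target)) for i in range(k)]
    return tuple(solve_linear_system(xtx, xty))


# Hypothetical noise-free series generated with c = 1, phi = 0.5, beta = 2.
x_series = [0.3, 1.2, -0.7, 2.1, 0.5, -1.4, 0.9, 1.8]
y_series = [0.0]
for t in range(1, len(x_series)):
    y_series.append(1.0 + 0.5 * y_series[t - 1] + 2.0 * x_series[t])

c_hat, phi_hat, beta_hat = fit_arx1(y_series, x_series)
print(c_hat, phi_hat, beta_hat)
```

Eight observations yield seven usable rows, which mirrors the loss of one observation described above. With panel data the lag would have to be built separately within each company so that no row pairs one company's value with another company's lag.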
Durbin h statistic. In time-series models such as ARX, it is important to test whether the residuals are serially correlated. The presence of autocorrelation would indicate that the model has not fully captured the dynamics of 𝑌, and that standard errors may be unreliable. The Durbin h statistic is designed for this purpose.
When a lagged dependent variable is included as a regressor, the standard Durbin-Watson statistic is biased toward 2 (the value that indicates no autocorrelation), which makes the test unreliable in the ARX context. The Durbin h statistic corrects for this bias:
h = \hat{\rho} \sqrt{\dfrac{n}{1 - n \cdot \widehat{\mathrm{Var}}(\hat{\varphi})}}
where \hat{\rho} = 1 - d/2 is the estimated first-order autocorrelation coefficient (derived from the Durbin-Watson statistic 𝑑), 𝑛 is the number of observations, and \widehat{\mathrm{Var}}(\hat{\varphi}) is the squared standard error of the coefficient on the lagged dependent variable Y_{t-1}.
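The first ingredient of the formula, \hat{\rho} = 1 - d/2, can be illustrated with a short Python sketch. It assumes the usual textbook definition of the Durbin-Watson statistic (the sum of squared successive residual differences divided by the sum of squared residuals). The function names and the toy residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def rho_from_dw(d):
    """Approximate first-order autocorrelation implied by d."""
    return 1.0 - d / 2.0


# Toy residuals that alternate in sign, i.e. strong negative autocorrelation.
example_residuals = [1.0, -1.0, 1.0, -1.0]
d = durbin_watson(example_residuals)  # numerator 12, denominator 4, so d = 3.0
rho_hat = rho_from_dw(d)              # 1 - 3.0 / 2 = -0.5
print(d, rho_hat)
```

Values of d near 2 map to \hat{\rho} near 0, values below 2 to positive autocorrelation, and values above 2 to negative autocorrelation.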
Under the null hypothesis of no serial correlation, h is asymptotically distributed as a standard normal. In a two-sided test, the null is rejected at the 5% significance level if |h| > 1.96.
The statistic is reported as NA when n \cdot \widehat{\mathrm{Var}}(\hat{\varphi}) \ge 1, because the denominator 1 - n \cdot \widehat{\mathrm{Var}}(\hat{\varphi}) is then zero or negative and h is not defined as a real number.
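The calculation and the NA rule can be sketched as follows. This is an illustrative Python version of the formula above, not the code used to produce the reported values. The names durbin_h and rejects_at_5pct and the input numbers are hypothetical, and None stands in for NA.

```python
import math


def durbin_h(d, n, se_phi):
    """Durbin h from the Durbin-Watson statistic d, sample size n, and the
    standard error of the coefficient on the lagged dependent variable.

    Returns None when n * Var(phi_hat) >= 1, where h is undefined.
    """
    rho_hat = 1.0 - d / 2.0
    denominator = 1.0 - n * se_phi ** 2
    if denominator <= 0.0:
        return None
    return rho_hat * math.sqrt(n / denominator)


def rejects_at_5pct(h):
    """Two-sided test against the standard normal critical value 1.96."""
    return h is not None and abs(h) > 1.96


# Hypothetical inputs: d = 1.6, n = 50, standard error 0.1.
# rho_hat = 0.2 and n * Var = 0.5, so h = 0.2 * sqrt(50 / 0.5), about 2.0.
h = durbin_h(d=1.6, n=50, se_phi=0.1)
print(h, rejects_at_5pct(h))

# A larger standard error pushes n * Var above 1, so the result is None.
print(durbin_h(d=1.6, n=50, se_phi=0.2))
```

Returning None rather than raising an error keeps the undefined case visible in a results table, in the same way that the NA entry does.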
Standard errors. Two sets of standard errors are reported:
OLS standard errors are the conventional standard errors from the fitted model, assuming homoskedastic and serially uncorrelated residuals. They are included as a baseline reference.
Newey-West standard errors are heteroskedasticity- and autocorrelation-consistent (HAC). They correct for both non-constant variance and serial correlation in the residuals, which are common in time-series data, and are computed here using up to 4 lags with no prewhitening. When serial correlation is detected by the Durbin h test, Newey-West standard errors are the recommended basis for inference.
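For orientation, the textbook form of the Newey-West estimator with Bartlett weights is written out below for a lag truncation of L = 4. This is a sketch under stated assumptions: x_t = (1, Y_{t-1}, X_t)' denotes the regressor vector, \theta the coefficient vector (c, \varphi, \beta)', and X the matrix that stacks the x_t'. These symbols are introduced here for illustration, and software implementations may apply additional finite-sample adjustments.

```latex
\begin{aligned}
\hat{\Gamma}_j &= \frac{1}{n} \sum_{t=j+1}^{n} \hat{\varepsilon}_t \, \hat{\varepsilon}_{t-j} \, x_t \, x_{t-j}' ,
  \qquad j = 0, 1, \dots, L \\
\hat{S} &= \hat{\Gamma}_0 + \sum_{j=1}^{L} w_j \left( \hat{\Gamma}_j + \hat{\Gamma}_j' \right),
  \qquad w_j = 1 - \frac{j}{L + 1} \\
\widehat{\mathrm{Var}}_{NW}(\hat{\theta}) &= n \, (X'X)^{-1} \, \hat{S} \, (X'X)^{-1}
\end{aligned}
```

With L = 4 the weights are w_1 = 0.8, w_2 = 0.6, w_3 = 0.4, and w_4 = 0.2, so residual autocovariances at longer lags contribute progressively less. Setting every w_j to zero leaves only \hat{\Gamma}_0, which is the heteroskedasticity-robust (White) estimator.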
Implementation. The model is implemented in R using the arx() function from the gets package. Newey-West standard errors are obtained via coeftest(model, vcov=NeweyWest(model, lag=4, prewhite=FALSE)), where coeftest() comes from the lmtest package and NeweyWest() from the sandwich package. The Durbin-Watson statistic 𝑑 is obtained via dwtest() from the lmtest package. The Durbin h statistic is then computed from 𝑑 using the formula above.
References
Durbin, J. “Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors Are Lagged Dependent Variables.” Econometrica 38, No. 3 (1970): 410–421.
R Documentation for ‘gets’ package. Accessed May 14, 2026.