1. Preliminaries
The geometric distributed-lag specification for revenue is
$$
R_t = \alpha + \gamma \sum_{k=0}^{\infty} \lambda^k X_{t-k} + u_t
$$
with Rₜ revenue, Xₜ advertising expense, α the intercept (base revenue), γ the impact multiplier, λ the carryover rate, and uₜ a stochastic disturbance.
Two assumptions deserve naming before any algebra. First, |λ| < 1, without which the infinite sum diverges and the long-run multiplier is undefined. Second, uₜ is white noise: mean zero, constant variance, serially uncorrelated. The second assumption is consequential: the Koyck transformation manufactures serial correlation in the error of the transformed equation, and the contrast between the original and transformed disturbances is the source of the estimation pathology discussed in §6.
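For numerical work the infinite sum is usually replaced by the equivalent recursion Aₜ = Xₜ + λAₜ₋₁. The following is a minimal sketch, assuming NumPy and zero advertising before the sample starts; the function names `geometric_lag` and `geometric_lag_direct` and the input series are hypothetical, chosen only for illustration.

```python
import numpy as np

def geometric_lag(x, lam):
    """Return A_t = sum_k lam**k * x[t-k], assuming x is zero before the sample."""
    if not abs(lam) < 1:
        raise ValueError("carryover rate must satisfy |lam| < 1")
    a = np.zeros(len(x))
    carry = 0.0
    for t, x_t in enumerate(x):
        carry = x_t + lam * carry  # A_t = X_t + lam * A_{t-1}
        a[t] = carry
    return a

def geometric_lag_direct(x, lam):
    """Same quantity computed term by term from the definition."""
    x = np.asarray(x, dtype=float)
    return np.array([
        sum(lam**k * x[t - k] for k in range(t + 1))
        for t in range(len(x))
    ])

# Hypothetical advertising series: a pulse of 10, two quiet periods, a pulse of 5.
x = np.array([10.0, 0.0, 0.0, 5.0])
print(geometric_lag(x, 0.5))         # values 10, 5, 2.5, 6.25
print(geometric_lag_direct(x, 0.5))  # same values from the definition
```

Revenue under the specification is then α + γAₜ + uₜ. The recursion costs one multiplication per period, whereas the direct sum grows with the sample length.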
2. The Koyck Derivation
Lag the revenue equation once and multiply through by λ:
$$
\lambda R_{t-1} = \alpha\lambda + \gamma\lambda \sum_{k=0}^{\infty} \lambda^k X_{t-1-k} + \lambda u_{t-1}
$$
The index shift j = k + 1 converts the sum to
$$
\gamma \sum_{j=1}^{\infty} \lambda^j X_{t-j}
$$
which is the original sum with its k = 0 term removed. Subtracting λRₜ₋₁ from Rₜ:
$$
R_t - \lambda R_{t-1} = \alpha(1-\lambda) + \gamma X_t + u_t - \lambda u_{t-1}
$$
Rearranging:
$$
\begin{aligned}
R_t &= \alpha(1-\lambda) + \lambda R_{t-1} + \gamma X_t + w_t \\
w_t &= u_t - \lambda u_{t-1}
\end{aligned}
$$
The disturbance wₜ is MA(1) with autocovariances
$$
\mathrm{Var}(w_t) = (1+\lambda^2)\sigma_u^2, \qquad \mathrm{Cov}(w_t, w_{t-1}) = -\lambda\sigma_u^2
$$
3. Three Multipliers
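A one-time shock ∆X at time t (advertising raised for a single period, then returned to its previous level) produces revenue responses ∆Rₜ = γ∆X, ∆Rₜ₊₁ = γλ∆X, ∆Rₜ₊₂ = γλ²∆X, and so on. Their cumulative sum, which also equals the eventual change in the level of revenue under a permanent shift of the same size, is a geometric series: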
$$
\sum_{k=0}^{\infty} \Delta R_{t+k} = \gamma \Delta X \sum_{k=0}^{\infty} \lambda^k = \frac{\gamma}{1-\lambda}\Delta X
$$
Three parameters fall out, and each deserves a name when teaching the model:
- Impact multiplier: γ. The contemporaneous response of revenue to a unit shock in advertising.
- Long-run multiplier: γ / (1 − λ). The cumulative response over the infinite horizon.
- Carryover amplification: 1 / (1 − λ). The ratio of long-run to impact response. At λ = 0.7, advertising is amplified by a factor of roughly 3.33; at λ = 0.9, by a factor of 10.
Anchoring the algebra to these three labels lets the reader move between the parameter and its economic interpretation without re-deriving anything.
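The three labels can be computed and checked against a truncated sum of the period-by-period responses. This is a small sketch in plain Python; the function names and the value γ = 2.0 are hypothetical and serve only to illustrate the arithmetic.

```python
def multipliers(gamma, lam):
    """Impact multiplier, long-run multiplier, and carryover amplification."""
    if not abs(lam) < 1:
        raise ValueError("carryover rate must satisfy |lam| < 1")
    amplification = 1.0 / (1.0 - lam)
    return {
        "impact": gamma,
        "long_run": gamma * amplification,
        "amplification": amplification,
    }

def cumulative_response(gamma, lam, horizon):
    """Sum of the responses gamma * lam**k for k = 0..horizon to a unit shock."""
    return sum(gamma * lam**k for k in range(horizon + 1))

m = multipliers(gamma=2.0, lam=0.7)  # gamma = 2.0 is an illustrative value
approx = cumulative_response(2.0, 0.7, horizon=200)
print(m["impact"], m["long_run"], m["amplification"])
print(approx)  # agrees with the long-run multiplier up to floating-point error
```

With λ = 0.7 the truncation error after 200 periods is of order 0.7²⁰¹, so the finite sum and the closed form are numerically indistinguishable.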
4. The Median Lag
Define S = T + 1 as the smallest number of periods such that the cumulative response covers half of the long-run effect. Then
$$
\sum_{k=0}^{T} \Delta R_{t+k} = \frac{1}{2} \cdot \frac{\gamma}{1-\lambda} \Delta X
$$
The left side is γ(1 + λ + ⋯ + λᵀ)∆X. The finite geometric sum is
$$
1 + \lambda + \cdots + \lambda^T = \frac{1-\lambda^{T+1}}{1-\lambda}
$$
Substituting:
$$
\gamma \cdot \frac{1-\lambda^{T+1}}{1-\lambda} \cdot \Delta X = \frac{1}{2} \cdot \frac{\gamma}{1-\lambda} \cdot \Delta X
$$
which collapses to
$$
1-\lambda^{T+1} = \frac{1}{2}, \qquad \text{equivalently } \lambda^S = \frac{1}{2} \quad (S=T+1)
$$
Taking logarithms and solving for S:
$$
S = \frac{\ln(1/2)}{\ln(\lambda)} = \frac{-\ln(2)}{\ln(\lambda)} = \frac{\ln(2)}{-\ln(\lambda)}
$$
The last form is preferable for two reasons. Since 0 < λ < 1 implies ln λ < 0, writing −ln λ in the denominator makes the positivity of S manifest. And ln 2 ≈ 0.693 is a number readers recognize, so S ≈ 0.693 / (−ln λ) becomes a usable rule of thumb.
The function rises sharply as λ approaches 1:
| λ (carryover) | Median lag S = ln 2 / (−ln λ) | Mean lag λ / (1 − λ) |
|---|---|---|
| 0.30 | 0.58 | 0.43 |
| 0.50 | 1.00 | 1.00 |
| 0.70 | 1.94 | 2.33 |
| 0.80 | 3.11 | 4.00 |
| 0.90 | 6.58 | 9.00 |
| 0.95 | 13.51 | 19.00 |
At λ = 0.5 the median lag is exactly one period; at λ = 0.9 it is roughly 6.58 periods. For annual Compustat data, λ above 0.8 implies the half-life of an advertising shock exceeds three years — an empirical magnitude that is itself a useful pedagogical anchor.
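The table entries can be reproduced in a few lines. The sketch below uses only the Python standard library, with hypothetical function names. It also reports the smallest whole number of periods at which the cumulative weight reaches one half, since the formula for S treats the lag as continuous while observed data arrive in whole periods.

```python
import math

def median_lag(lam):
    """Continuous median lag S = ln 2 / (-ln lam), defined for 0 < lam < 1."""
    if not 0 < lam < 1:
        raise ValueError("median lag requires 0 < lam < 1")
    return math.log(2) / -math.log(lam)

def mean_lag(lam):
    """Mean lag lam / (1 - lam) under geometric weights."""
    return lam / (1 - lam)

def periods_to_half(lam):
    """Smallest whole S with 1 - lam**S >= 1/2 (cumulative share of the response)."""
    s, remaining = 0, 1.0
    while remaining > 0.5:
        remaining *= lam  # remaining == lam**s: share of the response still to come
        s += 1
    return s

for lam in (0.30, 0.50, 0.70, 0.80, 0.90, 0.95):
    print(f"{lam:.2f}  {median_lag(lam):6.2f}  {mean_lag(lam):6.2f}  {periods_to_half(lam):3d}")
```

The whole-period count is the continuous S rounded up, so at λ = 0.9 half of the response has arrived only once seven periods have been counted.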
5. The Mean Lag
The median lag answers when half of the response has arrived. The mean lag answers what the typical timing of the response is. It is the standard companion statistic in distributed-lag work, defined as the expected value of k under the geometric weights:
$$
\text{Mean lag} = \frac{\sum_{k=0}^{\infty} k\lambda^k}{\sum_{k=0}^{\infty} \lambda^k} = \frac{\lambda/(1-\lambda)^2}{1/(1-\lambda)} = \frac{\lambda}{1-\lambda}
$$
Reporting both statistics is conventional. The mean lag exceeds the median lag whenever λ > 0.5, reflecting the right-skew of geometric weights: the long thin tail pulls the mean rightward while leaving the median anchored near the bulk of the mass. The two coincide at λ = 0.5, and for smaller λ the continuous median formula gives the larger value (0.58 against 0.43 at λ = 0.3). At λ = 0.7 the median lag is roughly 1.94 and the mean lag is roughly 2.33; at λ = 0.9 they are 6.58 and 9.00. Above λ = 0.5 the gap widens with λ and is itself a measure of the asymmetry of the lag structure.
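The closed form used for the numerator, λ/(1 − λ)², is stated without proof. A short derivation, assuming only |λ| < 1 so that the geometric series may be differentiated term by term:

```latex
\begin{aligned}
\sum_{k=0}^{\infty} \lambda^k &= \frac{1}{1-\lambda} \\
\frac{d}{d\lambda} \sum_{k=0}^{\infty} \lambda^k
  &= \sum_{k=1}^{\infty} k\lambda^{k-1} = \frac{1}{(1-\lambda)^2} \\
\sum_{k=0}^{\infty} k\lambda^k
  &= \lambda \sum_{k=1}^{\infty} k\lambda^{k-1} = \frac{\lambda}{(1-\lambda)^2}
\end{aligned}
```

Dividing the last line by the first gives the mean lag λ/(1 − λ).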
6. Note on Estimation
The Koyck transformation is presented in textbooks as if it were an estimation strategy. It is not, at least not under OLS. The transformed equation
$$
R_t = \alpha(1-\lambda) + \lambda R_{t-1} + \gamma X_t + w_t
$$
has two diseases simultaneously. The lagged dependent variable Rₜ₋₁ is correlated with the error wₜ = uₜ − λuₜ₋₁, because Rₜ₋₁ inherits uₜ₋₁ from the previous period. OLS on this equation is inconsistent. And even if the correlation were absent, the MA(1) structure of wₜ induces serial correlation in the residuals, so naive standard errors are wrong and HAC corrections or full ARMA error modeling become mandatory for inference.
Two remedies are standard. The first is instrumental variables, using Xₜ₋₁ (and possibly further lags) as instruments for Rₜ₋₁: lagged X is correlated with lagged R and, provided advertising is exogenous to the white-noise disturbance uₜ, uncorrelated with both components of wₜ = uₜ − λuₜ₋₁. The second is nonlinear least squares directly on the original geometric-lag specification, estimating γ and λ jointly (together with α). Both deliver consistent estimates; the choice between them is a matter of sample size, instrument strength, and the analyst's preference for transparency over efficiency.
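A small simulation makes the contrast between OLS and the instrumental-variables remedy concrete. This is a sketch under stated assumptions, not a recommended estimation routine: the parameter values, the sample size, and the choice of i.i.d. advertising that is independent of the disturbance are all hypothetical, and the IV estimator is the simple just-identified form with Xₜ₋₁ as the single instrument for Rₜ₋₁.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma, lam = 5.0, 2.0, 0.7  # hypothetical "true" values for the simulation
n = 50_000

x = rng.normal(size=n)  # advertising: i.i.d. and independent of u (assumed exogenous)
u = rng.normal(size=n)  # white-noise disturbance of the original equation

# Generate R_t from the original geometric-lag specification, using the
# recursion A_t = X_t + lam * A_{t-1} with zero pre-sample advertising.
adstock = np.zeros(n)
carry = 0.0
for t in range(n):
    carry = x[t] + lam * carry
    adstock[t] = carry
r = alpha + gamma * adstock + u

# Koyck regression: R_t on a constant, R_{t-1}, and X_t.
y = r[1:]
W = np.column_stack([np.ones(n - 1), r[:-1], x[1:]])
# Instrument matrix: X_{t-1} takes the place of R_{t-1}.
Z = np.column_stack([np.ones(n - 1), x[:-1], x[1:]])

ols = np.linalg.lstsq(W, y, rcond=None)[0]
iv = np.linalg.solve(Z.T @ W, Z.T @ y)  # just-identified IV: (Z'W)^{-1} Z'y

print("OLS lambda, gamma:", ols[1], ols[2])
print("IV  lambda, gamma:", iv[1], iv[2])
```

In this setup the OLS estimate of λ settles noticeably below 0.7, because Cov(Rₜ₋₁, wₜ) = −λσᵤ² is negative, while the IV estimate stays close to the true value. Standard errors are deliberately omitted: with an MA(1) error they would need a HAC or ARMA-based correction.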
References