A Mathematical Proof of the α/H Bias in Ratio Estimators
Analysts often work with ratios. Whether examining price-to-earnings ratios in finance or profit indicators in transfer pricing, the ratio m(i) = Y(i) / X(i) masks a systematic bias that affects quartile estimates and other location parameters. Here i indexes the observations, i = 1, 2, …, N. A sample with N < 30 is generally considered small and unlikely to produce reliable means and deviations from the mean. Significant deviations are treated as outliers.
This post provides a complete algebraic proof that when Y(i) = α + β X(i) is the true underlying relationship, the quartiles of the ratio m(i) are biased estimators of the slope parameter β, with the bias equal to α/H, where H is the harmonic mean of X(i), as defined below.
The Problem Statement
Given a linear relationship between two variables:
Y(i) = α + β X(i), i = 1, 2, …, N observations
where α is the intercept and β is the slope coefficient.
Under the CPM/NMM in transfer pricing, Y is operating profit, and X is the base variable (revenue, cost, or assets). Analysts frequently compute the ratios m(i) = Y(i) / X(i) and then compute their quartiles (or median). The critical question is: are these ratio-based quartiles unbiased estimators of the slope β?
A Mathematical Proof
Step 1: Connecting the Two Starting Equations
Starting with the definition of the ratios:
m(i) = Y(i) / X(i) = [α + β X(i)] / X(i)
Separating into two terms:
m(i) = α / X(i) + β
This crucial equation shows that m(i) equals the true slope β plus a term that varies inversely with X(i). This inverse relationship creates the asymmetry that leads to bias.
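The decomposition in Step 1 can be checked numerically. The sketch below uses made-up values for α, β, and X(i) (they are illustrative assumptions, not data from this post) and confirms that Y(i) / X(i) and α / X(i) + β agree for every observation.

```python
# Hypothetical parameter values and base variable, chosen only for illustration.
alpha, beta = 2.0, 0.05
x = [10.0, 20.0, 40.0, 80.0]

y = [alpha + beta * xi for xi in x]            # exact linear relationship
m = [yi / xi for yi, xi in zip(y, x)]          # ratios m(i) = Y(i) / X(i)
decomposed = [alpha / xi + beta for xi in x]   # alpha / X(i) + beta

for xi, mi, di in zip(x, m, decomposed):
    print(f"X={xi:5.1f}  m={mi:.4f}  alpha/X + beta={di:.4f}")
```

Every ratio sits above β = 0.05 in this example, and the gap shrinks as X(i) grows.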
Step 2: Calculate the Expected Value
Taking the average (sample mean) across all N observations:
E[m] = (1/N) Σ m(i)
Using the Step 1 result m(i) = α/X(i) + β, we can substitute for each m(i):
E[m] = (1/N) Σ [α/X(i) + β]
Separating the sum:
E[m] = (α/N) Σ (1/X(i)) + β
Step 3: Connect to the Harmonic Mean
The harmonic mean of X(i) is defined as:
H = N / Σ (1/X(i))
Rearranging to solve for the sum:
Σ (1/X(i)) = N/H
Substituting this into our expression for E[m]:
E[m] = (α/N) × (N/H) + β = α/H + β
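As a sanity check on Steps 2 and 3, the following minimal sketch compares the sample mean of the ratios with β + α/H. The parameter values and X(i) are invented for illustration, and harmonic_mean comes from Python's standard statistics module.

```python
from statistics import fmean, harmonic_mean

# Hypothetical values, for illustration only.
alpha, beta = 2.0, 0.05
x = [10.0, 20.0, 40.0, 80.0]

m = [(alpha + beta * xi) / xi for xi in x]  # ratios m(i)
H = harmonic_mean(x)                        # H = N / sum(1 / X(i))
mean_m = fmean(m)                           # (1/N) * sum of m(i)
predicted = beta + alpha / H                # result of Step 3

print(f"H = {H:.4f}")
print(f"mean of m(i)   = {mean_m:.5f}")
print(f"beta + alpha/H = {predicted:.5f}")
```

With these numbers the mean ratio is 0.14375, well above the true slope of 0.05, and it matches β + α/H to floating-point precision.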
Step 4: Identify the Bias
The bias is the difference between the expected value of the estimator and the true parameter:
Bias = E[m] – β = (β + α/H) – β = α/H
Step 5: Extension to Quartiles
Since m(i) = β + α/X(i), the distribution of m(i) is systematically shifted from β by amounts depending on α/X(i). The quartiles, being location parameters of this distribution, inherit this bias. For the median and other quartiles, the bias is approximately α/H when the X(i) values are reasonably dispersed, because the harmonic mean captures the average effect of the 1/X(i) terms.
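A quick numerical look at this step, on made-up data and assuming X(i) > 0 and α > 0. Under those assumptions m(i) is a decreasing function of X(i), so the median of m(i) equals β + α / median(X). The sketch compares that median bias with α/H. The two share the same sign but need not coincide in a small sample.

```python
from statistics import harmonic_mean, median

# Hypothetical values, for illustration only.
alpha, beta = 2.0, 0.05
x = [10.0, 20.0, 40.0, 80.0, 160.0]

m = [alpha / xi + beta for xi in x]       # ratios m(i)
median_bias = median(m) - beta            # bias of the median ratio
mean_bias = alpha / harmonic_mean(x)      # alpha / H from Step 4
via_median_x = alpha / median(x)          # alpha / median(X)

print(f"median bias      = {median_bias:.5f}")
print(f"alpha / median X = {via_median_x:.5f}")
print(f"alpha / H        = {mean_bias:.5f}")
```

How close the median bias comes to α/H depends on how the X(i) values are spread, which is worth checking on the actual comparables before quoting either figure.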
Why This Matters
The bias α/H has several essential characteristics:
Direction: The bias is positive when α > 0 and negative when α < 0. Only when the actual relationship passes through the origin (α = 0) is the ratio estimator unbiased.
Magnitude: The bias is inversely proportional to the harmonic mean of X(i). When X(i) values are small, H is small, and the bias can be substantial.
Asymmetry: The relationship m(i) = α/X(i) + β is hyperbolic. When α > 0, small X(i) values create large positive deviations in m(i), while large X(i) values create only minor deviations (the signs reverse when α < 0). This asymmetry cannot be eliminated by averaging.
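The asymmetry can be made concrete with a small sketch. The values below are hypothetical. Each X(i) is double the previous one, so each deviation m(i) − β is half the previous one, and none of the deviations is negative to offset the large ones at small X(i).

```python
# Hypothetical values, for illustration only.
alpha, beta = 2.0, 0.05
x = [5.0, 10.0, 20.0, 40.0, 80.0]

# Deviation of each ratio from the true slope: m(i) - beta = alpha / X(i).
deviations = [(alpha / xi + beta) - beta for xi in x]

for xi, di in zip(x, deviations):
    print(f"X={xi:5.1f}  m - beta = {di:.4f}")

# All deviations share the sign of alpha, so averaging cannot cancel them.
average_deviation = sum(deviations) / len(deviations)
print(f"average deviation = {average_deviation:.4f}")
```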
Practical Implications
This bias affects many standard analyses:
Financial ratios: When calculating average P/E ratios or similar metrics, such as the selected profit indicator (ROS, ROC, or ROA) in transfer pricing, the arithmetic mean (or median) distorts the portfolio’s valuation (or the comparables’ profit indicator) when there are fixed costs (a non-zero intercept).
The harmonic mean is the appropriate average for ratios; however, the standard error of the harmonic mean is not easy to calculate. In fact, there is no native Python function for the standard error of the harmonic mean, and we cannot find it in standard statistical textbooks. Regression analysis avoids these complications.
Ratio estimation in surveys: Classical ratio estimators in survey sampling assume the linear relationship passes through the origin. When this assumption fails, the bias α/H emerges. Regression analysis avoids this bias.
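On the standard error of the harmonic mean mentioned above: one common approximation, which is not part of this post's derivation, is the delta method. Since H = 1 / mean(1/X), a first-order expansion gives SE(H) ≈ H² · s / √N, where s is the sample standard deviation of the reciprocals 1/X(i). The helper name harmonic_mean_se below is hypothetical, and the sketch assumes independent, strictly positive observations and a sample large enough for the approximation to be reasonable.

```python
from math import sqrt
from statistics import harmonic_mean, stdev

def harmonic_mean_se(values):
    """Delta-method approximation to the standard error of the harmonic mean.

    Hypothetical helper: assumes independent, strictly positive observations.
    """
    n = len(values)
    if n < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    h = harmonic_mean(values)
    s_recip = stdev([1.0 / v for v in values])  # sample std. dev. of 1/X(i)
    return h * h * s_recip / sqrt(n)

# Made-up base variable, for illustration only.
x = [10.0, 20.0, 40.0, 80.0]
h = harmonic_mean(x)
se = harmonic_mean_se(x)
print(f"H = {h:.4f}, approximate SE = {se:.4f}")
```

With N < 30 the approximation may be rough, and a bootstrap over the observations would be a reasonable cross-check.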
What Should Analysts Do?
When you need to summarize a relationship between Y and X:
1. Test if the relationship passes through the origin. Fit the linear regression Y = α + β X and test whether α = 0. If you cannot reject this hypothesis, ratio methods are appropriate.
2. Use regression instead of ratios when α ≠ 0. The regression estimator of β is unbiased regardless of whether α = 0.
3. If you must use ratios, consider the harmonic mean. For averaging ratios of the form Y(i)/X(i), the weighted harmonic mean gives less weight to extreme values and provides a more stable estimate.
4. Be aware of the bias when interpreting quartiles. If you report median or quartile values of m(i), understand they distort β by α/H when α ≠ 0.
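The first two recommendations can be sketched with a hand-rolled ordinary least squares fit. Everything below is illustrative: the data and the noise values are invented, ols_fit is a hypothetical helper, and the t-statistic uses the textbook OLS formula for the intercept's standard error under homoskedastic errors.

```python
from math import sqrt
from statistics import fmean, median

def ols_fit(x, y):
    """Hypothetical helper: OLS intercept, slope, and intercept t-statistic."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with N - 2 degrees of freedom.
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)
    se_intercept = sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, intercept / se_intercept

# Made-up data: true alpha = 2.0, true beta = 0.05, plus invented noise.
x = [10.0, 20.0, 40.0, 80.0, 160.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.0]
y = [2.0 + 0.05 * xi + e for xi, e in zip(x, noise)]

intercept, slope, t_alpha = ols_fit(x, y)
median_ratio = median(yi / xi for xi, yi in zip(x, y))

print(f"OLS intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"t-statistic for alpha = 0: {t_alpha:.2f}")
print(f"median of Y/X = {median_ratio:.4f}")
```

The t-statistic would be compared with a Student-t critical value on N − 2 degrees of freedom. In this example the intercept is clearly non-zero, and the median ratio sits well above the fitted slope.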
Conclusion
The ratio m(i) = Y(i) / X(i) is a biased estimator of the slope β when the actual relationship Y = α + β X has a non-zero intercept. The bias equals α/H, where H is the harmonic mean of X(i). This bias affects quartiles and other location parameters of the ratio distribution.
Understanding this bias is crucial for proper statistical inference. When the assumption of regression through the origin is violated, regression estimators (OLS with HAC standard errors, or robust regression) should be preferred over ratio estimators. When ratios are required, analysts should use the harmonic mean and interpret results with awareness of the systematic bias.
Reference
Cochran, William (1977). Sampling Techniques, 3rd ed. John Wiley & Sons, New York. See Chapter 6 “Ratio Estimators” (pages 150-177), particularly Section 6.7 “Conditions Under Which the Ratio Estimate Is a Best Linear Unbiased Estimator” (pages 158-160) and Section 6.8 “Bias of the Ratio Estimate” (page 160).