Transfer pricing practice usually determines the arm’s length taxable income using the interquartile range of the chosen profit indicator, defined as a ratio m(i) = Y(i)/X(i) across comparable companies. This post shows that when the data include a nonzero intercept, a common trait of corporate financial data, those quartiles are biased. The bias equals a/H, where a is the intercept and H is the harmonic mean of X(i). We then describe a three-step regression procedure that properly accounts for the intercept and produces an unbiased, defensible benchmark.
1. The Linear Model and the Profit Level Indicator
Suppose the relationship between a profit measure Y (e.g., operating income) and a base X (e.g., net sales, total cost, or total assets) across N comparable firms (or enterprises) is described by the linear regression model:
\quad Y_i = a + b \cdot X_i + \varepsilon_i, \quad i = 1, \dots, N
where a is the intercept (capturing fixed costs or scale effects), b is the slope (the proportional return), and ε(i) is a mean-zero disturbance. The profit level indicator (PLI) is the marginal (first derivative) return on the base:
\quad \mathrm{PLI} \equiv \frac{dY}{dX} = b
This is a scale-invariant, economically meaningful measure of arm’s length profitability: it is the same for a large firm and a small firm that share the same underlying economics. Standard OLS yields the best linear unbiased estimator b̂ of this quantity by the Gauss–Markov theorem.
2. Why the Ratio m(i) = Y(i)/X(i) Is Contaminated
Substituting the model into the ratio gives:
\quad m_i = \frac{Y_i}{X_i} = b + \frac{a}{X_i} + \frac{\varepsilon_i}{X_i}
The ratio does not equal the PLI b. It is inflated (or deflated) by the term a/X(i), which varies inversely with firm size. A comparable with a small base X(i) will show a high ratio even if its true proportional profitability is identical to that of a large comparable. This is the contamination: the ratio is an unreliable measure of the PLI.
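To make the size effect concrete, here is a minimal sketch in Python. The parameter values (a = 10, b = 0.05) and the two bases are hypothetical, chosen only for illustration, and the disturbance is set to zero.

```python
# Hypothetical parameters, chosen only for illustration.
a, b = 10.0, 0.05  # intercept and slope (the true PLI)


def ratio(x):
    """Noise-free ratio m = Y/X when Y = a + b*X."""
    return (a + b * x) / x


small, large = 100.0, 10_000.0  # hypothetical bases X for two comparables
m_small = ratio(small)
m_large = ratio(large)

print(f"small firm: m = {m_small:.4f}")
print(f"large firm: m = {m_large:.4f}")
```

Both firms share the same PLI of 0.05, yet the small firm's ratio is about three times that of the large firm, purely because of the a/X(i) term.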
2.1 The Bias of the Sample Mean
Taking the sample mean across comparables:
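\quad \bar{m} = b + \frac{a}{N} \sum_{i=1}^{N} \frac{1}{X_i}
Here the disturbance term ε(i)/X(i) has mean zero and drops out in expectation. The harmonic mean of X(1), …, X(N) is defined by:
\quad H = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}}
Substituting:
\quad \bar{m} = b + \frac{a}{H}
The mean of the ratio exceeds (or falls below) the true PLI by a/H.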
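The identity can be checked numerically. The sketch below uses hypothetical, noise-free data (a = 10, b = 0.05 and five made-up bases) and Python's standard `statistics` module; it is an illustration of the algebra, not an empirical result.

```python
from statistics import harmonic_mean, mean

# Hypothetical parameters and bases, for illustration only.
a, b = 10.0, 0.05
X = [100.0, 200.0, 400.0, 800.0, 1600.0]

# Noise-free profits from the linear model, and the resulting ratios.
Y = [a + b * x for x in X]
m = [y / x for x, y in zip(X, Y)]

H = harmonic_mean(X)   # N / sum(1/X_i)
bias = mean(m) - b     # how far the mean ratio sits from the true PLI

print(f"harmonic mean H = {H:.4f}")
print(f"mean(m) - b     = {bias:.6f}")
print(f"a / H           = {a / H:.6f}")
```

With these inputs the last two printed values coincide, as the derivation predicts.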
2.2 The Bias of the Quartiles
The same logic applies to every quantile of the distribution of m(i). Because each observation is shifted by a/X(i), the p-th quantile of m(i) satisfies, ignoring the disturbance term and taking a > 0 (for a < 0 the ordering reverses and Q_p[1/X] is replaced by Q_{1−p}[1/X]):
\quad Q_p[m] = b + a \cdot Q_p\!\left[\frac{1}{X}\right]
When averaged across the interquartile range, the shift reduces to approximately a/H. The conventional arm’s length range [Q₁, Q₃] is therefore shifted away from b by roughly this amount. A tested party whose true PLI equals b could be found outside the range simply because the comparables happen to be smaller (or larger) firms. This is not an economic finding; it is a statistical artefact.
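A short sketch of the quantile shift follows, again with hypothetical noise-free data and a > 0. It uses `statistics.quantiles` with the "inclusive" method; other quartile conventions would give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical parameters and bases, for illustration only.
a, b = 10.0, 0.05
X = [100.0, 200.0, 400.0, 800.0, 1600.0]

m = [(a + b * x) / x for x in X]   # noise-free ratios
inv_x = [1.0 / x for x in X]       # the 1/X values driving the shift

# Quartiles (Q1, median, Q3) of the ratios and of 1/X.
q_m = quantiles(m, n=4, method="inclusive")
q_inv = quantiles(inv_x, n=4, method="inclusive")

for qm, qi in zip(q_m, q_inv):
    print(f"Q[m] = {qm:.4f}   b + a*Q[1/X] = {b + a * qi:.4f}")

# True when the true PLI b falls inside the ratio interquartile range.
b_inside_range = q_m[0] <= b <= q_m[2]
print("b inside [Q1, Q3]:", b_inside_range)
```

In this constructed example the whole interquartile range lies above b, so a tested party earning exactly the true PLI would fall outside it.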
3. The Correct Three-Step Procedure
The following procedure eliminates the contamination and produces a benchmark that is both statistically rigorous and economically interpretable.
Step 1 — Test Whether the Intercept is Statistically Nonzero
Before concluding that the bias is material, test the null hypothesis that a = 0 using the standard OLS t-statistic:
\quad t = \frac{\hat{a}}{\mathrm{SE}(\hat{a})} \sim t(N - 2)
If H₀: a = 0 cannot be rejected at the chosen significance level, the proportional model Y(i) = b·X(i) is consistent with the data. In that special case the contamination term vanishes and ratio-based analysis is approximately unbiased. Proceed to the ratio range only in this case.
If H₀ is rejected—as it typically is with corporate financial data—the intercept is statistically significant and the contamination is real. Move to Step 2.
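Step 1 can be sketched in plain Python as follows. The eight data points are hypothetical, the helper name `ols_with_intercept` is made up for this example, and the critical value 2.447 (two-sided 5% level, 6 degrees of freedom) is taken from a standard t-table rather than computed.

```python
from math import sqrt


def ols_with_intercept(X, Y):
    """Simple OLS of Y on X; returns (a_hat, b_hat, s2, se_a)."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    s_xx = sum((x - x_bar) ** 2 for x in X)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Residual variance with N - 2 degrees of freedom.
    s2 = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(X, Y)) / (n - 2)
    # Standard error of the intercept in simple regression.
    se_a = sqrt(s2 * (1.0 / n + x_bar ** 2 / s_xx))
    return a_hat, b_hat, s2, se_a


# Hypothetical comparables: base X and profit Y (illustration only).
X = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
Y = [15.5, 19.6, 25.3, 29.4, 35.4, 39.8, 45.5, 49.5]

a_hat, b_hat, s2, se_a = ols_with_intercept(X, Y)
t_stat = a_hat / se_a
T_CRIT = 2.447  # assumed: t-table value for 6 df, two-sided 5% level

print(f"a_hat = {a_hat:.4f}, SE = {se_a:.4f}, t = {t_stat:.2f}")
print("reject H0: a = 0" if abs(t_stat) > T_CRIT else "cannot reject H0: a = 0")
```

In practice a statistics package would report the same t-statistic together with its p-value; the hand computation is shown only to make the formula explicit.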
Step 2 — Use the Regression Benchmark, Not the Ratio Quartile
For a tested party with base K* (e.g., its net sales), the arm’s length benchmark for its profit is the OLS fitted value:
\quad \hat{Y}^{*} = \hat{a} + \hat{b} \cdot K^{*}
This single point estimate uses both the intercept and the slope and is, by the Gauss–Markov theorem, the minimum-variance linear unbiased estimate of the expected arm’s length profit at K*. It replaces the median of the ratio distribution.
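As a rough illustration of Step 2, the sketch below fits the line on hypothetical comparables and evaluates it at a hypothetical tested-party base K* = 650. For contrast it also computes the benchmark implied by the median ratio; all numbers are invented for the example.

```python
from statistics import median

# Hypothetical comparables and tested-party base (illustration only).
X = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
Y = [15.5, 19.6, 25.3, 29.4, 35.4, 39.8, 45.5, 49.5]
K_star = 650.0

# Simple OLS estimates of the intercept and slope.
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar

# Regression benchmark: fitted profit at the tested party's base.
y_star = a_hat + b_hat * K_star

# Ratio benchmark: median of Y/X applied to the same base.
ratio_benchmark = median(y / x for x, y in zip(X, Y)) * K_star

print(f"regression benchmark: {y_star:.2f}")
print(f"median-ratio benchmark: {ratio_benchmark:.2f}")
```

With a positive intercept and a tested party larger than most comparables, the median-ratio benchmark in this example overstates the fitted arm's length profit.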
Step 3 — Construct the Arm’s Length Range as a Regression Interval
There are two standard intervals, each appropriate for a different question:
Confidence Interval (for the expected arm’s length profit at K*):
\quad \hat{Y}^{*} \pm t_{\alpha/2}(N-2) \cdot \mathrm{SE}(\hat{Y}^{*})
where SE(Ŷ*)² = s² · xᵀ(XᵀX)⁻¹x, with x = [1, K*]ᵀ, X here denoting the N×2 design matrix with rows [1, X(i)], and s² the OLS residual variance.
Prediction Interval (for the profit level of a single new entity):
\quad \hat{Y}^{*} \pm t_{\alpha/2}(N-2) \cdot s \cdot \sqrt{1 + x^{\top}(X^{\top}X)^{-1}x}
This interval is wider than the confidence interval because it accounts for both estimation error and residual variance.
In practice, the confidence interval for the regression coefficients is the more appropriate choice for transfer pricing, since the question is whether a specific tested party’s profit indicator falls within the arm’s length range—not whether it equals the expected value for a firm of its size. Both intervals correctly propagate the nonzero intercept and require no modification when a ≠ 0.
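Both intervals can be sketched in a few lines of plain Python. The data and K* = 650 are hypothetical, and the critical value 2.447 (95% two-sided, 6 degrees of freedom) is an assumed table value. The code relies on the fact that, for a regression with one regressor and an intercept, xᵀ(XᵀX)⁻¹x with x = [1, K*]ᵀ equals 1/N + (K* − X̄)²/S_xx.

```python
from math import sqrt

# Hypothetical comparables and tested-party base (illustration only).
X = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
Y = [15.5, 19.6, 25.3, 29.4, 35.4, 39.8, 45.5, 49.5]
K_star = 650.0
T_CRIT = 2.447  # assumed: t-table value for N - 2 = 6 df, 95% two-sided

# Simple OLS fit.
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar
s2 = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(X, Y)) / (n - 2)

y_star = a_hat + b_hat * K_star

# x'(X'X)^{-1}x for x = [1, K*] in simple regression.
leverage = 1.0 / n + (K_star - x_bar) ** 2 / s_xx

se_mean = sqrt(s2 * leverage)          # SE of the expected profit at K*
se_pred = sqrt(s2 * (1.0 + leverage))  # SE for a single new entity

ci = (y_star - T_CRIT * se_mean, y_star + T_CRIT * se_mean)
pi = (y_star - T_CRIT * se_pred, y_star + T_CRIT * se_pred)

print(f"fitted value:        {y_star:.3f}")
print(f"confidence interval: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"prediction interval: [{pi[0]:.3f}, {pi[1]:.3f}]")
```

The prediction interval always contains the confidence interval, since its standard error adds the full residual variance s² to the estimation variance.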
4. Summary: Ratio Quartiles vs. Regression Intervals
| Property | Ratio Quartile Range | Regression Interval |
| --- | --- | --- |
| Accounts for intercept a | No — bias = a/H | Yes — by construction |
| Scale-invariant benchmark | No — shifts with firm size | Yes — Ŷ* adjusts for K* |
| Statistical inference | None (percentile-based) | t-intervals, known coverage |
| Requires a = 0 to be valid | Yes | No |
| Recommended first step | Test H₀: a = 0 first | Proceed after rejecting H₀ |
5. Practical Implications
The framework above has three direct implications for transfer pricing practice:
- Always run the OLS regression of Y on X before constructing the arm’s length range. The t-test on the intercept takes seconds and determines whether the ratio approach is even permissible.
- When the intercept is significant, discard the ratio range entirely. The regression-fitted value Ŷ* = â + b̂K* is both statistically unbiased and economically interpretable as the arm’s length profit for a firm of the tested party’s scale.
- Report confidence intervals, not quartile ranges. They have known coverage probability, account for estimation error in both parameters, and are invariant to the size distribution of the comparables sample.
These recommendations do not require more data—the same comparables used to compute the ratio interquartile range are used to estimate the regression coefficients. They do require that practitioners move from a purely percentile-based framework to a model-based approach, which is already the standard in every other domain of applied econometrics.
I appreciate the comments of Jon Breslaw and Florian Semani; neither is responsible for any remaining errors.