1. Introduction
In transfer-pricing analysis and corporate-profit studies, the objective of regression analysis is estimating a structurally meaningful slope coefficient rather than predicting the dependent variable. This distinction is not merely one of emphasis: it has binding implications for how rival specifications are selected and evaluated.
A widely used structural equation at EdgarStat takes the form
REVT(i) = a + b\,XOPR(i) + u(i)
where REVT is total revenue, XOPR is total operating cost, and b is the long-run markup coefficient. The intercept absorbs fixed-cost heterogeneity, scale effects, and accounting conventions across comparables. The central parameter of economic interest is b, not the predicted value of REVT.
Existing TNMM and CPM benchmark practice often bypasses the regression entirely and computes the operating-margin ratio m(i) = \frac{Y(i)}{X(i)} directly, where Y is the revenue or profit measure and X is the cost base. Section 5 demonstrates that this practice imports a structural bias into the reported arm’s-length range: a/H in the mean ratio, where H is the harmonic mean of the cost base X, and a/Q_{1-p}(X) in the p-th quantile. The bias is not a sampling artifact: it is a theorem that follows from the linear structural equation by substitution.
2. Three Rivals as Members of the Box-Cox Family
Box and Cox (1964) proposed a parametric family of power transformations that nests the three rival specifications considered here (linear, log-log, and square-root) as special cases. Define the transformed variable
Y^{(\lambda)}(i)
=
\begin{cases}
\dfrac{Y(i)^{\lambda}-1}{\lambda},
& \lambda \neq 0, \\[6pt]
\ln\!\bigl(Y(i)\bigr),
& \lambda = 0.
\end{cases}
and analogously for X(i). The Box-Cox regression model is
Y^{(\lambda)}(i)
=
a
+
bX^{(\lambda)}(i)
+
e(i)
where e(i) is assumed normally distributed with mean zero and constant variance. The three rival specifications correspond to three values of the transformation parameter \lambda:
- \lambda = 1: The model reduces to the linear specification Y(i) = a + b X(i) + e(i). The slope b measures the marginal change in Y per unit of X, interpretable directly as a markup or pass-through coefficient.
- \lambda = 0: The model becomes \ln(Y(i)) = a + B \ln(X(i)) + e(i). The slope B is an elasticity measuring proportional change in Y per proportional change in X. It is dimensionless and answers a different economic question than a markup.
- \lambda = 1/2: The model becomes \sqrt{Y(i)} = a + d \sqrt{X(i)} + e(i). Differentiating, \frac{dY}{dX} = d \sqrt{\frac{Y(i)}{X(i)}}, which varies across firms with their individual Y/X ratios. The coefficient d has no direct interpretation as a markup or elasticity.
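The three cases above can be checked numerically. The following is a minimal sketch, not an EdgarStat implementation: the helper name `box_cox` and the sample value are illustrative assumptions.

```python
import math

def box_cox(value: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value (illustrative helper)."""
    if value <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(value)          # limiting case as lambda -> 0
    return (value ** lam - 1.0) / lam   # general power-transform case

# The three rival specifications applied to one hypothetical revenue figure.
y = 4.0
linear = box_cox(y, 1.0)    # Y - 1: linear, up to a shift absorbed by the intercept
log_log = box_cox(y, 0.0)   # ln(Y)
root = box_cox(y, 0.5)      # 2 * (sqrt(Y) - 1): square root, up to scale and shift
```

The shifts and scale factors introduced by the transform are absorbed by the intercept and slope, which is why the list above can state each model in its familiar form.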
Three consequences follow immediately.
First, the slope parameters b, B, and d are not commensurable. They measure different economic objects in different units. Ranking the models by comparing magnitudes or t-ratios of the respective slopes is not meaningful.
Second, the models are not merely alternative statistical approximations to the same relationship. Each imposes a specific functional form. If the true relationship has a stable markup structure — meaning \frac{dY}{dX} is approximately constant across firms — only the linear specification (\lambda = 1) estimates that quantity directly.
Third, the log-log specification introduces a further pathology when the data are generated by a linear relationship. Suppose the true relationship is linear: Y(i) = a + b X(i) + e(i). The implied log-log slope that the data would generate is
\frac{d\ln(Y(i))}
{d\ln(X(i))}
=
\left(
\frac{dY(i)}
{dX(i)}
\right)
\left(
\frac{X(i)}
{Y(i)}
\right)
=
\frac{
bX(i)
}{
a+bX(i)+e(i)
}
This quantity varies with X(i) across the sample. Fitting a log-log model to a linear relationship therefore does not estimate a single structural markup: it estimates a firm-varying quantity that tracks operating size rather than the economic mechanism of interest.
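A short numerical sketch illustrates how the implied elasticity moves with firm size when the linear model is true. The parameter values (a = 10, b = 1.2) and the function name are hypothetical, and the error term is set to zero for clarity.

```python
def implied_elasticity(a: float, b: float, x: float) -> float:
    """Point elasticity d ln(Y) / d ln(X) when Y = a + b * X (noise-free)."""
    return b * x / (a + b * x)

# Hypothetical parameters: intercept a = 10, markup b = 1.2.
a, b = 10.0, 1.2
small_firm = implied_elasticity(a, b, x=10.0)     # 12 / 22
large_firm = implied_elasticity(a, b, x=1000.0)   # 1200 / 1210
```

With a > 0 the elasticity rises toward 1 as X grows, so a single fitted log-log slope averages over firm size rather than recovering b.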
3. Box-Cox Maximum Likelihood and Hypothesis Testing
3.1 The Log-Likelihood
The concentrated log-likelihood for the transformation parameter lambda, after maximizing over a, b, and the error variance, is
L(\lambda)
=
-\frac{N}{2}
\ln
\left[
\frac{\mathrm{RSS}(\lambda)}{N}
\right]
+
(\lambda-1)
\sum_i
\ln(Y(i))
where RSS(\lambda) is the residual sum of squares from regressing Y^{(\lambda)}(i) on X^{(\lambda)}(i). The second term is the Jacobian of the transformation. It adjusts the log-likelihood to place all values of \lambda on a comparable scale, making the likelihoods at \lambda = 0, 1/2, and 1 directly comparable. Without this Jacobian term, comparing likelihoods across different \lambda values is meaningless because the transformed variables Y^{(\lambda)}(i) have different scales.
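The concentrated log-likelihood can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a production routine: the helper names, the made-up comparables, and the grid from -0.5 to 2.0 in steps of 0.05 are all hypothetical choices.

```python
import math

def box_cox(v, lam):
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam

def rss_simple_ols(x, y):
    """Residual sum of squares from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def concentrated_loglik(x, y, lam):
    """L(lambda) = -(N/2) ln(RSS(lambda)/N) + (lambda - 1) * sum(ln Y)."""
    n = len(y)
    tx = [box_cox(v, lam) for v in x]
    ty = [box_cox(v, lam) for v in y]
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss_simple_ols(tx, ty) / n) + jacobian

# Hypothetical comparables: cost base X and revenue Y (made-up numbers).
X = [12.0, 25.0, 40.0, 58.0, 75.0, 110.0, 150.0, 210.0]
Y = [21.0, 39.5, 56.0, 81.0, 98.5, 143.0, 187.0, 262.0]

# Grid search over lambda; the grid contains 0, 0.5, and 1 exactly.
grid = [k / 20 for k in range(-10, 41)]
lam_hat = max(grid, key=lambda lam: concentrated_loglik(X, Y, lam))
```

A grid is used here only because it is transparent; a numerical optimizer would serve equally well once the function is defined.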
The maximum-likelihood estimator lambda-hat is obtained by grid search or numerical optimization of L(lambda). A 95-percent confidence interval for lambda is the set of values satisfying
L(\hat{\lambda})
-
L(\lambda)
\le
\frac{1}{2}
\chi^2_{1,0.95}
=
1.92
3.2 Hypothesis Tests on Lambda
The three rival specifications correspond to testable restrictions on lambda. Each is tested by the likelihood-ratio statistic
LR
=
2
\left[
L(\hat{\lambda})
-
L(\lambda_0)
\right]
\sim
\chi^2(1)
where \lambda_0 is the hypothesized value from the set {0, 1/2, 1} and the chi-square distribution holds asymptotically under the null. The test has one degree of freedom regardless of sample size. If the linear restriction \lambda = 1 is not rejected, the linear model is the data-preferred specification and b is directly interpretable as a markup. If it is rejected, the practitioner must address the non-commensurability of the slope at \hat{\lambda}, which requires transforming back to the original scale via a local linear approximation evaluated at the sample mean of X and Y.
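The likelihood-ratio test can be sketched without external libraries, because the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(LR/2)). The function name and the log-likelihood values below are hypothetical and serve only to show the arithmetic.

```python
import math

def lr_test(loglik_hat: float, loglik_null: float):
    """Likelihood-ratio statistic and its asymptotic chi-square(1) p-value."""
    lr = 2.0 * (loglik_hat - loglik_null)
    # For one degree of freedom, P(chi2 > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical log-likelihood values, for illustration only.
lr, p = lr_test(loglik_hat=-41.30, loglik_null=-42.10)   # test of lambda_0 = 1
reject_linear = p < 0.05
```

In this made-up case the statistic is about 1.6, below the 3.84 critical value, so the linear restriction would be retained.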
If the linear restriction is rejected in favor of \lambda = 0.5, then, with b denoting the slope estimated at that \lambda, the approximating linear relationship has the adjusted slope parameter and associated standard error
b \cdot \sqrt{\frac{Y^{*}}{X^{*}}}, \qquad \mathrm{se}(b) \cdot \sqrt{\frac{Y^{*}}{X^{*}}}
where X* is the sample mean of X and Y* is the sample mean of Y. Similarly, for \lambda = 0, the approximating linear relationship has adjusted slope parameter and associated standard error:
b \cdot \left(\frac{Y^{*}}{X^{*}}\right), \qquad \mathrm{se}(b) \cdot \left(\frac{Y^{*}}{X^{*}}\right)
3.3 Why Prediction-Error Criteria Fail
If the central goal were prediction, one might rank the three specifications by root mean-squared error or mean absolute error after retransforming to the original Y scale. This approach has three compounding deficiencies.
First, retransforming log-log predictions to the Y scale requires Duan’s (1983) smearing correction. The naive retransformation
\hat{Y}(i)
=
\exp
\left(
\hat{a}
+
\hat{B}\ln(X(i))
\right)
is biased downward as an estimator of the conditional mean of Y whenever the log-scale errors have positive variance, because E[\exp(e(i))] > 1 by Jensen's inequality. The correct predictor multiplies by the smearing factor
\hat{\Delta}
=
\frac{1}{N}
\sum_{i=1}^{N}
\exp\!\bigl(\hat{e}(i)\bigr)
where \hat{e}(i) are the OLS residuals in log space. Omitting this correction means log-log predictions are systematically understated, so the prediction-error competition is not conducted on a level playing field.
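A minimal sketch of the smearing correction follows. The residuals and the fitted coefficients are hypothetical values chosen for illustration, and the function names are not taken from any library.

```python
import math

def smearing_factor(log_residuals):
    """Duan (1983) smearing factor: the mean of exp(residual) in log space."""
    return sum(math.exp(e) for e in log_residuals) / len(log_residuals)

def predict_level(a_hat, b_hat, x, smear=1.0):
    """Retransform a log-log fit to the Y scale; smear=1.0 is the naive version."""
    return smear * math.exp(a_hat + b_hat * math.log(x))

# Hypothetical log-space OLS residuals (they sum to zero, as OLS residuals do).
residuals = [-0.30, -0.10, 0.00, 0.15, 0.25]
delta = smearing_factor(residuals)

naive = predict_level(a_hat=0.5, b_hat=0.9, x=100.0)
corrected = predict_level(a_hat=0.5, b_hat=0.9, x=100.0, smear=delta)
```

Because the residuals average to zero, Jensen's inequality makes the factor at least 1, so the corrected prediction is never below the naive one.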
Second, comparing RMSE across different lambda values is internally inconsistent. The log-likelihood L(\lambda) already corrects for scale differences via the Jacobian; RMSE in original-Y units after retransformation does not apply a consistent correction across lambda.
Third, and most fundamentally, prediction accuracy of Y(i) is not the economic objective. A specification that fits Y(i) more closely by exploiting a non-linear scale compression may estimate a slope coefficient that is economically opaque, unstable, or theoretically incoherent relative to the markup question.
4. Criteria for Specification Selection When the Slope Is Central
When the slope coefficient is the economically meaningful parameter, rival specifications inside the Box-Cox family should be evaluated in the following order of priority.
- Box-Cox maximum likelihood and likelihood-ratio test of \lambda = 1. Estimate lambda-hat and test whether the linear restriction is rejected. If not rejected, the linear model is the data-preferred specification within the family, and b is directly interpretable as a markup.
- Economic interpretability of the slope at the selected lambda. At \lambda = 1 the slope is a markup in units of revenue per unit of cost. At \lambda = 0 the slope is an elasticity. At \lambda = 1/2 the slope has no direct economic interpretation without additional computation. Prefer the specification whose slope is directly interpretable.
- Slope stability across firms, industries, and subperiods. A slope that varies systematically with firm size or industry is not a structural parameter — it is a data artifact of the transformation. Stability should be assessed via Chow tests or rolling subperiod estimates.
- Precision of the slope estimate. Report the slope with its 68-percent confidence interval (b-hat +/- SE(b-hat)), consistent with GUM/ISO metrological practice. A slope estimated with a wide interval is less reliable regardless of the lambda value.
- HAC/Newey-West standard errors for the slope. In cross-sectional comparables datasets, residual heteroskedasticity is expected: the variance of \frac{e(i)}{X(i)} scales with \frac{1}{X(i)^2} under the linear model. Newey-West or White standard errors are required.
- Residual diagnostics. Normality, serial independence, and homoskedasticity of e(i) in the transformed scale. Failure of these conditions at lambda-hat is evidence that the Box-Cox family may be misspecified.
- Parsimony and theoretical coherence. In the absence of strong evidence favoring \lambda \neq 1, the linear model should be retained on grounds of parsimony and interpretability. The burden of proof lies with departures from linearity, not with linearity itself.
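The precision and robust-standard-error criteria in the list above can be sketched for the linear model as follows. This sketch uses White's HC0 estimator, one of the two options the list names; it is not a Newey-West implementation, and the data and function name are hypothetical.

```python
import math

def ols_with_white_se(x, y):
    """Simple OLS slope with conventional and White (HC0) standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional SE assumes constant error variance.
    se_ols = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 weights each squared residual by its own squared deviation in x.
    se_hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_ols, se_hc0

# Hypothetical comparables: cost base X and revenue Y (made-up numbers).
X = [12.0, 25.0, 40.0, 58.0, 75.0, 110.0, 150.0, 210.0]
Y = [21.0, 39.5, 56.0, 81.0, 98.5, 143.0, 187.0, 262.0]

b_hat, se_ols, se_hc0 = ols_with_white_se(X, Y)
interval_68 = (b_hat - se_hc0, b_hat + se_hc0)   # b-hat +/- one robust SE
```

HC0 applies no small-sample correction; with few comparables a finite-sample variant may be preferable, which is a design choice left open here.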
5. The Harmonic Mean Bias Theorem
5.1 Setup
TNMM benchmark practice computes the operating-margin ratio m(i)=\frac{Y(i)}{X(i)} and reports the first, second, and third quartiles of the cross-sectional distribution of m(i) as the arm’s-length range. This section demonstrates algebraically that every quartile of m(i) is a biased estimator of the slope coefficient b.
Let the true structural relationship be the linear model
Y(i) = a+bX(i)+u(i), \qquad i=1,\dots,N
where Y(i) is revenue or operating profit, X(i) is the cost base, b is the structural markup, and a captures fixed costs. Define the ratio
m(i)
=
\frac{Y(i)}{X(i)}
5.2 The Fundamental Identity
Substituting the structural equation into the definition of m(i):
m(i)
=
\frac{
a+bX(i)+u(i)
}{
X(i)
}
=
b
+
\frac{a}{X(i)}
+
\frac{u(i)}{X(i)}
The term \frac{u(i)}{X(i)} is mean-zero and does not contribute to systematic bias. The signal component is
m(i)
=
b
+
\frac{a}{X(i)}
This identity is the engine of the theorem. Every observed ratio decomposes into the structural markup b plus an intercept-driven term \frac{a}{X(i)} that is heterogeneous across firms and does not vanish as N increases. The distortion is structural, not statistical.
5.3 Bias of the Sample Mean
The sample mean of m(i) is
\bar{m}
=
b
+
a
\cdot
\frac{1}{N}
\sum_{i=1}^{N}
\frac{1}{X(i)}
The harmonic mean of X(i) is defined as H = \frac{N}{\sum_{i} \frac{1}{X(i)}}, so \frac{1}{N} \sum_{i} \frac{1}{X(i)} = \frac{1}{H}. Therefore
\bar{m}
=
b
+
\frac{a}{H}
The bias of \bar{m} as an estimator of b is \frac{a}{H}. This bias is structural: it is non-zero whenever a \neq 0, and it does not diminish with sample size.
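The result can be verified on a small noise-free example. The parameter values (a = 10, b = 1.2) and the five cost bases below are hypothetical.

```python
def harmonic_mean(values):
    """Harmonic mean H = N / sum(1 / X(i))."""
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical noise-free comparables generated from Y = a + b * X.
a, b = 10.0, 1.2
X = [20.0, 40.0, 50.0, 100.0, 200.0]
Y = [a + b * x for x in X]

ratios = [y / x for x, y in zip(X, Y)]
mean_ratio = sum(ratios) / len(ratios)
H = harmonic_mean(X)
bias = mean_ratio - b          # equals a / H = 0.22, up to rounding
```

Here the mean ratio is 1.42 against a true markup of 1.2, and the gap matches a/H even though the data contain no noise at all.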
5.4 Bias of the Quartiles
Since m(i) = b + \frac{a}{X(i)} is strictly monotone in X(i) for a \neq 0 (with X(i) > 0), the ranks of m(i) are fully determined by the ranks of X(i). Specifically, when a > 0, m(i) is decreasing in X(i), so the p-th quantile of m corresponds to the (1-p)-th quantile of X. Therefore, for the empirically common case a > 0,
Q_p(m)
=
b
+
\frac{a}{Q_{1-p}(X)}
The bias of the p-th quantile of m as an estimator of b is
\frac{a}{Q_{1-p}(X)}. Specializing to the three TNMM quartiles:
Q_1(m) = b + \frac{a}{Q_3(X)}
Q_2(m) = b + \frac{a}{Q_2(X)}
Q_3(m) = b + \frac{a}{Q_1(X)}
Each quartile of the arm’s-length range is biased upward relative to b when a > 0. The bias is smallest at Q1(m), which draws on the large-X firms, and largest at Q3(m), which draws on the small-X firms. The arm’s-length range is asymmetrically biased across its own quartiles.
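The rank reversal can be checked on the same kind of noise-free example. The sketch assumes hypothetical values (a = 10, b = 1.2) and a sample of N = 4k + 1 firms, so that each quartile is an actual data point.

```python
def quartiles(values):
    """Q1, Q2, Q3 as exact order statistics; needs N = 4k + 1 observations."""
    s = sorted(values)
    n = len(s)
    if (n - 1) % 4 != 0:
        raise ValueError("this sketch needs N = 4k + 1 so quartiles are data points")
    step = (n - 1) // 4
    return s[step], s[2 * step], s[3 * step]

# Hypothetical noise-free ratios m = b + a / X.
a, b = 10.0, 1.2
X = [20.0, 40.0, 50.0, 100.0, 200.0]
m = [b + a / x for x in X]

q1_x, q2_x, q3_x = quartiles(X)   # 40, 50, 100
q1_m, q2_m, q3_m = quartiles(m)

bias_q1 = q1_m - b                # a / Q3(X): smallest bias
bias_q3 = q3_m - b                # a / Q1(X): largest bias
```

With interpolated quartile conventions the identity holds only approximately, because 1/X is non-linear between adjacent order statistics.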
5.5 Inflation of the Interquartile Range
The interquartile range of m is
\mathrm{IQR}(m)
=
Q_3(m)-Q_1(m)
=
a
\left[
\frac{1}{Q_1(X)}
-
\frac{1}{Q_3(X)}
\right]
Rearranging:
\mathrm{IQR}(m)
=
a
\cdot
\frac{
Q_3(X)-Q_1(X)
}{
Q_1(X)Q_3(X)
}
Even if the true structural markup b were identical across all comparables, with no genuine economic variation in profitability, the reported arm’s-length range would have a positive width equal to \frac{a \cdot \text{IQR}(X)}{Q_1(X) \cdot Q_3(X)}. The width of the arm’s-length range is not a measure of economic dispersion in markups; it is cost-base heterogeneity compounded by the intercept.
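A brief arithmetic sketch shows that the width of the range depends on a and the cost-base quartiles but not on b. The quartile values and markups below are hypothetical.

```python
def ratio_iqr(a, b, q1_x, q3_x):
    """Q3(m) - Q1(m), using Q3(m) = b + a/Q1(X) and Q1(m) = b + a/Q3(X)."""
    return (b + a / q1_x) - (b + a / q3_x)

a = 10.0
q1_x, q3_x = 40.0, 100.0           # hypothetical cost-base quartiles

width_low_markup = ratio_iqr(a, 1.05, q1_x, q3_x)
width_high_markup = ratio_iqr(a, 1.20, q1_x, q3_x)
closed_form = a * (q3_x - q1_x) / (q1_x * q3_x)   # 10 * 60 / 4000 = 0.15
```

Both markups give the same width of 0.15, and setting a = 0 collapses the range to zero width.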
5.6 The Suppressed Intercept Restriction
Setting a = 0 — the implicit restriction that ratio-based practice imposes by computing m(i) = \frac{Y(i)}{X(i)} directly — eliminates the bias. But a = 0 is a testable hypothesis. Under the linear model, the t-statistic for H_0: a = 0 is
t
=
\frac{\hat{a}}{\mathrm{SE}(\hat{a})}
TNMM practice imposes a = 0 by construction and without acknowledgment. Under Treas. Reg. §1.482-1(e), the “most reliable measure” standard requires the method providing the highest degree of comparability. A method that imposes an untested coefficient restriction and reports quartiles carrying a structural bias of a/Q_{1-p}(X), with a/H in the mean, cannot satisfy that standard without empirical demonstration that a = 0.
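The intercept test can be sketched with the textbook OLS formulas. The sketch uses the conventional (homoskedastic) standard error for simplicity, and the data and function name are hypothetical; a robust standard error could be substituted.

```python
import math

def intercept_t_stat(x, y):
    """OLS intercept, its conventional standard error, and t = a_hat / SE(a_hat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a_hat = my - b_hat * mx
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)                       # unbiased error-variance estimate
    se_a = math.sqrt(sigma2 * (1.0 / n + mx * mx / sxx))
    return a_hat, se_a, a_hat / se_a

# Hypothetical comparables: cost base X and revenue Y (made-up numbers).
X = [12.0, 25.0, 40.0, 58.0, 75.0, 110.0, 150.0, 210.0]
Y = [21.0, 39.5, 56.0, 81.0, 98.5, 143.0, 187.0, 262.0]

a_hat, se_a, t_stat = intercept_t_stat(X, Y)
```

The statistic is compared with a Student-t critical value on N - 2 degrees of freedom; only if a = 0 is not rejected does the ratio m(i) estimate b without the intercept-driven term.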
6. EdgarStat Position
When the slope coefficient is the economically meaningful parameter, rival specifications inside the Box-Cox family should not be selected by prediction-error criteria. The correct procedure has four steps.
- Estimate the Box-Cox parameter lambda-hat by maximum likelihood and test the linear restriction lambda = 1 by the likelihood-ratio statistic. Unless the restriction is rejected, retain the linear model.
- Report b-hat with its 68-percent confidence interval (b-hat +/- SE(b-hat)), consistent with GUM/ISO metrological practice. HAC/Newey-West standard errors are required in cross-sectional comparables datasets.
- Do not compute or report the operating-margin ratio m(i) = Y(i)/X(i) without first testing H_0: a = 0. If the null is rejected, the quartiles of m(i) carry a structural bias of a/Q_{1-p}(X), with a/H in the mean, and do not constitute an arm’s-length range under Treas. Reg. §1.482-1(e).
- If the intercept restriction a = 0 is rejected, report b-hat as the arm’s-length point estimate and b-hat +/- SE(b-hat) as the arm’s-length range. This interval has a direct metrological interpretation absent from ratio-quartile practice.
The three rival specifications are not arbitrary alternatives. They form a one-parameter family with lambda as the selection parameter. The correct question is: what does the data say about lambda? Box-Cox maximum likelihood answers it with a test statistic and a confidence interval. Selecting among the three by prediction error is asking the wrong question with the wrong tool.
References
Aitken, A. C. 1935. “On Least Squares and Linear Combination of Observations.” Proceedings of the Royal Society of Edinburgh 55: 42–48.
Box, George E. P., and David R. Cox. 1964. “An Analysis of Transformations.” Journal of the Royal Statistical Society, Series B 26 (2): 211–252.
Duan, Naihua. 1983. “Smearing Estimate: A Nonparametric Retransformation Method.” Journal of the American Statistical Association 78 (383): 605–610.
Haavelmo, Trygve. 1944. “The Probability Approach in Econometrics.” Econometrica 12 (Supplement): iii–vi, 1–115.
Kalecki, Michał. 1954. Theory of Economic Dynamics. London: Allen & Unwin.
Koopmans, Tjalling C., ed. 1950. Statistical Inference in Dynamic Economic Models. Cowles Commission Monograph 14. New York: Wiley.
JCGM 100:2008 (GUM). Guide to the Expression of Uncertainty in Measurement. Joint Committee for Guides in Metrology.
U.S. Treasury Department. 1994. Section 482 Regulations. 26 C.F.R. §1.482-1 through 1.482-9.