Interquartile Range is an Unreliable Measure of the Arm's Length Range

Always think twice.

Michael Jackson’s “Billie Jean” Lyrics

The interquartile range (IQR) of comparable profit indicators is like a weed in the garden. The U.S. transfer pricing regulations, 26 CFR § 1.482-1(e)(2)(iii)(B), provide that “the reliability of the analysis must be increased, where it is possible to do so, by adjusting the range [IQR is not specified] through application of a valid statistical method to the results of all of the uncontrolled comparables so selected.” (Italics added).

Further, “the reliability of the analysis is increased when statistical methods are used to establish a range of results [IQR is not specified] in which the limits of the range will be determined such that there is a 75 percent probability of a result falling above the lower end of the range and a 75 percent probability of a result falling below the upper end of the range.” (Italics added).

The regulatory provision above points to applying a 50% confidence interval, for which the critical multiplier under the normal distribution is ≈ 0.6745 (the Student t-multiplier approaches this value as the sample size grows). The “probable error” of the mean is 0.6745 SQRT(σ² / N), where the sample variance is substituted for the unknown parameter σ². See Paul Hoel, Introduction to Mathematical Statistics (3rd edition), John Wiley & Sons, 1962, pp. 141−142; or John Taylor, Introduction to Error Analysis (2nd edition), University Science Books, 1982, p. 137. See also Harold Davis, Elements of Statistics (with applications to economics) (2nd edition), Principia Press, 1937, pp. 190-191 (“Since 0.6745 is approximately 2/3, the probable error is often conveniently written (2/3) σ.”). See more references below.
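The probable error calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name and the eight operating margins are hypothetical, and the large-sample normal multiplier is assumed (a Student t-multiplier would be somewhat larger for a sample this small).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def probable_error_interval(values):
    """Return (lower, upper) for mean +/- 0.6745 * s / sqrt(N)."""
    n = len(values)
    # 75th percentile of the standard normal distribution, about 0.6745
    multiplier = NormalDist().inv_cdf(0.75)
    # Sample standard deviation s stands in for the unknown sigma
    half_width = multiplier * stdev(values) / sqrt(n)
    center = mean(values)
    return center - half_width, center + half_width


# Hypothetical operating margins (percent) of eight comparables
margins = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 6.7, 9.5]
low, high = probable_error_interval(margins)
print(f"50% interval for the mean: {low:.2f} to {high:.2f}")
```

The interval is centered on the arithmetic mean and narrows in proportion to 1 / SQRT(N) as comparables are added.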

Furthermore: “The interquartile range ordinarily provides an acceptable measure of this range; however, a different statistical method may be applied if it provides a more reliable measure.” (Italics added).

Quartiles are the most elementary form of univariate (single variable) data summary because no statistical technique is employed beyond sorting and slicing (tagging) of the data. To compute the quartiles Q1, Q2 (median), and Q3, the dataset is sorted from low to high and then sliced in half; the cut point is the median. Next, the bottom half of the data is sliced in half to tag Q1, and the top half is sliced in half to tag Q3. Quartiles are a primitive (elemental) data summary measure in which no analytical algorithm beyond sorting is employed, and the 50% of the data lying below Q1 or above Q3 is discarded.
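The sort-and-slice procedure described above can be sketched as follows. The function name and the sample data are hypothetical, and the convention of dropping the middle observation from both halves when N is odd is an assumption; statistical software offers several other quartile conventions that may give slightly different values.

```python
from statistics import median


def quartiles_by_halving(values):
    """Return (Q1, Q2, Q3) by sorting the data and slicing it in halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]       # bottom half (middle value excluded if n is odd)
    upper_half = data[n - half:]   # top half (middle value excluded if n is odd)
    return median(lower_half), median(data), median(upper_half)


# Hypothetical operating margins (percent) of eight comparables
margins = [2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 6.7, 9.5]
q1, q2, q3 = quartiles_by_halving(margins)
print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}")
```

Nothing beyond sorting and taking middle values is involved, which is the point made above.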

The median (50th percentile of the data distribution) has a higher standard error than the arithmetic mean. If the data distribution is normal (bell shaped) and the sample is large, the variance of the median is about 1.57 times the variance of the arithmetic mean, so the standard error of the median is about 1.25 times the standard error of the mean.^[1]

Data distributions of comparable profit indicators in many U.S. industries are skewed, and increasing sample size does not produce bell-shaped data distributions. Distributions of profit indicators have long tails, which suggests that the median is a good measure for only 50% of the dataset.^[2] As a result, important information may be excluded by the draconian data trimming that produces quartiles.

Appeals to using quartiles are naive, and insistence on using quartiles without demonstrating that the IQR is the most reliable measure of the arm’s length range (under the facts and circumstances) disregards statistical principles and regulatory prescription.

Quartiles can be used if the reliability of the IQR is higher than that of other valid statistical methods, such as Tukey’s notches, a confidence interval for the mean, or regression analysis. The referenced regulations should be followed using “valid statistical methods” to produce the most reliable measure of the arm’s length profit indicator. Blind use of quartiles to determine a wide (unreliable) range of the profit indicator is a poor (unacceptable) transfer pricing practice.
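As one illustration of an alternative, the sketch below compares the IQR with a notch interval around the median. It assumes the notch formula as commonly given for notched box plots, median ± 1.57 × IQR / SQRT(N); the function name and the sixteen margins are hypothetical, and the quartiles use the simple halving convention.

```python
from math import sqrt
from statistics import median


def iqr_and_notch(values):
    """Return ((Q1, Q3), (notch_low, notch_high)) for a sample."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])        # median of the bottom half
    q3 = median(data[n - half:])    # median of the top half
    med = median(data)
    # Assumed notch formula: median +/- 1.57 * IQR / sqrt(N)
    half_width = 1.57 * (q3 - q1) / sqrt(n)
    return (q1, q3), (med - half_width, med + half_width)


# Hypothetical operating margins (percent) of sixteen comparables
margins = [1.2, 2.1, 2.8, 3.1, 3.4, 3.9, 4.2, 4.4,
           5.0, 5.3, 5.8, 6.1, 6.7, 7.4, 8.2, 9.5]
iqr, notch = iqr_and_notch(margins)
print(f"IQR:   {iqr[0]:.2f} to {iqr[1]:.2f}")
print(f"Notch: {notch[0]:.2f} to {notch[1]:.2f}")
```

Because the notch half-width scales with 1 / SQRT(N), the notch is narrower than the IQR only once the sample has roughly ten or more observations, and it keeps narrowing as comparables are added, whereas the IQR does not.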

References

[1] In statistics and data analysis, “the best value is (i) a consistent or unbiased linear combination of the observations and (ii) has minimum variance.” Alexander Aitken, Statistical Mathematics (8th edition), Oliver & Boyd, 1957, p. 107. Original italics. “An estimator is said to be more precise as its standard deviation is smaller.” Adrian van den Bos, Parameter Estimation for Scientists and Engineers, John Wiley & Sons, 2007, p. 47.

Physics professors Bevington & Robinson: “It is reasonable to expect that the most reliable results we can calculate from a given set of data will be those for which the estimated errors are the smallest.” Philip Bevington & Keith Robinson, Data Reduction and Error Analysis for the Physical Sciences (2nd edition), McGraw-Hill, 1992, p. 6, emphases added. See p. 32 regarding the 50% tolerance interval (“probable error”) given by the factor 0.6745σ.

It is desirable to have parameter estimates concentrated around the true value, that is, to have small variances. This attribute is called efficiency, precision, or reliability.

A useful relative measure of efficiency (or reliability) of two unbiased parameter estimates is the ratio Var(Median) / Var(Average). For a large sample of size N drawn from a normal distribution, Var(Median) ≈ (π / 2) (σ² / N) and Var(Average) = (σ² / N). Substituting these two variances into the reliability (efficiency) ratio yields (π / 2) (σ² / N) / (σ² / N) = π / 2 ≈ 1.57, because π ≈ 3.14. The corresponding ratio of standard errors is SQRT(π / 2) ≈ 1.25. See Thomas Wonnacott & Ronald Wonnacott, Introductory Statistics, John Wiley & Sons, 1969, pp. 136-137.
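The variance ratio can be checked with a small simulation. This is only a sketch under stated assumptions: the data are drawn from a standard normal distribution, and the function name, sample size, number of trials, and seed are arbitrary illustrative choices. The simulated ratio should land near π / 2, although it will not match exactly for a finite sample and a finite number of trials.

```python
import random
from statistics import median, pvariance


def variance_ratio(sample_size=101, trials=5000, seed=1):
    """Estimate Var(Median) / Var(Average) for normal samples by simulation."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
        medians.append(median(sample))
    # Spread of each estimator across the repeated samples
    return pvariance(medians) / pvariance(means)


ratio = variance_ratio()
print(f"Var(Median) / Var(Average) is about {ratio:.2f}")
```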

The number pi (π) is the ratio of the circumference (perimeter, hence pi) to the diameter of any circle. See the book by Peter Beckmann, A History of Pi, St. Martin’s Press, 1971.

“We usually compare unbiased estimators in terms of their variances. If θ̂₁ and θ̂₂ are both unbiased estimators of a parameter φ and the variance of θ̂₁ is less than that of θ̂₂, we say that θ̂₁ is relatively more efficient. In fact, we use the ratio var(θ̂₂) / var(θ̂₁) as a measure of the relative efficiency of θ̂₁ with respect to θ̂₂.” John Freund, Mathematical Statistics (2nd edition), Prentice-Hall, pp. 259-260, original italics.

Apropos: “We may define the true value of a physical quantity as the limit to which the mean of N observations tends when N increases indefinitely.” Edmund Whittaker and G. Robinson, The Calculus of Observations (A Treatise on Numerical Mathematics) (4th edition), Blackie & Son, 1944, p. 215 (original italics).

The desirable properties of point estimates (unbiased, minimum variance, consistency) are covered in Arthur Goldberger, Econometric Theory, John Wiley & Sons, 1964, pp. 125−128.

Jack Johnston, Econometric Methods (3rd edition), 1984, p. 31, shows the efficiency (reliability) of one regression coefficient estimate over its rival estimate: “If we form the ratio var(b) / var(b’) = 0.0125 / 0.0156, we have the efficiency of b’ relative to b so that, by the least-squares criterion, b’ is 80 percent efficient.” (Original italics).

[2] Nassim Taleb, Statistical Consequences of Fat Tails, STEM Academic Press, 2020.