First, some context:

This article applies to SSOE (Statistical Sampling and Overpayment Extrapolation) as practiced in the Medicare program. The principle is more generally applicable, but this is the specific use case.

The “Independence Fallacy” arises when a sampling plan is challenged (generally during an overpayment appeal) on the grounds that sample units are not “independent,” and thus the estimation methods employed are claimed to be invalid.

Second, what is the importance of independent sample units?

Most estimation procedures rely on certain theorems in statistics that require normal distributions and “independent, identically distributed” (*iid*) sample units. In particular, the challenge will usually relate to the use of the “Central Limit Theorem” (there are actually several “central limit theorems”), which says that under certain conditions the estimate will approximately follow a normal distribution. Independence is supposed to be one of the conditions required.

It is worthwhile to note that this condition of independence contrasts with what happens in something called a “Markov Chain,” in which each sample unit is a statistical function of the preceding one. More precisely, the probability distribution of each sample point depends on what has been observed before. Suppose you measure the speed of a bus at certain points between two stops. Since the bus certainly goes through a pattern of acceleration and deceleration (though affected by traffic), the probability distribution of the speed at the next measurement point depends on the speed at the last measurement point. In a scenario with independence, the observed value (or distribution) of any sample point has no relationship with the previous sample unit. All sample units have the same distribution.
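The contrast between the bus example and independent observations can be sketched in a short simulation. This is only an illustration under stated assumptions: the bus speed is modeled as a simple first-order process (each reading is pulled toward the previous one), and the numbers 30, 0.9, and 2.0 are hypothetical values chosen for the sketch, not taken from any real data.

```python
import random
import statistics


def lag1_correlation(xs):
    """Pearson correlation between consecutive values xs[i] and xs[i + 1]."""
    a, b = xs[:-1], xs[1:]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20_000

# Markov-style "bus speed" (hypothetical model): each reading depends on
# the previous reading plus some noise.
speeds = [30.0]
for _ in range(n - 1):
    speeds.append(30.0 + 0.9 * (speeds[-1] - 30.0) + rng.gauss(0.0, 2.0))

# Independent readings: each one ignores everything observed before it.
independent = [rng.gauss(30.0, 2.0) for _ in range(n)]

markov_corr = lag1_correlation(speeds)
iid_corr = lag1_correlation(independent)

print(f"lag-1 correlation, Markov-style speeds:  {markov_corr:.3f}")
print(f"lag-1 correlation, independent readings: {iid_corr:.3f}")
```

In the Markov-style series, knowing one reading tells you a good deal about the next; in the independent series it tells you essentially nothing.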

The claim that is made in overpayment cases is that sample points (which are individual overpayments) are not independent because some come from the same patient, or the same doctor, or the same day, etc. In fact, such observations *may* be *correlated* within the population. This correlation can occur if, for some reason, one patient’s overpayments are greater on average than another patient’s, due to some special feature of the patient. In this case the *probability* distributions of the two patients’ overpayments, *prior to payment or even billing*, are different. However, *correlation* does not equal *dependence*!

And now, things are going to get really technical. We need to understand what the random variable actually is in this scenario. The overpayments in the universe are not random variables. In fact, they are fixed, though unknown, values.* This is the reason for the phrase “prior to payment or even billing” in the previous paragraph. Once payment is made, the overpayments become fixed, not random. They do not have probability distributions. (There is a distribution of values, but that is not the same thing.) The probability distribution only comes into play through *sampling*.

“So,” you may be asking, “how do we get a probability distribution out of fixed values?” Why, the same way that we get it from tossing a die. The six numbers on a die are fixed. What has probability is not the actual number itself (or face of the die) but the outcome of a random experiment: tossing the die. In the case of tossing a fair die, each number or face has a probability of 1/6 of being “chosen.” If you toss the die six times, there is no dependence from one toss to the next. Now suppose you toss six separate fair dice, one at a time. Again, there is no dependence between tosses. The second example does not gain any *independence* merely because it involves six different dice.
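The die argument can be checked with a small simulation. The sketch below assumes a fair die modeled by Python's standard `random` module; the seed and the number of tosses are arbitrary choices for the illustration. It compares the overall frequency of a six on the second toss with the frequency of a six on the second toss given that the first toss was a six.

```python
import random

FACES = (1, 2, 3, 4, 5, 6)  # fixed values; nothing random about the faces

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_pairs = 120_000

# Each experiment is two tosses; the randomness lies in the tossing.
pairs = [(rng.choice(FACES), rng.choice(FACES)) for _ in range(n_pairs)]

# Frequency of a six on the second toss, ignoring the first toss.
p_second_six = sum(1 for _, b in pairs if b == 6) / n_pairs

# Frequency of a six on the second toss, given the first toss was a six.
after_six = [b for a, b in pairs if a == 6]
p_second_six_given_first_six = sum(1 for b in after_six if b == 6) / len(after_six)

print(f"P(second = 6)             ~ {p_second_six:.4f}")
print(f"P(second = 6 | first = 6) ~ {p_second_six_given_first_six:.4f}")
```

Both frequencies should sit close to 1/6: knowing the first toss does not change the distribution of the second.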

And so, the overpayments in the population are like the faces of the die. They are fixed values with no probability distribution. If all were reviewed, the result would be an exact number with no probability or statistics involved. However, we do not review them all. Instead, we create a random process (toss the die) in which several sample units are selected in sequence. The values of the sample units are not known in advance, because the identity of the units (in the population) is not determined until the sampling procedure is carried out. The order of the selected values is also not pre-determined. This is why the sample units have an *identical distribution*, even though there can be widely varying values in the population. Any of the population elements can end up in any of the sample unit positions. This means the *probability distribution* of each sample unit is the same as the *fixed relative distribution* (unknown) of the population, and it is the same for every sample unit. Therefore, no sample unit’s distribution depends on the previous sample unit, and they are *independent*. Any sharing of characteristics relating to the origin of the sample unit is completely irrelevant.
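The same idea can be sketched for a fixed universe of overpayments that is strongly clustered by patient. Everything in this example is hypothetical: the universe of 50 patients with 20 claims each, the dollar levels, and the choice to draw units with replacement (made here to mirror the die analogy; the article does not specify the selection method). The sketch repeats the sampling experiment many times and records the first and second units drawn.

```python
import random
import statistics


def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical fixed universe: 50 patients, 20 claims each. Every patient
# has its own overpayment level, so values are clustered by patient.
population = []
for _patient in range(50):
    level = rng.uniform(0.0, 500.0)
    for _claim in range(20):
        population.append(max(0.0, level + rng.gauss(0.0, 25.0)))

pop_mean = statistics.fmean(population)  # fixed once the universe exists

# Repeat the random experiment: draw a first and a second sample unit
# (with replacement, an assumption of this sketch).
first, second = [], []
for _ in range(50_000):
    first.append(rng.choice(population))
    second.append(rng.choice(population))

mean_first = statistics.fmean(first)
mean_second = statistics.fmean(second)
draw_corr = correlation(first, second)

print(f"population mean:          {pop_mean:.2f}")
print(f"mean of 1st sample unit:  {mean_first:.2f}")
print(f"mean of 2nd sample unit:  {mean_second:.2f}")
print(f"correlation between 1st and 2nd unit: {draw_corr:.4f}")
```

Under these assumptions, both sample positions average out to the population mean, and the correlation between them is close to zero, even though the universe itself is heavily clustered by patient.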

In conclusion, the idea that sample units in overpayment extrapolations might not be independent is a *fallacy*, with no possible basis in statistical theory.

*An objection may be raised here, that different reviewers might determine different overpayments. This is a separate issue, involving “measurement error” and “bias.” These issues are addressed in the appeals process by re-reviewing claims in question, and do not affect the statistical theory. Regardless of what different people may decide, there is a “truth” about each overpayment which the review process is intended to uncover.