What is the meaning of the normal distribution?

(This was originally an answer to a Quora question.)

Ah, the question of meaning. I’m going to assume you are not asking about the mathematical-theoretical derivation of the Gaussian distribution, which is often called the “normal distribution.” This is something you can read about in books of Mathematical Statistics, if you wish. Rather, you want to know what it means, practically, for something to be “normal.” I’m going to assume you’ve already looked at graphs of a normal distribution, and have a general idea what it is.

And so, I think I should start by saying that “almost nothing is actually normal.” One goal of Statistics is to make models of reality that are useful for making predictions. Predictions are (mostly) results of mathematical calculations, so, the easier the mathematics, the easier it is to develop predictions, especially predictions that are reproducible, and understandable, so they can be evaluated and trusted by others. So what is a model? You can think of it as a simplified, or smoothed-over picture of reality that hides some of the bumps and bruises so we can see what is really important. For example, an architect might make a scale model of a building he is designing. The scale model does not have all of the features of the final product. But it can highlight the most obvious features, and it also makes the image small enough so that a person trying to evaluate it can see everything at once. A mathematical or statistical model does something similar for numbers.

So, what statisticians like to do is take a collection of numbers (there could be millions or billions) and try to boil them down to a simple model. This model would be an equation, or a small group of numbers, that the statistician can look at and “see everything at once,” or at least what is important for the problem at hand. The model will not fit perfectly; that is, it doesn’t have all the details, just like the scale model of the building doesn’t have light switches, water lines, or wireless routers.

When we look at data, the basic form of description is the “distribution.” Imagine one of those coin-sorting machines where you can drop a handful of coins in the top and they fall into vertical tubes with each tube having all the coins of one denomination. When you drop your coins, some of the tubes might get fuller than others, and (ignoring thickness), you can see which coins you have more of. If you count the coins in each tube, that is your “distribution.” If you calculate the proportion of coins in each tube, you have a “relative frequency distribution.” And, if you select a coin at random from the original handful, the relative frequency corresponds to the probability of getting one of those coins. So distributions really describe our data in terms of how many of each kind we have, relative to others. But, due to the realities of measurement, and also some mathematical reasons, having discrete numbers (separate, distinct tubes) is not always the best way to describe the numbers. So, for this we develop the idea of a continuous distribution, where the numbers can have any decimal value. The easiest way to visualize what this is, is to think of the bars of a relative frequency distribution, and connect the tops with a smooth curve, in effect, “smoothing out” the differences between the distinct values.
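If it helps to see the coin-sorter idea as a calculation, here is a minimal sketch. The handful of coins is made up for illustration, and the function name `relative_frequencies` is my own, not a standard one.

```python
from collections import Counter

# Hypothetical handful of coins (values in cents), invented for this example.
coins = [1, 1, 1, 5, 5, 10, 10, 10, 10, 25]

def relative_frequencies(values):
    """Return {value: proportion of the sample that has that value}."""
    counts = Counter(values)      # the "tubes": how many coins of each kind
    total = len(values)
    return {value: n / total for value, n in sorted(counts.items())}

freqs = relative_frequencies(coins)
for value, share in freqs.items():
    print(f"{value:>2} cents: {share:.0%} of the handful")
```

The counts are the distribution, the proportions are the relative frequency distribution, and each proportion is also the chance of drawing that coin at random from the handful.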

For certain kinds of data, the “normal distribution” is one such continuous distribution that provides a smooth approximation to the relative frequencies of the real data. But it is defined by a very complicated formula. It is nothing that would be guessed by trial and error. It’s not the kind of thing you can understand if you have only gone as far as College Algebra in your mathematical pursuits. If we are to avoid the theoretical complexities, the best thing we can say about this formula is that it has a bunch of really useful mathematical properties which work very well in statistical theorems about estimation and hypothesis testing. If we can apply this model to the data, we are able to draw conclusions that have mathematical validity. Today, there are other methods (non-parametric methods, for example) that use a lot of computing power to draw similar conclusions without the constraints of using an approximating model that discards some of the information. Yet, these methods may also have drawbacks, particularly in the areas mentioned above, namely, whether they are “reproducible, and understandable, so they can be evaluated and trusted by others.”
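For the curious, here is what that formula looks like when written out as a small function. This is only a sketch for reference: the name `normal_pdf` is my own, and the comparison against Python's built-in `statistics.NormalDist` is there just to show the two agree.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The hand-written formula matches the standard library's version.
for x in (-2.0, 0.0, 1.5):
    assert math.isclose(normal_pdf(x), NormalDist().pdf(x))

print(normal_pdf(0.0))   # the peak of the standard bell curve
```

Notice that the only inputs besides `x` are the mean and the standard deviation. Everything else in the formula is fixed.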

Remembering that the normal distribution is a model, so it is not expected to describe the data perfectly, we can understand the common procedure where we produce a histogram (relative frequency distribution graph) and superimpose a normal curve on it to see “how well it fits.” There are tests to determine if the fit is “good enough” for the normal distribution to be used. But in general, the distribution should be symmetric and bell-shaped. In theory, normal distributions go on infinitely in positive and negative tails, but it is not necessary (and not feasible) for this to happen with the real values.
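One informal way to check the fit without drawing anything is to compare the share of data in each histogram bar with the share the fitted normal model predicts for the same range. The sketch below assumes a made-up sample (generated with `random.gauss`, so it will of course fit well) and bin edges chosen by hand. It is an eyeball check, not one of the formal tests mentioned above.

```python
import random
from statistics import NormalDist

random.seed(0)
# Made-up sample standing in for real measurements.
data = [random.gauss(50, 10) for _ in range(10_000)]

# Fit the model: this just computes the mean and standard deviation of the data.
model = NormalDist.from_samples(data)

def compare_bins(data, model, edges):
    """For each bin, return (low, high, observed share, share predicted by the model)."""
    rows = []
    for low, high in zip(edges, edges[1:]):
        observed = sum(low <= x < high for x in data) / len(data)
        predicted = model.cdf(high) - model.cdf(low)
        rows.append((low, high, observed, predicted))
    return rows

rows = compare_bins(data, model, edges=[20, 30, 40, 50, 60, 70, 80])
for low, high, observed, predicted in rows:
    print(f"{low:>3} to {high:<3} observed {observed:.3f}   model {predicted:.3f}")
```

With real data the two columns will disagree somewhat. The question is whether they are close enough for the model to be useful.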

How do you “fit” a normal distribution? Only two parameters are needed: the mean and the standard deviation. You can calculate these from the data you have. The standard deviation describes how much the data “spreads out.” If this explanation is not satisfying, try this visualization: Suppose you see a cloud of fruit flies buzzing around a rotten apple. Take a photo, and measure all the distances of the flies from the center of the cloud. You could calculate the average distance, which is called the “mean absolute deviation,” which would be one way of describing how much the flies spread out. The standard deviation is like that, but it is the square root of the average of the squared distances, which gives more influence to the farther-out values. Now imagine a flock of crows circling around a tasty cornfield. You are at a distance that makes the cloud of birds look very much like the fruit flies you saw before. Suppose you photograph this scene, and by some wild coincidence, all the birds line up exactly with the fruit flies in your previous photo! But the birds are much bigger, and much farther apart, in real life. So, the standard deviation will be much larger. So what this tells you is that the standard deviation is a kind of scaling factor. The two photos can be “standardized” so that they look the same, by changing the measurement units. And although the photos were taken miles apart, you can slide them together on the table, so that the centers are right on top of each other, and you can’t really tell the difference between them. This is exactly what the “standard normal” distribution does. We slide the number line under the data until the mean is zero, and then stretch or squeeze it until the standard deviation is one. In this sense, all normally distributed data looks the same, and since we know the changes we made (subtract the mean and divide by the standard deviation) we can convert any conclusions we made back to the original location and scale.

This explanation of “standard normal” was a bit of a tangent. The real point is that normally distributed data are completely described by the mean and standard deviation. By modeling with the normal distribution, we have essentially collapsed all the data down to two numbers. How’s that for simplification?
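Here is the flies-and-crows idea as a small sketch. The positions are invented, and I have reduced each creature to a single left/right number to keep things simple. I use the population standard deviation (`pstdev`), which matches the "average of the squared distances" description. The sample version (`stdev`) divides by one less than the count and would give a slightly larger number.

```python
import math
from statistics import mean, pstdev

# Hypothetical left/right positions, invented for illustration.
flies_cm = [-2.0, -1.0, 0.0, 1.0, 2.0]        # fruit flies, in centimeters
crows_m = [100 + 50 * x for x in flies_cm]    # same pattern, shifted and stretched, in meters

def mean_abs_deviation(values):
    """Average distance from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

def standardize(values):
    """Slide the mean to zero, then rescale so the standard deviation is one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

print("flies:", mean_abs_deviation(flies_cm), pstdev(flies_cm))
print("crows:", mean_abs_deviation(crows_m), pstdev(crows_m))

z_flies = standardize(flies_cm)
z_crows = standardize(crows_m)
# After standardizing, the two "photos" line up value for value.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z_flies, z_crows)))
```

The crows have a far larger standard deviation than the flies, yet the standardized values are the same, which is the sense in which the standard deviation acts as a scaling factor.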

The fruit flies, or the birds, exist in three dimensions in real life, and in two dimensions in the photo. So our visualization can be misleading if we take it too far, because what we really need to think about is only one dimension. The flies represent data, but to make this work, we really have to think of them as representing a number to the right or left of the mean. We should then drop them down to the x-axis, so that they pile up like a histogram. Now if you have a good imagination, you may realize that if there are not too many flies, they will all fall down flat on the x-axis and not pile up at all! Not much of a bell curve! But this is where the mathematical magic of continuous distributions comes in. It is a fact that the probability of any specific number (infinite decimals) is zero in a continuous distribution. But using the techniques of calculus, ranges of numbers have positive probabilities. So even if the flies all fell flat, those closer to the mean are more densely packed, and thus a range near the mean has a higher probability than a range of the same width farther away from the mean.
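You don't have to do the calculus yourself to see this. The sketch below uses the standard normal distribution from Python's `statistics` module. The helper name `prob_between` and the particular ranges are my own choices for illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

def prob_between(dist, low, high):
    """Probability that a value falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)

near = prob_between(z, 0.0, 0.5)    # a range right next to the mean
far = prob_between(z, 2.0, 2.5)     # a range of the same width, farther out
point = prob_between(z, 1.0, 1.0)   # a "range" of zero width: a single number

print(f"near the mean: {near:.4f}")
print(f"farther away:  {far:.4f}")
print(f"single value:  {point}")
```

The two ranges are equally wide, but the one next to the mean is more than ten times as likely, and the single value has probability zero.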

Now let’s get back to that question about the meaning of the normal distribution. Remember that the standard deviation is a kind of distance from the mean. With normally distributed data, approximately 68% of observations are within one standard deviation of the mean, and approximately 95% are within two standard deviations. Beyond that, there are very few—99.7% are within three standard deviations, and the probability beyond four is so small that for all practical purposes we can say that all the data are within four standard deviations. You’ve probably heard all this already, and there’s probably a big old “so what” sitting on your shoulder, which might be the reason you asked this question in the first place. And the truth is, the “meaning” you may be looking for, often has a great deal to do with what exactly the data is about. The normal distribution is telling you things like this:

  1. Our data exhibit variation that is centered around a mean.
  2. There are more data close to the mean, and fewer and fewer as the distance from the mean increases.
  3. Data more than two standard deviations away are rare, and data more than three standard deviations away are almost non-existent.
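The 68%, 95%, and 99.7% figures are rounded, and you can recover them from the model itself. A minimal sketch, where `share_within` is a name I made up:

```python
from statistics import NormalDist

def share_within(k, dist=NormalDist()):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3, 4):
    print(f"within {k} standard deviation(s): {share_within(k):.4%}")

# The answer does not depend on which mean and standard deviation you pick.
print(share_within(2, NormalDist(mu=100, sigma=15)))
```

The results come out near 68.3%, 95.4%, 99.7%, and 99.99%, whatever the mean and standard deviation happen to be.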

Interpretations in real life go something like this: The average human IQ is supposed to be 100, and the standard deviation is 15. This means most people (68%) have IQs between 85 and 115, which is roughly considered “normal.” For simplicity, let’s just look at the top half of the distribution. From 100 to 115, one standard deviation, you would expect to find 34% of the people. From 115 to 130, the second standard deviation, only 13.5%. From 130 to 145, the third standard deviation, only about 2.35%. And over 145? Well that’s just a fraction of a percent (0.15%). If the model holds true for your school, and there were 2000 students in your school, just three of them would be in this group. Now you can understand, if you have heard talk about someone (Einstein) having an IQ over 160, how rare this is. 160 is four standard deviations above the mean. We expect to find almost no data over that level.
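If you want to check these numbers, here is a sketch using the same model (mean 100, standard deviation 15). The helper `share_between` is my own name. Expect the exact values to differ a little from the rounded rule-of-thumb percentages: the 130 to 145 band, for example, comes out near 2.1%.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # the model described in the text

def share_between(dist, low, high):
    """Proportion of the population the model places between low and high."""
    return dist.cdf(high) - dist.cdf(low)

bands = {
    "100 to 115": share_between(iq, 100, 115),
    "115 to 130": share_between(iq, 115, 130),
    "130 to 145": share_between(iq, 130, 145),
    "over 145": 1 - iq.cdf(145),
    "over 160": 1 - iq.cdf(160),
}
for label, share in bands.items():
    print(f"{label:>10}: {share:.4%}")

school_size = 2000   # the hypothetical school from the text
expected_over_145 = school_size * bands["over 145"]
print(f"expected students over 145: {expected_over_145:.1f}")
```

Rounded to whole students, the school still ends up with about three people over 145, and the share over 160 is a few thousandths of one percent.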

In summary, what is the meaning of the normal distribution? It is a way of modeling the probabilities for certain kinds of data, that allows us to describe, in a simplified way, how the data are distributed, using a very simple description (two parameters: the mean and the standard deviation). Many real data sets fit the distribution quite well, so it is a widely used method of summarizing data for analysis.