Lesson 36:  First Simulation

Monte Carlo simulations are done by generating data from a theoretical distribution, while bootstrap methods use actual data but resample from the existing data to calculate various statistics.  Both methods work by generating thousands of samples and then studying the statistics that come from the sample.

Starting with a monte carlo simulation:  We set up some parameters for the simulation.

proc iml;
sims=10;  *this is for the outer loop that says how many times the simulation is repeated;
ssize=10;  *this is the size of each simulated sample;
mean=20;
std=3;       *these are parameters for a normal distribution which we are going to simulate.

Next, we set up the basic loops.  We will have an outer loop that repeats the samples, and an inner loop that takes one sample of size ssize.

do i=1 to sims;
do k=1 to ssize;
end;
end; *sims;

My sample is going to be stored in a vector called sample, so with each simulation loop, we need to clear out the old sample, so add a free command just inside the simulation loop.  Normal random variables are generated by taking the standard normal random function, normal(seed), multiplying it by the desired standard deviation, and then adding the desired mean.  Here we create a sample by using a loop which appends a new random number to the vector with each iteration.  It is not necessary to initialize the matrix.

do i=1 to sims;
free sample; *clear sample vector before starting sample procedure;
do k=1 to ssize;
sample=sample//(normal(0)*std+mean);
end;
print sample; *for test only, remove later;
end; *sims;

Let's study the distribution of x-bar (the sample mean).  That is, if you do repeated sampling, what is the average (mean) of all the x-bars, and what is the standard deviation (or variance) of the x-bars?  This is a basic question in the study of statistics.  If you have a statistic, that is, a number calculated from a sample, usually intended to estimate something, what are its properties?  More specifically, what is its distribution?  Is it an unbiased estimate of the parameter you want to estimate?  Does it have a small enough variance to be useful?  Is its variance smaller than other candidate estimators?  Sometimes there is nice theory to answer these questions, but in other cases we either don't have a solution to the needed equations, or we do not have data that meets the assumptions of the theory.  This is when simulation comes in handy.

What we want to do now is calculate xbar for each sample, then accumulate them in a vector.  There will be an xbar for each sample (each loop of sims).  Then we need to calculate the mean of all the xbars and the standard deviation of all the xbars.  For the standard deviation, we will implement the formula

free xbar;                *clear xbar vector before running simulation again;
do i=1 to sims;
free sample;            *clear sample vector before starting sample procedure;
do k=1 to ssize;
sample=sample//(normal(0)*std+mean);
end;
xbar=xbar//sample[:,];
end;                       *sims;
print xbar;              *for test only, remove later;
meanxbar=xbar[:,];
sxbar=sqrt((xbar[##,]-sims*meanxbar**2)/(sims-1));
print meanxbar sxbar;
truesxbar=std/sqrt(ssize);    *theoretical value of sxbar;
print mean truesxbar;

A better way to do the sampling is to use the matrix version of the normal function.  You can take out the sampling loop and replace it with simpler commands as follows.  The "free sample" command is also not needed this way.

free xbar;                *clear xbar vector;
do i=1 to sims;
sample=J(ssize,1,0);
sample=normal(sample)*std+mean;
xbar=xbar//sample[:,];
end;                       *sims;
meanxbar=xbar[:,];
sxbar=sqrt((xbar[##,]-sims*meanxbar**2)/(sims-1));
print meanxbar sxbar;
truesxbar=std/sqrt(ssize);    *theoretical value of sxbar;
print mean truesxbar;

Now change the sims to 10,000 and see what results you get.

Using Simulation to evaluate alpha and beta (significance and power)

Alpha=P(Type I Error)=P(Reject Ho|Ho is True)
Beta=P(Type II Error)=P(Do not Reject Ho|Ho is False)
Power=1-Beta=P(Reject Ho|Ho is False)

Exercise

Just as x-bar is a statistic which has a distribution that can be studied by simulating repeated samples, the sample standard deviation (s) is also a statistic, and so is the sample variance (s-squared).  If you check an intro stat book, you will find that the distribution of s-squared has something to do with a chi-square distribution.

Using similar procedures to those outlined in this lesson, find the mean and variance of the distribution of s-squared when the population is normal and compare them to their theoretical values.  Vary the simulated sample sizes to see the effect that this has.

Copyright reserved by Dr.  Dwight Galster, 2006.  Please request permission for reprints (other than for personal use) from dwight.galster@sdstate.edu  .  "SAS" is a registered trade name of SAS Institute, Cary North Carolina.  All other trade names mentioned are the property of their respective owners.  This document is a work in progress and comments are welcome.  Please send an email if you find it useful or if your site links to it.