Lesson 38:  Power and Regression


Video:  Power Simulation

  Here is our program from last time again.

    proc iml;
    mean=0;  *set the mean to 0 for now;
    alpha=.05;      *provide an alpha value for the decision;

     free xbar;                *clear xbar vector;
     free pvec;                *like xbar, clear before restarting simulation;
    do i=1 to sims;
        smean=sample[:,];                                       *new name for sample mean;
        p=probt(-abs(smean)/sqrt((sample[##,]-ssize*smean**2)/(ssize-1)),(ssize-1))*2;  *calculate p-value;
        pvec=pvec//p;   *accumulate the p-values in a vector;
    end;                       *sims;

    numrej=sum(pvec<alpha);        *summarize number of rejections;
    pctrej=numrej/sims*100;            *and do percents;
    print numrej pctrej;                *and print;

We can try some different sample sizes and see how they affect power.  The larger the sample size, the more likely you are to detect a false null hypothesis.

Now it is nice to have a graph to display power under different conditions.  We can add another outer loop to our simulation so that we can repeat it with different means.  We will keep track of the percent rejects for different means in another matrix and not print them individually, but make a graph.

   proc iml;
    std=3;   *remove mean from this list;
    free xy;    *this matrix stores the result of each simulation;
    do mean=-5 to 5;     *set up a loop that changes the mean;
         free xbar;               
         free pvec;               
        do i=1 to sims;
        end;                       *sims;

        pctrej=numrej/sims*100;   *stop printing this;
        xy=xy//(mean||pctrej);   *create a matrix with columns for the means and pctrejs;
    end;      *end of the loop for means.
    call pgraf(xy);    *graphs a two-column matrix;

Video:  Simulating Regression

Next we will try to simulate regression.  So a data point in regression is assumed to be the result of a linear equation  plus an error term which has a mean of zero and a constant variance.  So to simulate regression data, we need a set of x's together with the parameters (coefficients) in the equation, and also a standard deviation for the errors.  We can then generate y values.

The solution for the parameter estimates (referred to as the b vector) are found by matrix calculations based on a matrix called X which, in the simple linear case, consists of a column of 1's and a column of the x values.  (In multiple regression there are more columns of independent variables.)  The solution is given by b=(X`X)-1X`Y.  So, the procedure here is to first simulate a regression sample, then calculate the b vector, and analyze the distribution of the results.

proc iml;
*simulating regression;
*note that at the beginning of regression we 
 assume the x values are fixed, there is a
 linear relationship between x and the mean 
 of y, and errors have a homogeneous normal 
 distribution with mean 0.;
xvals=(1:ssize)` ;  *this is the fixed set of x;
x=J(ssize,1)||xvals;  *the X matrix in simple linear regression is ;
			*a column of 1s for the intercept and the x values;
xpxi=inv(x`*x);   *"x prime x" matrix;
h=xpxi*x`;	   *useful in later calculations;
free bsave;	   *this matrix will hold the parameter estimates;;
do i=1 to sims;
  y=b0+b1*xvals+errors;   *note that xvals and errors are vectors;
bmean=bsave[:,];   *This calculates means of both columns;
btruemn=b0||b1;    *Want a vector to compare to bmean;
btruecov=std**2*xpxi;    *standard formula for covariance matrix of b;
  *calculate variances and covariances from simulation to compare;
bcov=((bsave[,1]#bsave[,2])[+,]-sims*bmean[,1]*bmean[,2])/(sims-1); *just cov(bo,b1);
bcovar=(bvar[1,1]||bcov)//(bcov||bvar[1,2]);   *variance-covariance matrix;
print bmean btruemn bstd;
print bcovar btruecov;

Video:  Homework Hints



Modify the regression simulation to do a one-way ANOVA with three levels.  There is actually not much to change because the matrix solutions will be the same.  What changes is the X matrix and the way you simulate the data.  You will have three means for three groups or levels.  The distribution of errors is considered the same for all (simplest assumptions again).  Your y vector will then consist of the group means plus the errors.  The X matrix will consist of three columns which are the values of the indicator variables for the groups.  It looks something like

Copyright reserved by Dr.  Dwight Galster, 2006.  Please request permission for reprints (other than for personal use) from dwight.galster@sdstate.edu  .  "SAS" is a registered trade name of SAS Institute, Cary North Carolina.  All other trade names mentioned are the property of their respective owners.  This document is a work in progress and comments are welcome.  Please send an email if you find it useful or if your site links to it.