Lesson 32:  Data Sets and Subscripts


Video:  Using SAS Datasets in IML

IML can get information from SAS data sets. IML does not automatically know which data sets are available (say, in the WORK library), so it must be told which data sets to use. The command "show datasets;" lists the data sets that are currently open in IML. A command like

    use one;

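makes the data set "one" available to IML for input. Data sets can be opened in different modes: use opens an existing data set for reading, edit opens one for reading and writing, and create makes a new data set for output. Several data sets can be open at once, but only one is the current input data set and only one is the current output data set. The information on this is found in the documentation under "Using SAS Data Sets." To get information from a data set, use a command like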

    read all into X;

which will get all observations and all variables from the current input data set. There are two types of matrices in IML, just as there are two types of variables: character and numeric. A matrix must be entirely one type, so if your data set has both kinds of variables, you cannot read everything into one matrix. Instead, you specify which variables to get. For example,

    read all var {x y z} into X;

will read the three given variables. Note that the variable names go inside braces, separated by spaces. The read statement can also include a where clause, such as where(z>0), to restrict which observations are read.
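A minimal sketch of the whole sequence, assuming a small made-up data set named one with numeric variables x, y, and z (the values are only for illustration). It opens the data set, lists the open data sets, reads all three variables into X, reads a subset with a where clause, and closes the data set.

```sas
/* Hypothetical example data: three numeric variables */
data one;
   input x y z;
   cards;
1 2 3
4 5 6
7 8 9
;
run;

proc iml;
   use one;                                  /* open work.one for input         */
   show datasets;                            /* list the data sets open in IML  */
   read all var {x y z} into X;              /* 3 x 3 matrix, one row per obs   */
   read all var {x y} where(z > 3) into W;   /* only obs with z > 3: {4 5, 7 8} */
   close one;                                /* done with the data set          */
   print X, W;
quit;
```

Because a matrix holds only one type, a character variable in the same data set would need its own read statement into a separate matrix.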

Video:  The Reset Command and Printall Option

IML has a command that is similar to the options statement in SAS.  It is "reset."  One of the options available is "printall", which causes IML to print the results of all assignment statements, including intermediate steps, automatically.  This is useful for debugging, but should not be used for "production" programs because it will produce too much output.  To turn on the option, say

    reset printall;

and to change it back to normal, put

    reset noprint;

Video:  Rank of a Matrix

Video:  Subscripting

There is a method called "subscripting" for extracting parts of matrices, or submatrices. The notation uses square brackets after a matrix name, with two subscripts separated by a comma: the row first, then the column. For example,

    X1=X[2,2];

will assign the value of the element in the second row and second column to X1 (a scalar). To get more than a single element, you give a range (or any vector) of row or column numbers. A range is specified using a colon between two numbers, like 1:3, which means everything from one to three. So,

    X2=X[1:2,2:3];

will assign to X2 the two-by-two submatrix formed by the first two rows and the second and third columns. These ranges can also be assigned to variables. In other words,

    a=1:3;

is a valid assignment statement that makes a the row vector {1 2 3}. Then,

    X3=X[a,a];

will give you a three-by-three matrix from the first three rows and first three columns. 
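The following sketch runs the subscripts above on a made-up 4-by-4 matrix so the results can be checked by eye. The last assignment, using the index vector b (a name chosen only for this example), shows that the selected rows and columns need not be consecutive.

```sas
proc iml;
   X  = {1  2  3  4,
         5  6  7  8,
         9 10 11 12,
        13 14 15 16};          /* made-up 4 x 4 matrix              */
   X1 = X[2, 2];               /* scalar: 6                         */
   X2 = X[1:2, 2:3];           /* {2 3, 6 7}                        */
   a  = 1:3;                   /* row vector {1 2 3}                */
   X3 = X[a, a];               /* {1 2 3, 5 6 7, 9 10 11}           */
   X4 = X[, a];                /* all 4 rows, columns 1 through 3   */
   b  = {1 4};                 /* hypothetical index vector         */
   corners = X[b, b];          /* rows 1,4 and columns 1,4: {1 4, 13 16} */
   print X1, X2, X3, X4, corners;
quit;
```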

We need to understand some terminology. In IML, a "literal" is an actual number or character value written directly in the program, such as 2 or "abc". An "expression" is something like an algebraic expression that IML evaluates to a value, like the range expressions in the X2 assignment above. A "variable" is a name that stands for some value; in the assignment of X3 we used the variable a. Keep these terms straight, especially when reading the documentation.

There are a couple of shortcuts used with subscripting.  The first is to leave out one or both arguments.  For example,

    X4=X[,a];

will bring in all the rows of X and the first through third columns. Put a in the first position and leave out the second subscript, and you will get the first three rows and all the columns. (You can also put a "*" instead of leaving the subscript blank.) The other shortcut is to put a plus sign in place of a subscript, which sums over that dimension. In other words,

    X5=X[,+];

will give you a column vector with as many rows as X has, where each element is the sum of that row across all the columns. You can reverse it, like

    X6=X[+,];

and get a row vector whose elements are the column sums (each column summed down all the rows). Put in both pluses, as in X[+,+], and you will get the sum of the whole matrix, but that can also be done as

    X7=sum(X);

using the sum function. 
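A short sketch comparing the summing shortcuts on a made-up 4-by-4 matrix; the comments give the value each line produces, and the shapes show which direction each plus sign sums over.

```sas
proc iml;
   X  = {1  2  3  4,
         5  6  7  8,
         9 10 11 12,
        13 14 15 16};          /* made-up 4 x 4 matrix                   */
   X5 = X[, +];                /* 4 x 1 row sums:    {10, 26, 42, 58}    */
   X6 = X[+, ];                /* 1 x 4 column sums: {28 32 36 40}       */
   total = X[+, +];            /* grand total: 136                       */
   X7 = sum(X);                /* same grand total: 136                  */
   print X5, X6, total X7;
quit;
```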

Video:  Using nrow and ncol

Two more useful functions are nrow and ncol. These return the number of rows and columns of a matrix.  This is especially important when loading a matrix from a data set and you don't know ahead of time how many observations there will be.  For example,

    X8=X[+,]/nrow(X);

gives you the average of each column, in a row vector. (Similarly, X[,+]/ncol(X) gives the row averages in a column vector.)
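A sketch of nrow and ncol used to compute averages without hard-coding the matrix size, on a made-up 3-by-4 matrix. It also uses the colon (:) subscript reduction operator, which IML provides for means, as a cross-check on the column averages.

```sas
proc iml;
   X = {1  2  3  4,
        5  6  7  8,
        9 10 11 12};           /* made-up 3 x 4 matrix               */
   n = nrow(X);                /* 3                                  */
   p = ncol(X);                /* 4                                  */
   colMeans  = X[+, ] / n;     /* 1 x 3... no: 1 x 4 row {5 6 7 8}   */
   rowMeans  = X[, +] / p;     /* 3 x 1 column {2.5, 6.5, 10.5}      */
   colMeans2 = X[:, ];         /* mean operator, same as colMeans    */
   print n p, colMeans, colMeans2, rowMeans;
quit;
```

Because n and p come from the matrix itself, the same statements work no matter how many observations a read statement brings in.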

Now, a word about using other procedures or data steps in a program that uses IML. The matrices that you create and calculate in IML are not saved when IML exits, which happens at a quit statement or when another DATA or PROC step begins. Thus if you run a data step or another proc and then return to IML, all the matrices will be gone.

Video:  Solving a System of Equations

Here is an example of using IML to solve a system of equations.  Suppose you have the equations 3x+4y-2z=6, x-2y+4z=10, and 2x+3y+z=5.  We translate this into matrices by putting the coefficients into a matrix, say A, like

    A={3 4 -2, 1 -2 4, 2 3 1};

and the constants into a vector, say Y,

    Y={6, 10, 5};

The unknowns x, y, and z form the vector X. The system of equations is then written in matrix notation as AX = Y. Premultiplying both sides by the inverse of A gives the solution X = A⁻¹Y. We program this in SAS as

    X=inv(A)*Y;
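A sketch that checks the answer, with the coefficient of y in the second equation entered as -2 so that A matches x-2y+4z=10. It also uses the solve function, which computes the same X from A and Y without forming the inverse explicitly, and det, which confirms that A is nonsingular.

```sas
proc iml;
   A = {3  4 -2,
        1 -2  4,
        2  3  1};              /* coefficients, one row per equation    */
   Y = {6, 10, 5};             /* right-hand sides                      */
   d  = det(A);                /* -28: nonzero, so a unique solution    */
   X  = inv(A) * Y;            /* {4.4285714, -1.5, 0.6428571}          */
   Xs = solve(A, Y);           /* same solution, no explicit inverse    */
   check = A*X - Y;            /* zero up to rounding error             */
   print d, X Xs, check;
quit;
```

The exact solution is x = 31/7, y = -3/2, z = 9/14. For larger systems solve is usually preferred to inv because it is faster and more accurate.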

Video:  Hints for the Exercise and some more about functions

For help with the exercise below, look in the documentation at the section called "Working with Matrices." Then, under "Using Assignment Statements," find "Matrix Generating Functions." There you can see how to use the J function to create a new matrix: j(nrow, ncol, value) creates an nrow-by-ncol matrix with every element equal to value.
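A minimal sketch of j() used the way the exercise hint suggests: building constant matrices and subtracting a mean from every element of a column vector. The vector x here is made up; it is not the exercise data.

```sas
proc iml;
   Z    = j(2, 3, 0);          /* 2 x 3 matrix of zeros                 */
   ones = j(3, 1, 1);          /* 3 x 1 column of ones                  */
   x    = {2, 4, 9};           /* made-up column vector                 */
   n    = nrow(x);             /* 3                                     */
   xbar = sum(x) / n;          /* mean: 5                               */
   e    = x - j(n, 1, xbar);   /* deviations from the mean: {-3, -1, 4} */
   ss   = e` * e;              /* sum of squares: 9 + 1 + 16 = 26       */
   print Z, ones, xbar ss, e;
quit;
```

IML will also subtract a scalar from every element directly, so x - xbar gives the same e; j() makes the repeated mean explicit and is also handy whenever you need a whole matrix of one constant.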


Exercises:

1.  Use the data below. In proc IML, program a two-sample t-test for this data. Assume the number of observations is unknown in advance, but the same for both columns, with no missing values. To calculate variances, use the J function to subtract the mean from every observation, and remember that a sum of squares can be calculated as e`*e, where e is the vector of deviations. Use the formula for equal sample sizes (unequal variance) found, for example, on Wikipedia. To find the two-sided p-value, note that probt(t, df) gives the one-tail probability below t, so compute 2*probt(-abs(t), df).

    data one;
    input x1 x2;
    cards;
    1 2
    4 5
    7 8
    3 8
    9 7
    6 3
    ;
    run;

Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from dwight.galster@sdstate.edu. "SAS" is a registered trade name of SAS Institute, Cary, North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.