Lesson 18: Using Arrays to Program With Variables
An array in SAS is a data step programming tool that lets us refer to a series of variables by number rather than by name, most often as part of a loop. The array itself is not part of the resulting data set. It is only a temporary structure used during the data step to manipulate variables. An array essentially assigns an index number to each variable in the array. Statements in the program can then compute the index of the variable they need, either to read its value or to set it.
In the following example, the array definition also creates the variables that are referenced by the array. The syntax consists of the "array" keyword, followed by the name of the array (which doubles as the base of the variable names) and then the number of variables to create, in parentheses. Each variable name is the array name with its index number appended. When used in an array reference, the index (or the mathematical expression that generates the index) is enclosed in square brackets.
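A program matching this description might look like the minimal sketch below. The data set name "squares," the array name "x," the count of 5, and the choice to store the square of the index are all assumptions made for illustration, not the original listing.

```sas
data squares;
  array x(5);          /* defines the array and creates x1, x2, x3, x4, x5 */
  do i = 1 to 5;       /* the loop cycles through variables, not observations */
    x[i] = i**2;       /* the index selects the variable and supplies its value */
  end;
  drop i;              /* the loop counter is not wanted in the data set */
run;                   /* implicit output: one observation is written here */

proc print data=squares;
run;
/* The single observation holds x1=1, x2=4, x3=9, x4=16, x5=25. */
```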
Notice that we have created just one observation. The do loop is cycling through the variables, rather than through observations. Study the correspondence between the variable names shown in the data set and the values of the index of the array, which was also used to calculate the variable values. An implicit output was used in this example. Since only one observation was created, there was no need to specify an "output."
In the next example, we add another loop in order to create 10 observations. This time an explicit output is needed, just before the end of the outer loop, when the values of an observation have been assigned, and the observation is ready to be saved. We also demonstrate that mathematical expressions can be used to control the index values, and that loop counters can be creatively used in assignment statements as well.
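The nested-loop pattern just described can be sketched as follows. This is an illustration under assumed names ("table," "m," "row," and "j" are hypothetical), with index expressions and formulas chosen only to show the idea.

```sas
data table;
  array m(6);                 /* creates m1-m6 */
  do row = 1 to 10;           /* outer loop: one pass per observation */
    do j = 1 to 3;            /* inner loop: fills two variables per pass */
      m[2*j - 1] = row * j;   /* index expression picks m1, m3, m5 */
      m[2*j]     = row + j;   /* index expression picks m2, m4, m6 */
    end;
    output;                   /* explicit output: save the finished observation */
  end;
  drop j;                     /* keep row, drop the inner counter */
run;
```

Without the explicit output statement, only the implicit output at the end of the data step would run, and the data set would contain a single observation.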
The names of variables in an array are not restricted to this form, though. Nor does the array have to create new variables. In the next example, we use a format statement as the first line in the data step. This creates the three variables, length, width, and height, while at the same time saving a format for them. Next, we define an array called "dims." Instead of giving a number of variables to create, e.g. "(3)," we actually list the names of the variables to be used in the array. The effect of this is that the array reference "dims[1]" is associated with the variable "length," "dims[2]" is associated with "width," and "dims[3]" is associated with "height." In the example, we show that both the actual variable names and the array references can be used in the program. This program shows how you might go about adding a simulated "random measurement error" to the measurements of a rectangular solid.
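A minimal sketch of a program along these lines is shown below. The data set name "box," the true dimensions, the 6.2 format, the error standard deviation of 0.05, and the seed are assumptions made for illustration; the RAND function with a seed from CALL STREAMINIT is one way to generate the error, not necessarily the one used in the original example.

```sas
data box;
  format length width height 6.2;     /* creates the three variables and stores a format */
  array dims(3) length width height;  /* dims[1] is length, dims[2] is width, dims[3] is height */
  call streaminit(2006);              /* assumed seed, so the run is repeatable */

  dims[1] = 10;                       /* assumed true dimensions, set by array reference */
  dims[2] = 4;
  dims[3] = 2.5;

  do i = 1 to 3;
    /* add a normally distributed measurement error to each dimension */
    dims[i] = dims[i] + rand('normal', 0, 0.05);
  end;

  volume = length * width * height;   /* the actual variable names work too */
  drop i;
run;
```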
Suppose a teacher gives 5 quizzes and drops the lowest score. Here, to "drop" the lowest score means to replace it with zero. The SAS program for this example does that, while also keeping the information about the dropped score. This example differs from the previous ones because the data step is reading from cards. Therefore, the "iteration" of the data step comes into play: the statements in the data step repeat for each observation read from the data. Here, the array statement is given first and creates the variables q1-q5. However, it can also be placed after the input statement, allowing the input statement to create the variables. Either way, there is no conflict. Two new variables are created, "low" to hold the low score, and "lowi" to hold the index of the low scoring variable. Initially, "low" is assigned the value of q1 and "lowi" is assigned 1. Then the do loop compares "low" to each of the other scores to see if any are lower, updating both variables whenever it finds one. When the loop is finished, the variable at index "lowi" is assigned a zero value.
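The steps above can be sketched as the following program. The data set name, the "name" column, and the three data lines are made up for illustration; only the array, "low," "lowi," and the loop logic come from the description.

```sas
data quizzes;
  array q(5);                 /* creates q1-q5 */
  input name $ q1-q5;         /* assumed layout: a name, then five scores */
  low  = q[1];                /* lowest score seen so far */
  lowi = 1;                   /* index of that score */
  do i = 2 to 5;              /* compare with each remaining score */
    if q[i] < low then do;
      low  = q[i];
      lowi = i;
    end;
  end;
  q[lowi] = 0;                /* "drop" the lowest score */
  drop i;
  cards;
Ann 8 6 9 10 7
Bob 5 9 4 8 8
Cal 10 10 9 7 9
;
run;
/* Ann: low=6, lowi=2.  Bob: low=4, lowi=3.  Cal: low=7, lowi=4. */
```

Because the comparison is a strict "less than," a tie for the lowest score drops the first of the tied quizzes. A missing score would count as the lowest, since SAS sorts missing values below every number.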
Exercises:
1. Create a data set that contains the first four powers of the numbers 1 through 10 (e.g., 2, 4, 8, and 16 are the first four powers of 2). Use an array to assign the values to each variable. Use only one assignment statement, and take advantage of a loop index to assign the right power to each variable.
2. For the following data, use arrays to find the highest score in each row, then create adjusted scores so that each is a percent of that row's maximum. Assume that the columns are students and the rows are quizzes.
10 9 7 8 5
5 8 4 6 8
4 2 6 4 9
3 5 8 4 5
3. Use the same data as #2, but consider each row to be the scores of one student. Use an array to move the smallest score to the last position. You will compare each pair in turn, and if the first is smaller, switch them. (Note there are four pairs to check.) You will need a temporary variable to hold one value while you do the switch (i.e., x2-->tmp, x1-->x2, tmp-->x1).
Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from dwight.galster@sdstate.edu. "SAS" is a registered trade name of SAS Institute, Cary, North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.