Lesson 16: Multiple Lines, Multiple Observations
There are a number of situations where the lines in a raw data file do not correspond to the observations in a SAS data set. In this lesson, we explore ways to read data files that have multiple observations on one line and multiple lines for one observation. Since the latter is usually simpler, we will start with that.
Consider the following data step:
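The original data step is not reproduced here. As a minimal sketch, assume a hypothetical file in which each person's first and last name come first, followed by an age and a score. Those two values sit either on the same line or on the next one. The data set name people and the variables Age and Score are illustrative only.

```sas
/* Hypothetical data: names first, then Age and Score.             */
/* An observation may take one line or two; each starts a new line. */
data people;
  input Fname $ Lname $ Age Score;
  cards;
Mary Smith
34 88
John Jones 41 92
Ann Lee
29 75
;
run;

proc print data=people;
run;
```

This reads three observations. Mary Smith and Ann Lee each span two lines, while John Jones fits on one.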
By default, when SAS runs out of data on one line but still has variables to read, it goes on to the next line. In past lessons we found that this sometimes causes problems, but here it does exactly what we want. In fact, it doesn't matter whether some observations take up one line and some take up two, as long as each new observation starts at the beginning of a line. If an informat causes problems with a short line, the colon modifier can be used, as demonstrated in previous lessons. Do not use the missover or truncover option here, though. Both options stop SAS from going on to the next line, so any variables still unread at the end of a short line are set to missing instead of being read from the following line.
As long as you read all of the variables in order, the method above works fine. Other tricks are available if you do not want to read everything on a line, or want to read things in a different order. A slash (/) in the input statement tells SAS explicitly to go to the next line. In the example, suppose the first line held more values after the first and last names. The slash would still send SAS to the next line as soon as Lname had been read. Of course, there is no longer any flexibility to have some observations broken across lines and others whole: every observation must occupy exactly two lines.
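Here is a minimal sketch of the slash, again with hypothetical names and values. Each first line also carries an extra number after the names (say, a phone extension). The slash skips that number because it moves SAS to the next line as soon as Lname has been read.

```sas
/* Hypothetical data: every observation occupies exactly two lines. */
data people2;
  input Fname $ Lname $ /   /* read the names, then jump to line 2    */
        Age Score;          /* the extension on line 1 is never read  */
  cards;
Mary Smith 5512
34 88
John Jones 5520
41 92
;
run;
```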
Another kind of notation explicitly numbers the lines in an observation, using the # line pointer (#1, #2, and so on). Again, the data must follow a consistent pattern, so that all observations occupy the same number of lines. Both the slash version and the numbered version produce the same data set as the first example.
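Here is a minimal sketch of the numbered notation, using the same kind of hypothetical two-line data. SAS takes the number of lines per observation from the highest # value in the input statement, which is 2 here.

```sas
/* Hypothetical data: the same two-line layout, read with #1 and #2. */
data people3;
  input #1 Fname $ Lname $   /* line 1 of each observation */
        #2 Age Score;        /* line 2 of each observation */
  cards;
Mary Smith
34 88
John Jones
41 92
;
run;
```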
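The "number" notation also allows skipping around among the lines that make up an observation, for example reading the second line before the first. This works best with column or formatted input, which can name the exact columns to read. With list input, returning to a line starts reading from its beginning again, so the same values would be read twice.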
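Here is a minimal sketch of skipping around. It assumes a hypothetical fixed-column layout: on line 1, first name is in columns 1-10 and last name in columns 11-20; on line 2, age is in columns 1-2 and score in columns 4-5. The input statement reads line 2 first and then returns to line 1. Because column input names exact columns, nothing is read twice.

```sas
/* Hypothetical fixed-column layout (data lines must start in column 1). */
data people4;
  input #2 Age 1-2 Score 4-5           /* read line 2 first        */
        #1 Lname $ 11-20 Fname $ 1-10; /* then go back to line 1   */
  cards;
Mary      Smith
34 88
John      Jones
41 92
;
run;
```

The variables appear in the data set in the order they are first read (Age, Score, Lname, Fname), not in the order of the lines.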
Now let's look at the opposite case, where a single line of the source file holds more than one observation. Consider the following data step:
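Here is a minimal sketch of the situation. It uses hypothetical data with three people (name, height, weight, age) on each of two lines. The input statement has no line-hold specifier, so each iteration reads only the first person on a line and then moves on.

```sas
/* Hypothetical data: three people per line, two lines. */
data class;
  input name $ ht wt age;   /* no trailing @@: one observation per line */
  cards;
Alice 64 120 15 Bob 68 150 16 Carl 66 140 15
Dana 62 110 14 Ed 70 160 16 Fay 63 115 15
;
run;
```

Only Alice and Dana end up in the data set.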
In this case, only two observations end up in the data set, one for each line, because SAS automatically goes to a new line when it reaches the end of the input statement. What we need is a way to stop it from doing that.
But first, a word about iterations. When a SAS data step reads from an external file or another data set, there is an automatic loop in the data step that repeats until the input data runs out. One pass through this loop is called an iteration. Many things can happen during an iteration, including the creation of new variables by assignment statements and the elimination of observations by "if" or "where" statements. In the simplest cases, one iteration reads one line of raw data and outputs one observation, but many variations are possible. In general, an iteration executes the statements of the data step in order, from the top of the step to the end, not counting the data lines after cards. If the step has no output statement, each iteration ends by writing the current observation to the data set automatically (an implicit output). If there is an explicit output statement, observations are written only when an output statement executes. There can be other loops inside an iteration, as we shall see shortly, but the iteration itself is an implied loop that repeats as long as input data is available.
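To make iterations visible, here is a small sketch; the data set and variable names are hypothetical. It stores the automatic variable _N_, which SAS increments each time the data step loop begins a new pass. It also uses two explicit output statements, so every iteration writes two observations and there is no implicit output at the end.

```sas
/* Hypothetical data: one value of x per line. */
data iter;
  input x;
  iteration = _n_;     /* _N_ counts passes through the data step loop  */
  copy = 1; output;    /* explicit output: first observation            */
  copy = 2; output;    /* second observation from the same iteration    */
  cards;
5
7
9
;
run;
```

Three lines of data give three iterations and six observations. The fourth pass finds no more data, and the step stops.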
To hold the input line, that is, to keep the data pointer from moving on to the next line, we place an "at" symbol ("@") or a "double at" ("@@") at the end of the input statement. The "@" holds the line only until the end of the current iteration, while the "@@" holds it across iterations. In the example above, each iteration consists only of reading name, ht, wt, and age and then outputting an observation. The line must therefore be held for the next iteration, which requires the "double at."
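Here is a minimal sketch with hypothetical data: three people per line, on two lines. Ending the input statement with @@ lets each iteration pick up where the previous one stopped on the same line, so all six people are read.

```sas
/* Hypothetical data: three people per line, two lines. */
data class;
  input name $ ht wt age @@;   /* @@ holds the line across iterations */
  cards;
Alice 64 120 15 Bob 68 150 16 Carl 66 140 15
Dana 62 110 14 Ed 70 160 16 Fay 63 115 15
;
run;
```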
In Lesson 14 we used the following example in an exercise on proc transpose. We first read the data into five variables, then transposed it so that each observation held a treatment and one response (together with a "name" column).
Now we will see how to read this directly in the form we need. In the example below, the lines from "input" to "end;" make up one iteration. We first read the trt variable, then hold the line for this iteration. Then we start a "do loop," which repeats a group of statements according to a pattern. It begins with the keyword "do" and ends with an "end" statement. The variable s is the index of the loop; it takes the values one through five, in turn, as the loop executes. Each time through the loop, the input statement reads a value of y and holds the line (for the current iteration), and then the output statement writes an observation to the data set. When the loop is finished, the iteration is complete, and SAS moves to the next line of data and begins another iteration.
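Here is a minimal sketch of this pattern. It assumes hypothetical data in which each line holds a treatment label and five responses; the original Lesson 14 data are not reproduced here.

```sas
/* Hypothetical data: a treatment followed by five responses per line. */
data long;
  input trt $ @;       /* read the treatment, hold the line for this iteration */
  do s = 1 to 5;       /* s takes the values 1 through 5                        */
    input y @;         /* read one response, still holding the line            */
    output;            /* write one observation: trt, s, y                     */
  end;
  cards;
A 12 15 11 14 13
B 18 17 20 16 19
;
run;
```

Two lines of data produce ten observations. No observation is written at the end of the iteration, because the step contains an explicit output statement.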
Next, we put together the do loop and an if-then-else construct. Note how the size values are defined. You must be careful with the length of a character variable in this case, because SAS sets it from the first value the program assigns to that variable; any longer value assigned later is truncated. (Another way to handle this is to put a length statement before the input statement.)
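Here is a minimal sketch using a hypothetical price list, in which each item is followed by prices for small, medium, and large sizes. The names item, price, and size are assumptions. This version takes the length statement approach and declares size long enough for 'medium' before anything else.

```sas
/* Hypothetical data: an item, then prices for three sizes. */
data prices;
  length size $ 6;               /* long enough for 'medium'          */
  input item $ @;                /* read the item, hold the line      */
  do s = 1 to 3;
    input price @;               /* read one price, keep holding      */
    if s = 1 then size = 'small';
    else if s = 2 then size = 'medium';
    else size = 'large';
    output;                      /* one observation per item and size */
  end;
  drop s;
  cards;
tea 1.50 1.75 2.00
coffee 1.75 2.10 2.40
;
run;
```

Without the length statement, the first assignment, size = 'small', would give size a length of 5, and 'medium' would be stored as 'mediu'.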
Write data steps to read the following data (copy and paste it into cards just as it is). Print the results.
1. A series of y values:
13 25 22 17 19 11 16 18 21 14 17 20 18 15
2. Name, age, and grade.
Kelly 9 3 John 10 3 Mark 10 4 Joan 11 4 April 8 2 Larry 9 2 Daniel 11 5
3. Treatment and four responses for each. Read in univariate style, with a treatment and one response per observation.
ctrl 77 69 72 79 trad 87 96 89 82 new 89 94 96 81
4. Treatment, gender (male), three responses, gender (female), three responses. Read in univariate style, with a treatment, gender, and one response per observation.
ctrl M 77 69 72 F 79 81 72 trad M 87 96 89 F 82 99 85 new M 89 94 96 F 81 83 87
Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from email@example.com. "SAS" is a registered trade name of SAS Institute, Cary, North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.