Lesson 21: Data Null
In the author's experience, many job interviews for positions where SAS programming is an important part of the job description include questions about "data null." Employers appear to treat it as a sort of "litmus test" of a candidate's level of ability, so it is worth giving this topic a little attention.
The idea of "null" here is that the data step does not create a data set. In place of a data set name, we use the special name "_null_", where the underscores are part of the name. SAS still executes the statements in the data step, but as far as an output data set is concerned, the result is, well, "null," or nothing.
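As a minimal sketch (the variable name and the computation are hypothetical, not taken from the lesson), the step below executes an assignment and writes the result to the log, yet no data set is created:

```sas
/* A data step that runs its statements but creates no data set */
data _null_;
   today = date();                  /* current date as a SAS date value */
   put 'Today is ' today date9.;    /* write it to the log using the DATE9. format */
run;
```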
Why have a data step that doesn't save anything? Actually, although it doesn't save a data set, it can write something else, in particular a text file. Thus, a data step can be used for report writing or for creating "raw data" files.
The process is simply the reverse of reading a raw data file. Instead of an "infile" statement, there will be a "file" statement. Instead of "input," there will be "put." Instead of informats, formats.
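To make the parallel concrete, here is a sketch of a read step and the write step that mirrors it. The data set name `sales`, its variables, the values, and the output path are all hypothetical; `infile datalines` is used only so that the read step is self-contained.

```sas
/* Reading raw data: INFILE + INPUT + informats */
data sales;
   infile datalines truncover;              /* where the raw records come from */
   input @1 region $5. @7 amount comma8.;   /* $5. and COMMA8. act as informats */
   datalines;
North  1,250
South 12,400
;
run;

/* Writing raw data: FILE + PUT + formats (the mirror image) */
data _null_;
   set sales;
   file '/home/user/sales_out.txt';         /* hypothetical path; use a folder you can write to */
   put @1 region $5. @7 amount comma8.;     /* same specifications, now acting as formats */
run;
```

The column positions and widths are deliberately identical in both steps, so the file written by the second step could be read back by the first.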
First, let's see what a put statement does in an ordinary data step. By default, it sends lines of text to the log. A put statement can contain character strings in quotes as well as variable names, and each variable is printed with its current value. A data step that reads three observations iterates three times, so it prints three lines to the log.
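Since the lesson's original program is not reproduced here, the following is a hypothetical stand-in: the data set `demo`, the variables `name` and `score`, and the three records are invented for illustration.

```sas
/* An ordinary data step: with no FILE statement, PUT writes to the log */
data demo;
   input name $ score;
   put 'Student: ' name 'Score: ' score;   /* one log line per iteration */
   datalines;
Alice 85
Bob 92
Carol 78
;
run;
```

The step creates the data set `demo` as usual; the put statement simply adds one line to the log for each of the three records read.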
This is very useful for debugging data steps with loops and conditional statements, since you can examine the values the variables take as the data step executes.
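A small sketch of that debugging use, with a made-up loop and threshold: a put statement inside a conditional inside a do loop shows exactly which iterations meet the condition and what the variables hold at that moment.

```sas
/* Tracing a loop and a condition with PUT (hypothetical example) */
data loopdemo;
   total = 0;
   do i = 1 to 5;
      total = total + i;
      /* named output (i= total=) prints each variable's name and value */
      if total > 5 then put 'Threshold passed: ' i= total=;
   end;
run;
```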
Next, add a file statement, which names the location and file in which to save the output of the put statements. When the step runs, the log shows where the file is located, as well as the number of records written. It also shows that the data step is still creating a data set.
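Here is a sketch of that change, using hypothetical data and a hypothetical output path (adjust the path to a folder you can write to on your system):

```sas
/* Same kind of step, with a FILE statement redirecting PUT output to a text file */
data demo;
   file '/home/user/demo_output.txt';   /* hypothetical path */
   input name $ score;
   put 'Student: ' name 'Score: ' score;
   datalines;
Alice 85
Bob 92
Carol 78
;
run;
```

The put lines now go to the text file instead of the log, while the data set `demo` is still created.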
Running the same program with only the data set name changed to "_null_" produces a log with no "NOTE" about a data set, because none was created.
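As a sketch (same hypothetical data and path as above), the only difference is the name on the data statement:

```sas
/* Identical logic, but _null_ means no data set is written */
data _null_;
   file '/home/user/demo_output.txt';   /* hypothetical path */
   input name $ score;
   put 'Student: ' name 'Score: ' score;
   datalines;
Alice 85
Bob 92
Carol 78
;
run;
```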
Now let's revisit the used car data from previous lessons. Suppose we begin by reading the data into a SAS data set. Then we use data null to write some of the data to another file. Notice that the data step iterates once for each observation in the source data set. The variables are sent to the file in the order they are listed in the put statement, with just a space between them. This is the list put style, similar to the list input style.
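The used car data from earlier lessons is not reproduced here, so this sketch uses a hypothetical stand-in: only `make` is mentioned in the text; `model`, `year`, `price`, the five records, and the output path are assumptions.

```sas
/* Hypothetical stand-in for the used car data */
data cars;
   input make $ model $ year price;
   datalines;
Ford Focus 2003 8500
Ford Taurus 2001 6200
Honda Civic 2004 11900
Honda Accord 2002 10500
Toyota Camry 2003 12300
;
run;

/* List put: each value is written followed by a single space */
data _null_;
   set cars;                              /* one iteration per observation */
   file '/home/user/cars_list.txt';       /* hypothetical path */
   put make model price;
run;
```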
We can use the formatted put style, with pointer controls such as @ to set column positions and a format for each value, to control the appearance of the output.
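A sketch of formatted put on the same hypothetical car data (variables, values, and path are assumptions). Each value starts at a fixed column and is written with an explicit format, so the columns line up:

```sas
/* Hypothetical stand-in for the used car data */
data cars;
   input make $ model $ year price;
   datalines;
Ford Focus 2003 8500
Ford Taurus 2001 6200
Honda Civic 2004 11900
Honda Accord 2002 10500
Toyota Camry 2003 12300
;
run;

/* Formatted put: @n moves to column n; each value gets its own format */
data _null_;
   set cars;
   file '/home/user/cars_formatted.txt';  /* hypothetical path */
   put @1 make $10. @12 model $10. @24 year 4. @30 price dollar8.;
run;
```

The DOLLAR8. format adds the dollar sign and comma separator, which list put would not do.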
But we can do way more than that! Consider a data step that uses the automatic variable "_n_" (the underscores are part of the name), which keeps track of the observations, to control when a heading is printed. On the first observation, the first if statement puts the two header lines into the file. The second if condition is false, so its statement does not execute, and the final put statement then sends the first observation to the file. For the remaining observations, the first if condition is false, so the header lines are never printed again.
The second if condition has two parts: the observation number must be greater than 1, and the value of make must differ from its value in the previous observation. The lag function lets us compare values across observations: lag1 (or simply lag) returns the value from the previous observation, lag2 the value from the second previous observation, and so on. So, after the first observation, whenever the make changes, a blank line is inserted before the last put statement sends the detail information.
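A sketch of such a report, again on hypothetical car data (variables, values, column positions, and path are assumptions). Two design choices are worth noting. First, the blank lines group the output by make only if the data are sorted or grouped by make, as they are here. Second, `lag` is called in its own assignment statement rather than inside the if condition: lag returns the value from the previous time the function actually executed, so calling it on every iteration keeps `prev_make` reliably equal to the previous observation's make.

```sas
/* Hypothetical used car data, already sorted by make */
data cars;
   input make $ model $ year price;
   datalines;
Ford Focus 2003 8500
Ford Taurus 2001 6200
Honda Civic 2004 11900
Honda Accord 2002 10500
Toyota Camry 2003 12300
;
run;

data _null_;
   set cars;
   file '/home/user/cars_report.txt';     /* hypothetical path */

   /* LAG runs on every iteration; prev_make is blank on the first one */
   prev_make = lag(make);

   /* Header: only on the first observation; '/' starts the second line */
   if _n_ = 1 then
      put @1 'Make' @12 'Model' @24 'Year' @33 'Price' /
          @1 '----' @12 '-----' @24 '----' @33 '-----';

   /* A bare PUT writes a blank line whenever the make changes */
   if _n_ > 1 and make ne prev_make then put;

   /* Detail line for every observation */
   put @1 make $10. @12 model $10. @24 year 4. @30 price dollar8.;
run;
```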
Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from firstname.lastname@example.org . "SAS" is a registered trade name of SAS Institute, Cary, North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.