Lesson 24-25: SAS Macro
You may have used macros in Excel, or at least, you have probably heard about them. In Excel, macros are basically used to carry out a series of complex tasks by using one command (perhaps linked to a button in the spreadsheet). Thus a macro command carries out other commands.
In SAS, the macro language is a language "above" the regular SAS language. It essentially generates SAS code for you. When you submit a SAS program, the first thing that happens is that SAS scans it for macro code. The macro code is interpreted, or compiled, into SAS language statements, then the statements are executed.
We must distinguish between macro code in general, and "a macro" in particular. Some macro commands work anywhere in a program, but most work only inside a macro. A macro is like a sub-routine or procedure that is compiled and stored in a library, and then "called" or "invoked" by a SAS program. SAS statements that are not part of a macro are called "open code." All of the programs we have written so far have consisted entirely of open code.
We begin with some ways that macro language is used in open code. A fairly convenient application is to define macro variables that can be used to change values in a program. A macro variable can be defined at top of the program, where it is easy to find and change. It then can determine what happens further down. Here we look at a data step similar to that used in the Yahtzee program. The first two lines are assignment statements for macro variables. The percent sign is always the first character in macro language keywords. The %let commands assign values to the macro variables dice and reps. The values of macro variables are always text (character). This is because they will be inserted into the program statements that the macro language is writing for us, which becomes part of the program before SAS executes the program statements.
When the macro variable is used, that is, when we want to insert its value into the program, it is preceded by an ampersand (&). Whenever the macro interpreter sees the ampersand, it tries to interpret the value of the variable that follows. SAS calls this resolving the macro variable. So, in the array definition, you can see that the value of "&dice" will be "5", so that after interpretation, the line says "array x (5);". The next line, which sets up the loop for the number of repetitions, will be interpreted as "do i=1 to 10;". Since the number of dice tossed has to match the array size, the next line uses the dice variable again, and it will be interpreted as "do j=1 to 5;". In other words, what SAS sees after the macro compiler gets done is like this:
So, the idea is, if you have numbers or names in your code that you might want to change later, especially if they occur many times and might be anywhere in a long program, it is a good idea to put them in macro variables that are defined at the top of the program. Then you can change the values in the macro variables and automatically change all the related values in the program.
The macro variables are also available in title statements, which can be very useful to keep track of the settings you use. For this to work, the title must be enclosed in double quotes, not single quotes. Single quotes will prevent resolving the macro variables, and you will get a title that says exactly what is typed between the single quotes.
Now let's create a macro. The macro begins with the keyword "%macro" followed by a name for the macro, and ends with the keyword "%mend" which means "macro end." You can put the name of the macro after %mend if you like, to keep track of which macro is ending. Now, what we are actually doing here is creating a macro, not executing the statements inside. When the code below is submitted, there will not be any data set created or any output from proc print. The macro will be compiled and saved in a subfolder of the work directory called "sasmacr." There is NO message in the log that this has happened. (You will only get a message if there are errors.)
This macro makes use of a %do loop. Note that there is also a %to and a %end. This is macro code that creates a loop for the statements inside. That means, as the macro runs, it is writing the code inside three times. The variable in the %do loop, dsnum, is a macro variable. Its value is text, just like the other macro variables. The data statement uses this value to name the data set differently each time the loop executes: one1, one2, and one3.
Now we need to run the macro. To do that, issue a command in SAS using the percent sign and macro name.
In the explorer window we can see that three data sets have been created.
Here is the output from the third loop.
Now consider the data step below. Suppose this is an example of a much larger data set that we have on hand (perhaps 52 weeks worth). For some reason, we need to split it up into separate data sets for each week.
We could write a data step like this, but you can see that if there were many more weeks, this could be tedious.
So let's get a macro to do the typing for us. Did you notice that the enhanced editor's color coding changes? Inside a macro it focuses on the macro language elements. The regular SAS language keywords no longer turn blue. Another thing to notice is the usage of semicolons. Macro statements end in semicolons like any other SAS statements. However, all macro statements are separate from the code that they create. In the data step above, there is no semicolon until all four weekly data sets have been named. In the macro, there is no semicolon after the data set option. If we put one there, we would get a semicolon after each one. Furthermore, there is a "%end;" which has a semicolon, but that semicolon is not going to be part of the text the macro writes, because it is part of a macro statement. Instead we put a semicolon by itself after the loop. This semicolon will be part of the text in the final program. The macro shown below writes a program just like that shown above.
Remember that a macro is stored in a library, and can also be stored permanently. The creation of a macro is separate from calling (using) a macro. The last line in the program above is the macro call. Once the macro is saved, one needs only to call it, not re-run the code that creates it. With that in mind, consider that we might want to pass information to the macro when we call it. Suppose in the program above we want to choose which weeks to split out when we call the macro, rather than automatically doing four of them. Let's assume for this example that there are 52 weeks in the original data set and we want to pick an arbitrary range. Then we can make our macro accept parameters (which are macro variables named in parentheses after the macro name, separated by commas), which we "pass" when we call the macro (putting the variable values in parentheses after the macro name, separated by commas). Here we show the macro with two parameters which supply the starting and ending week numbers for the %do loop. In the example, weeks 1 to 2 are requested in the macro call.
The idea here is that this macro would probably be saved permanently, then whenever we needed some of these data sets, we just call the macro with the appropriate week numbers. We don't have to rewrite the program each time.
Now let's turn things around. The following SQL code combines the data from four weekly datasets for one of the salesmen and calculates a total.
Our goal will be to turn this into a macro in order to accomplish two things: 1. to eliminate the repetitive typing, and 2. to make it possible to pass parameters that specify the range of weeks and the salesman for the report. The way to approach this, as a beginning macro programmer, is to take an example of working code, like that shown above, put it inside a macro, then change things a little at a time, frequently checking if it still works. It is also good to separate the code in strategic places so that the repeated parts are on their own lines, being careful to note whether the punctuation is also repeated..
Now focus on the first line of repeated phrases. Note that in this case, all end with a comma and are the same except for the numbers. We put a do loop around this line, and delete all but one example.
Next, we need to replace the numbers with the macro variable i (the loop counter). But before we can do that, we need to understand a little more about how macro variables are resolved. For the following discussion, suppose that the value of macro variable num is "1" and the value of macro variable month1 is "January" (remember, macro variable values are text).
In previous examples, our macro variables have either stood alone or they have come at the end of a word. Sometimes macro variables are embedded inside words. For example, suppose you wanted to replace the number in "month1sales" with the macro variable, num. If you put "month&numsales" the macro compiler would look for a macro variable variable called "numsales", not just "num". To get around this problem, the macro language uses a period (dot) to signal the end of a macro variable. Writing "month&num.sales" would be correct. The period is part of the macro variable reference, so that "&num." would all be replaced by the value of the num macro variable, resulting in the correct resolution, "month1sales".
Macro variables can be "nested" so that one variable resolves to complete the name of another. A double & is used indicate this nesting. For example, "&&month&num" will first resolve to "&month1" and then to "January". You can think of it this way: The compiler will make two passes through this phrase (if it finds "&&&" it will make three, and so on). In one pass, it will hold off on interpreting things that are preceded by more than one "&", but it will remove one "&" in preparation for the next pass. Wherever there is only one "&" left, it resolves the variable, and the value becomes part of the text it can resolve in the next pass.
If nesting is used, one dot is resolved with each macro variable. If there is supposed to be a dot in the result, make sure to include that in addition to the others. Thus "month&num..sales" resolves to "month1.sales", "&&month&num..sales" resolves to "Januarysales", but "&&month&num...sales" resolves to "January.sales".
Now, to return to our example, we have a period as part of the SAS code we want to generate. If we write "saleswk&i.week&i," the macro compiler will resolve this to "saleswk1week1," with no period! Therefore, we have to put in an extra period.
Now let's move on to the next line with a repeated pattern, the one that comes from inside the parentheses of the sum statement. There is one difference between this line and the previous one--this one does not have a comma at the end. That means we have to include a comma in all but the last loop. To do that, we use a %if - %then statement. The text between %then and the semicolon will be added to the program if the condition is true. The semicolon is the end of the macro statement and is not part of the generated text.
This could be considered a short form of the statement. In fact, you might find it a bit unsatisfying--don't you want it to "%do" something? It can be written that way, and must be, if there is a semicolon in the text you want included. Below is an alternate way of doing the same thing.
We can finish up the remaining repeated sections in a similar way. The completed macro looks like this:
Now we have finished the first goal, which was to eliminate all repeated typing. Our second goal was to make it possible to pass parameters that specify the range of weeks to report, and the salesman to report on. The parameters are the three macro variables in parentheses after the macro name. Since our original macro had a range of 1 to 4, it is a simple matter to go through the program and replace every instance of a 1 with "&start" and every instance of 4 with "&stop", being careful to add an extra period where necessary (these numbers don't occur in the program for any other reason than to refer to the data sets and variables we want to use). The salesman's name only occurs one time at the end, inside quotes. It is important that these quotes be double and not single in order for the macro variable to be resolved. An example of calling the macro is included, with the corresponding output shown below the program.
The process of detecting and correcting errors is more difficult when using macro language. One reason is that there are now two levels of errors, those in the macro code, and those in the program generated by the macro. For instance, if you delete one of the %if statement in the above program, which makes the SQL code incorrect, and create the macro (without calling it) this is what you get in the log:
No errors are reported, because there were no errors in the macro code. There are also no helpful notes telling us the macro was successfully created. There is nothing but a copy of the lines that were submitted. Now suppose we call the macro:
SAS reports the error, but it is not related back to the line numbers in the macro program! This is because the macro is generating the code in the background, and the lines it generates are not copied to the log. It does not identify the exact location where the error occurred. It would be very hard to figure out from this limited information, where exactly our error is. For this reason, it is important to try to write macros in small steps, and test them often.
There are two tools are available for debugging. The first is the symbolgen system option (put in an options statement, use nosymbolgen to turn it off). This will cause macro variables and their values to be printed to the log when the macro is called. This can be helpful to find out if your macro variables are resolving correctly, but once again, it may not be easy to connect these messages to the code that you wrote. Here is part of the log that came from calling the macro with symbolgen turned on.
The second tool is the %put statement. This is much like the regular put statement, in that it writes messages to the log. By careful planning, we can make %put statements tell us what is going on in the program. Here, a %put statement is inserted after each %do loop just to tell us it has finished. This, in combination with the symbolgen option, gives a pretty good indication when the error occurred--after the third %do loop, and after the macro variable i had a value of 1. In other words, it occurs because the comma is missing after saleswk1--not saleswk3, as the error message suggests!
Using symbolgen may give you more information than you need or want, so you can also use macro variables in your own put statements to report the values you want to see.
Each company that implements the SQL standard can add its own enhancements. SAS had added a useful feature that allows SQL to assign query results to macro variables. The syntax uses the key word "into" together with a macro variable name with a colon in front of it. Here we show how the means of two variables are stored in two macro variables, which are then resolved in proc plot to create reference lines in the graph.
SAS has a number of built-in or automatic macro variables. One example is "sysdate". The following title statement will include the current date in the title statement.
1. Suppose you get a monthly sales report with the salesman's name and total sales for the month. The reports are loaded into SAS data sets with names like Month1, Month2, etc. You want to generate quarterly summary data sets that include each salesman's name, sales for the three months, and total sales for the quarter. Since this is not a one-time project, but an ongoing task that will presumably continue for years to come, you want to automate the process as much as possible. The initial SAS program creates six months' worth of data and includes SQL code that will create the first quarter's summary. As you have seen, it is best to create macro code in small steps, and run the program frequently to make sure it works as you add more components. I will try to guide you through the steps here, but you only need to turn in the log and output for the final product, NOT every step listed below.
a) Run the program as given to create the data sets. You can then comment out the data steps or delete them, as they do not need to run again (until you start a new session, unless you save them to a permanent data set).
b) At the top of the program, define a macro variable called "q" for the quarter number and set it equal to 1. The idea is that this is the only thing the "user" should have to change when running the program for a new quarter. We will need to generate the corresponding month numbers from q. Define macro variables m1, m2, and m3, setting them equal to a month value that is generated from the quarter number.
c) Define a macro around the existing SQL code, with a %macro, a %mend, and then a statement to call the macro.
d) Now start substituting macro variables into the SQL code in each place where a quarter number or a month number appears. Make sure your program runs correctly before adding any %do loops.
e) Now "compact" the code using %do loops each time a sequence of similar terms appears in the SQL code.
f) When you have everything running correctly, add a proc print statement
to print out the table that is created by the SQL command, and submit the log
2. In this problem we will create a macro that will take parameters for a data set name and two variable names, then produce a plot (using proc plot) based on the parameters that are passed, and will also automatically generate a vertical reference line at the mean of the vertical variable.
a) For purposes of this exercise, use stavwood.txt as the source data (this file has tabs in it). The variables in this data set are group, y, x1, x2, x3. Begin by getting the next two steps to work outside of a macro.
b) Use proc SQL to get the mean for your vertical reference line into a macro variable.
c) Now write a plot step and use the vref option with the macro variable to create a reference line.
d) When you have all this working, put it inside a macro. Note, the data step that reads stavwood.txt is not part of the macro. The macro should have parameters for data set name, vertical axis variable, and horizontal axis variable, in that order. Now you must change the program inside the macro so that all references, and I do mean ALL, to these three things are replaced by macro variables. (Include the log from compiling your completed macro in your homework submission.)
e) Call the macro, putting in your dataset name, y for the vertical axis variable, and one of the x variables from stavwood.txt for parameters. Make sure the plot is correct and there are no errors in the log. (Submit this output with your results. Include the log from calling your macro in your homework submission one time. )
f) Call the macro again, reversing your vertical and horizontal variables. What happens? (Submit this output with your homework.)
g) Call the macro again, with another combination of variables. (Submit this output with your homework.)
Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from firstname.lastname@example.org . "SAS" is a registered trade name of SAS Institute, Cary North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.