Lesson 26-27: SAS/GRAPH
SAS/Graph is the high-resolution graphics package for producing plots and charts in SAS. Many basic commands are similar to those we have learned for proc plot and proc chart. In fact, you can often insert a "g" in front of "plot" and "chart" and end up with a workable result, using the same syntax. However, there are many more options and capabilities available. We will just explore a few of them here.
SAS/Graph is a separate module or package in SAS, like SAS/STAT. In the Online Documentation, you will find an entry called "SAS/GRAPH Reference" in the first set of branches. After opening that, you can click on "SAS/GRAPH Procedures," followed by the name of the procedure you want to use.
We begin with proc gplot. You can get a passable plot by typing commands similar to those in proc plot. Note that the plot is displayed in a new window, a "graph" window. This window behaves differently from the output window in one important way: it does not scroll down automatically when a new plot is created. This is easy to forget! Don't be fooled when it looks like your results haven't changed!
Note that we have two series because of the "=group" in the plot statement. By default, proc gplot assigns different colors to each value of group and prints a legend.
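Here is a minimal sketch of such a plot. The data set name demo and the simulated values are made up for illustration; only the variable names x, y, and group follow the discussion above.

```sas
/* Hypothetical sample data: two groups, ten x values each */
data demo;
  do group = 1 to 2;
    do x = 1 to 10;
      y = group * x + 3 * ranuni(26);
      output;
    end;
  end;
run;

/* Same syntax as proc plot, with a "g" in front */
proc gplot data=demo;
  plot y*x=group;   /* "=group" produces one series per value of group */
run;
quit;
```

The quit statement is there because proc gplot keeps running after the run statement, waiting for more plot requests.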
There are many things that can be done to customize this graph. The most common commands are summarized in SASGraphCommands.doc.
We can customize the symbols and colors used for the two groups. To do this we write symbol statements, which are global in effect, so we usually place them above the gplot step (although they work inside as well). The symbols are numbered and the symbol statements written accordingly, much like title statements. Like title statements, the symbol definitions remain in effect until you change them. Unlike title statements, a change to a higher number symbol does not affect the lower numbered ones. Also, individual commands inside symbol statements do not get cleared or reset by leaving them out of a subsequent symbol statement.
The value= command determines the shape of the symbol, and the color= command, well, you know. These can be abbreviated v= and c=. Many common shapes and capital letters can be used for the value. Most color names that you would think of will work too.
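A short sketch of symbol statements in use follows. The data set demo is the same made-up example data as before, and the particular shapes and colors are just one choice among many.

```sas
/* Clear earlier symbol definitions so leftover settings do not carry over */
goptions reset=symbol;

/* Hypothetical sample data: two groups, ten x values each */
data demo;
  do group = 1 to 2;
    do x = 1 to 10;
      y = group * x + 3 * ranuni(26);
      output;
    end;
  end;
run;

symbol1 value=dot color=blue;   /* used for the first value of group  */
symbol2 v=square c=red;         /* abbreviated form, second value     */

proc gplot data=demo;
  plot y*x=group;
run;
quit;
```

Because symbol settings accumulate, the goptions reset=symbol line is a convenient way to start from a clean slate when experimenting.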
If you want to connect the dots, use one of the interpolation methods, with the option written as "interpol=" or just "i=". The methods covered here are join, spline, and regression. "Join" connects the points with straight lines. "Spline" fits a smooth curve through the points using a cubic spline, which is built from polynomial pieces. However, if the points vary too much, there can be wild peaks and valleys. Both these methods connect the points in the order they occur in the data set. With the spline method, you can add the "s" option, which stands for sort. It looks like "splines," but it means that SAS will sort by the x variable so that the points are connected from left to right.
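To see the difference between the two methods, here is a sketch using a made-up data set (called wave here) whose x values are deliberately not in order.

```sas
/* Hypothetical data with x values in random order */
data wave;
  do i = 1 to 15;
    x = 10 * ranuni(9);
    y = sin(x);
    output;
  end;
run;

goptions reset=symbol;

/* Straight lines, drawn in data set order (expect a zigzag) */
symbol1 v=dot c=blue i=join;
title 'i=join: points connected in data set order';
proc gplot data=wave;
  plot y*x;
run;
quit;

/* Only i= changes; v= and c= stay in effect from the statement above */
symbol1 i=splines;
title 'i=splines: sorted by x, then smoothed';
proc gplot data=wave;
  plot y*x;
run;
quit;

title;   /* clear the title */
```

Remember to scroll down in the graph window to see the second plot.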
The regression method has several parts to its command. It starts with r. The second character is either l for linear, q for quadratic, or c for cubic. Then, you can have either cli (confidence limits for an individual prediction) or clm (confidence limits for the mean), followed by a confidence level like 80, 90, or 95. For example, i=rlclm95 requests a linear regression line with 95% confidence limits for the mean. The regression equation is also printed in a note in the log.
A height= (h=) command controls the size of the value symbol, and a line= (l=) command selects a line style.
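The following sketch puts the regression, height, and line options together in one symbol statement. The data set trend is invented for the example.

```sas
/* Hypothetical data: a linear trend plus random noise */
data trend;
  do x = 1 to 20;
    y = 2 + 0.5 * x + rannor(11);
    output;
  end;
run;

goptions reset=symbol;

/* r = regression, l = linear, clm95 = 95% confidence limits for the mean */
/* h=1.5 enlarges the plotting symbol; l=2 picks a non-solid line style   */
/* (line style 1 is solid)                                                */
symbol1 v=circle c=black h=1.5 i=rlclm95 l=2;

proc gplot data=trend;
  plot y*x;
run;
quit;
```

After running this, check the log for the note containing the fitted regression equation.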
Now wouldn't it be nice if the symbols didn't spill over the frame of the graph? So glad you asked...
The appearance of the axes is controlled by global axis statements similar to the symbol statements, except that they are not automatically applied. The axis definition must be assigned to an axis using the vaxis or haxis option in the plot statement. Here we see two of the options in the axis definition demonstrated. The order= option controls the spacing and range of the values displayed on the axis. For categorical data, a list of category names can be given in the parentheses. The label= option controls the appearance of the axis label. The options inside the parentheses apply to the text that follows, so some characteristics can be changed in the middle of a word. Similar options can also be used in title statements, which will control the appearance of the titles in graphs. Also shown here is a legend statement, and the corresponding reference in the plot statement.
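As a sketch of how these pieces fit together, the code below defines two axes and a legend and assigns them in the plot statement. The data set demo, the axis ranges, and the label text are all assumptions chosen to fit the made-up data.

```sas
/* Hypothetical sample data: two groups, ten x values each */
data demo;
  do group = 1 to 2;
    do x = 1 to 10;
      y = group * x + 3 * ranuni(26);
      output;
    end;
  end;
run;

/* Axis ranges wider than the data keep the symbols inside the frame */
axis1 order=(0 to 12 by 2)
      label=(h=1.5 c=blue 'X ' c=red 'value');   /* color changes mid-label */
axis2 order=(0 to 25 by 5)
      label=(angle=90 'Response y');

legend1 label=('Group:') value=('First' 'Second') frame;

proc gplot data=demo;
  plot y*x=group / haxis=axis1 vaxis=axis2 legend=legend1;
run;
quit;
```

If the order= range is narrower than the data, the points outside it are not drawn, so it pays to check the range of your variables first.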
At this time it is appropriate to add a note about sizing graphs. The graphs that SAS produces can be resized by dragging the borders, as with other windows. However, this shrinks or expands everything in the picture (as it should), so if you make the graph smaller you might no longer be able to read the text. Although there are options that can specify the size of the graph, it is worthwhile to experiment with another method. The initial size of the graph is determined (by default) by the size of the SAS window (the application window, not the graph window). Resizing your SAS window can go a long way toward giving you the results you want. Then, once a graph is produced, you should try resizing it (the graph window this time) by small amounts in both directions. This can affect the appearance considerably, through small adjustments in spacing between objects, and even in the displayed font. When the graphs above were produced, for example, the font for the numbers on the axes (tick mark labels) was thin and hard to read. A small adjustment in the size of the graph changed it to what you see here.
Here is another type of plot, called a bubble plot. It displays three variables at once, because the size of each bubble is determined by the variable on the right side of the equal sign. In this case there are only two values for group, so you only see two sizes.
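A bubble plot along these lines might look like the sketch below, again using the made-up demo data with group as the size variable.

```sas
/* Hypothetical sample data: two groups, ten x values each */
data demo;
  do group = 1 to 2;
    do x = 1 to 10;
      y = group * x + 3 * ranuni(26);
      output;
    end;
  end;
run;

proc gplot data=demo;
  bubble y*x=group;   /* bubble size is set by the value of group */
run;
quit;
```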
SAS has more nice examples in the documentation. See some under "SAS/GRAPH Reference, SAS/GRAPH Procedures, The GPLOT Procedure." To find more information about symbol and axis statements, and the like, see "SAS/GRAPH Reference, SAS/GRAPH Concepts, SAS/GRAPH Statements." It is really worthwhile to browse through this documentation to get an idea of what SAS/GRAPH can do.
Next we turn to proc gchart, the high-resolution version of proc chart. Again, we can begin with simple commands such as those we have learned in earlier lessons for proc chart. Here is a bar graph with continuous data.
In this case, x was treated as a continuous variable, and SAS used midpoints of 7 bin ranges that it chose according to some default rules. You can use the levels= option to specify how many bins you want SAS to create, or you can use the midpoints= option to list the midpoints you want. You can list the numbers in parentheses or use "(a to b by c)" notation. If your values are discrete, use the discrete option, as shown below. If the chart variable is character, there will be a bar for each value, and the discrete option is not used.
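The options just described can be sketched as follows. The data set scores and its variables are invented: x is a continuous measurement and n is a discrete value from 1 to 6.

```sas
/* Hypothetical data: x is continuous, n takes the values 1 through 6 */
data scores;
  do i = 1 to 100;
    x = 50 + 10 * rannor(27);
    n = ceil(6 * ranuni(27));
    output;
  end;
run;

proc gchart data=scores;
  vbar x;                                /* SAS chooses the midpoints   */
  vbar x / levels=5;                     /* ask for five bins           */
  vbar x / midpoints=(20 to 80 by 10);   /* list the midpoints yourself */
  vbar n / discrete;                     /* one bar per distinct value  */
run;
quit;
```

Each vbar statement produces its own chart, so scroll through the graph window to compare them.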
As in proc chart, you can also use a group= option, which draws a separate set of bars for each value of the group variable.
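Here is a sketch of a grouped bar chart. The data set tosses and the variable names n and grp are made up for the example.

```sas
/* Hypothetical data: n is 1 to 6, grp alternates between A and B */
data tosses;
  do i = 1 to 60;
    if mod(i, 2) = 0 then grp = 'A';
    else grp = 'B';
    n = ceil(6 * ranuni(27));
    output;
  end;
run;

proc gchart data=tosses;
  vbar n / discrete group=grp;   /* a set of bars for each value of grp */
run;
quit;
```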
With a subgroup option, each bar is divided into segments according to the subgroup variable, and the colors and crosshatch patterns of the segments can be controlled using pattern statements. The value= or v= option uses "L" for left slanting, "R" for right slanting, and "X" for crosshatch, followed by a number from 1 to 5 for the density of the lines.
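A sketch with two pattern statements follows. As before, the data set tosses and its variables are assumptions, and the pattern choices are arbitrary. Depending on your graphics device, the hatching may look different or be replaced by solid fills.

```sas
/* Hypothetical data: n is 1 to 6, grp alternates between A and B */
data tosses;
  do i = 1 to 60;
    if mod(i, 2) = 0 then grp = 'A';
    else grp = 'B';
    n = ceil(6 * ranuni(27));
    output;
  end;
run;

goptions reset=pattern;   /* start from a clean set of patterns   */
pattern1 v=L2 c=blue;     /* left-slanting lines for the first subgroup */
pattern2 v=X3 c=red;      /* crosshatch for the second subgroup   */

proc gchart data=tosses;
  vbar n / discrete subgroup=grp;   /* bars are split by grp */
run;
quit;
```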
You can make block charts:
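For instance, a block chart could be requested as in this sketch, which uses made-up data (the data set tosses with a discrete variable n).

```sas
/* Hypothetical data: n takes the values 1 through 6 */
data tosses;
  do i = 1 to 60;
    n = ceil(6 * ranuni(27));
    output;
  end;
run;

proc gchart data=tosses;
  block n / discrete;   /* one block per distinct value of n */
run;
quit;
```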
Try out the hbar3d and vbar3d statements.
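A starting point for experimenting might be the following sketch, with the same kind of made-up data.

```sas
/* Hypothetical data: n takes the values 1 through 6 */
data tosses;
  do i = 1 to 60;
    n = ceil(6 * ranuni(27));
    output;
  end;
run;

proc gchart data=tosses;
  hbar3d n / discrete;   /* horizontal 3-D bars */
  vbar3d n / discrete;   /* vertical 3-D bars   */
run;
quit;
```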
And, of course, there are pie charts.
The explode= option separates the listed slices away from the pie for emphasis. The angle= option turns the pie.
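Both options appear in the sketch below. The data set tosses is invented, and the exploded value (6) and the angle (90 degrees) are arbitrary choices.

```sas
/* Hypothetical data: n takes the values 1 through 6 */
data tosses;
  do i = 1 to 60;
    n = ceil(6 * ranuni(27));
    output;
  end;
run;

proc gchart data=tosses;
  /* explode=6 pulls out the slice for n=6; angle=90 sets where the first slice starts */
  pie n / discrete explode=6 angle=90;
run;
quit;
```

Try a few different angle values to see how the slices move around.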
Download the data program to get started. Produce a graph using proc gplot for each of the four problems below.
1. First graph y*x and z*x on the same axes, using the default settings (symbols and axes).
2. Customize the graph by defining some nice symbols and modifying the axes as you think appropriate (don't add interpolating lines at this time). Add an appropriate title too, perhaps with some nice formatting or color options.
3. Use the line interpolation method, with two different line styles for y and z.
4. Do a plot of only y*x with a linear regression line and clm90 option.
To get a little practice with proc gchart, simulate 200 tosses of a pair of dice and calculate the sums. Create a chart using proc gchart for each of the five problems below. Explore options such as coloring, patterns, etc. as you wish.
1. Create a frequency histogram (use the discrete option) for the first die.
2. Create a frequency histogram for the sum of the dice.
3. Make a pie chart for the first die.
4. Make a pie chart for the sum of dice with an "exploded" view of "7".
5. Create a side-by-side bar chart for the two dice, with 1's grouped, 2's grouped, etc.
For number 5, the data needs to be organized differently. You can do the other problems first, using the same data, then do one of two things: You can generate new data that has a die number for one variable and the die toss result for the other, or you can figure out how to rearrange the data you have so that it is in that form (this would be better practice). In any case, you need to end up with something like:
2 5 etc.
Optional challenge for geometry and craft fans: Write a data step that creates points on a circle, ordered in such a way that when using proc gplot with i=join, the points will be connected like this string art:
Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from firstname.lastname@example.org . "SAS" is a registered trade name of SAS Institute, Cary, North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.