Lesson 12:  Proc Chart, Proc Plot, and Proc Corr


Bar Graphs with Proc Chart

Proc chart is a procedure that produces text-based bar charts as well as pie and star charts. 

The vbar statement produces a "vertical bar chart," as seen above.  The discrete option, listed after the slash on the vbar statement, tells SAS to treat a numeric variable as discrete, with one bar for each value of the variable. (Character variables are always charted this way, so the option is not needed for them.) Without this option, SAS groups numeric values into evenly spaced classes, like a histogram.  This is appropriate for the y variable, as shown below.  Note that the horizontal axis now indicates that the labels are midpoints of the respective ranges.

This grouping is inappropriate for variables that are actually discrete (unless they take a large number of values).  The chart for x done without the discrete option makes this apparent:  the data have been strangely grouped, with non-integer midpoints (3 and 4 are combined under the label 3.6).
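As a minimal sketch, the two charts described above could be requested as follows.  The data set name demo and its values are hypothetical, chosen only for illustration; only the variable names x and y come from the lesson.

```sas
/* Hypothetical data: x is a discrete score, y is a continuous measurement */
data demo;
  input x y;
  cards;
1 12.4
2 15.1
3 18.9
3 21.3
4 22.7
4 26.0
5 29.5
6 31.2
;
run;

proc chart data=demo;
  vbar x / discrete;  /* one bar for each distinct value of x */
  vbar y;             /* no discrete option: SAS chooses class midpoints */
run;
```

Removing the discrete option from the first vbar statement should reproduce the kind of awkward grouping described above for x.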

Complementing the vbar statement is the hbar statement, for "horizontal bar chart," of course.  In addition to the horizontal display, the hbar statement also displays some statistics for each class.
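A horizontal chart uses the same syntax with hbar in place of vbar.  The sketch below assumes a small hypothetical data set named demo.  By default, the statistics printed beside each bar are the frequency, cumulative frequency, percent, and cumulative percent.

```sas
/* Hypothetical data: a single discrete variable x */
data demo;
  input x;
  cards;
1
2
2
3
3
3
4
;
run;

proc chart data=demo;
  hbar x / discrete;  /* horizontal bars with a statistics table alongside */
run;
```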

Another possibility is the block statement, shown here with a subgroup= option which subdivides the columns by another variable value.  First, a warning:  these "3-D" charts have limits because they require a certain amount of space on the page.  If SAS cannot fit your requested chart on a page, you may get an error message in the log and a substitute graph in your output.  You may be able to adjust your linesize and pagesize options, or change the number of bars to display, or turn dimensions around in order to get better results.
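A block chart with a subgroup= option might be requested as in the following sketch.  The data set demo and its values are hypothetical, and the linesize and pagesize values are only examples of the kind of adjustment mentioned above.

```sas
/* Give the block chart more room on the page (example values) */
options linesize=100 pagesize=60;

/* Hypothetical data: x is the chart variable, group subdivides each block */
data demo;
  input x group;
  cards;
1 1
1 2
2 1
2 1
3 2
3 2
3 1
;
run;

proc chart data=demo;
  block x / discrete subgroup=group;  /* blocks for x, subdivided by group */
run;
```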

A group= option is also available to produce side-by-side bars, and group= and subgroup= can even be used in combination.  Here is part of a block chart of y using a group option with the variable "group."

Here is a vertical bar chart using both the group option for a side-by-side arrangement and the subgroup option for vertical stacking, thus combining three variables in one chart.
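The group= and subgroup= options could be combined as in this sketch.  The data set demo, its values, and the third variable kind are hypothetical names introduced for illustration.

```sas
/* Hypothetical data: x and y are chart variables, group gives the
   side-by-side sets of bars, kind subdivides each bar */
data demo;
  input x y group kind $;
  cards;
1 12.4 1 a
1 14.0 2 b
2 15.1 1 a
2 18.9 2 a
3 21.3 1 b
3 22.7 2 b
3 26.0 1 a
;
run;

proc chart data=demo;
  block y / group=group;                       /* one row of blocks per group */
  vbar x / discrete group=group subgroup=kind; /* side by side and stacked */
run;
```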

Some other options you can try include "levels=n" which specifies the number of bars, "space=n" which regulates the space between bars, "midpoints=list" where you specify the midpoints you want for the bars, and "freq=variable" which is used when one of your variables already contains the frequency to be used in making the height of the bars.  For complete information about the chart procedure and other available options see the SAS Online Documentation under "Base SAS/Base SAS Procedures Guide/Procedures/The Chart Procedure."
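The options listed above might be tried as in the following sketch.  Both data sets (demo and counts), their values, and the variable name count are hypothetical; the option values are arbitrary examples.

```sas
/* Hypothetical raw data */
data demo;
  input x y;
  cards;
1 12.4
2 15.1
3 18.9
3 21.3
4 22.7
5 29.5
;
run;

/* Hypothetical pre-tabulated data: count holds the frequency of each x */
data counts;
  input x count;
  cards;
1 4
2 7
3 5
;
run;

proc chart data=demo;
  vbar y / levels=4;            /* ask for four bars */
  vbar y / midpoints=10 20 30;  /* choose the class midpoints yourself */
  vbar x / discrete space=3;    /* widen the gap between bars */
run;

proc chart data=counts;
  vbar x / discrete freq=count; /* bar heights come from count */
run;
```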

And finally, here is an example of a pie chart.
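A pie chart uses the pie statement in the same way.  This sketch again assumes a hypothetical data set named demo with a discrete variable x.

```sas
/* Hypothetical data: a single discrete variable x */
data demo;
  input x;
  cards;
1
2
2
3
3
3
;
run;

proc chart data=demo;
  pie x / discrete;  /* one slice for each distinct value of x */
run;
```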

Scatterplots with Proc Plot

Proc plot produces text-based two-dimensional plots.  Note that there are special characters used in the output window for the axes; if you copy and paste these, you need to use the SAS Monospace font to have it come out right.

The data set shown below is used in these examples, and can be downloaded here.

A simple plot of y by x is produced by the following code.  Note that the first named variable goes on the vertical axis, and the second on the horizontal axis.  By default, SAS prints letters to represent the data points.  Sometimes more than one point occurs in one place, so the letter indicates how many points it represents (A for one observation, B for two, and so on).  In this example it didn't happen, but SAS prints a legend to explain it anyway.

Another thing to notice is that the title bar of the editor window says "PROC PLOT running."  Proc plot is one of a few procs that do not exit when they encounter a run statement.  Instead, it stays active and waits for more plot statements.  It will exit if followed by another step; otherwise, a quit statement can be used at the end, as shown in the next example.  (Although "run;" is not necessary in this example, it would be good practice to include it before the "quit;".)
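A minimal sketch of a simple plot, ended with both run and quit, is given below.  The data set name demo and its values are hypothetical.

```sas
/* Hypothetical data with no coincident points */
data demo;
  input x y;
  cards;
1 12.4
2 15.1
3 18.9
4 22.7
5 29.5
;
run;

proc plot data=demo;
  plot y*x;  /* y on the vertical axis, x on the horizontal axis */
run;
quit;        /* ends proc plot instead of leaving it running */
```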

If you want some other symbol, instead of the default A-B-C for the data points, you can assign it as shown below.  In this case, if there are two or more points in one place, SAS prints a message telling how many points are hidden, but you cannot tell which ones they are. (There is an example further down.)
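The plotting symbol is assigned by following the plot request with an equals sign and a quoted character, as in this sketch.  The data set demo is hypothetical and deliberately contains one repeated point, so SAS should add a note about hidden observations below the plot.

```sas
/* Hypothetical data: the point (3, 18.9) occurs twice */
data demo;
  input x y;
  cards;
1 12.4
2 15.1
3 18.9
3 18.9
4 22.7
;
run;

proc plot data=demo;
  plot y*x='*';  /* every point is printed as an asterisk */
run;
quit;
```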

You can also create a kind of three-dimensional plot by assigning symbols based on a third variable.  SAS only uses the first character of the value, but if this is a problem, a custom format can be used to define more useful symbols.
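Both ideas can be sketched as follows.  The data set demo, its values, the format name grpfmt, and the labels Treated and Control are all hypothetical.  Without the format statement the symbols would be the digits 1 and 2; with it, the first character of each formatted value (T or C) is used.

```sas
/* Hypothetical custom format supplying more useful plotting symbols */
proc format;
  value grpfmt 1='Treated'
               2='Control';
run;

/* Hypothetical data: group is the third variable */
data demo;
  input x y group;
  cards;
1 12.4 1
2 15.1 2
3 18.9 1
4 22.7 2
5 29.5 1
;
run;

proc plot data=demo;
  format group grpfmt.;  /* symbols become T and C */
  plot y*x=group;        /* symbol chosen by the value of group */
run;
quit;
```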

It is also possible to display a relationship between one variable and two or more others.  Usually we put the common variable on the horizontal axis.  This is done by putting two or more plot expressions in one plot statement, and using an overlay option, which follows the "/".  In the example below, a by statement has also been added, and the output is shown for group=2 only. 
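An overlay plot with a by statement might look like the following sketch.  The data set demo, its values, and the second vertical variable z are hypothetical.  The data are sorted first because by-group processing expects the data in by-variable order.

```sas
/* Hypothetical data: y and z are both plotted against x */
data demo;
  input group x y z;
  cards;
2 1 12.4 10.0
1 1 11.0  9.5
2 2 15.1 14.2
1 2 13.7 12.8
2 3 18.9 20.1
1 3 17.2 16.0
;
run;

proc sort data=demo;
  by group;
run;

proc plot data=demo;
  by group;                         /* separate plot for each group */
  plot y*x='*' z*x='o' / overlay;   /* both requests share one set of axes */
run;
quit;
```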

Sometimes it's nice to have reference lines on a plot.  Proc plot has the vref= and href= options to provide these.  Now the v and h do not refer to the orientation of the line, but to the position.  Thus a vref is a vertical reference, which is a horizontal line at a particular vertical height.
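Reference lines are requested after the slash on the plot statement, as in this sketch.  The data set demo and the reference values 20 and 3 are hypothetical.

```sas
/* Hypothetical data */
data demo;
  input x y;
  cards;
1 12.4
2 15.1
3 18.9
4 22.7
5 29.5
;
run;

proc plot data=demo;
  /* vref=20: horizontal line at y = 20 (a position on the vertical axis) */
  /* href=3:  vertical line at x = 3 (a position on the horizontal axis)  */
  plot y*x / vref=20 href=3;
run;
quit;
```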

Information about more options for proc plot can be found in the documentation under "Base SAS/Base SAS Procedures Guide."

Correlations with Proc Corr

Proc corr computes measures of association between two variables.  The primary one, of course, is correlation (Pearson product-moment correlation), but Spearman's rank-order, Kendall's tau-b, Cronbach's Alpha, and others are also available.  For details of these, see the documentation, under "Base SAS/Base SAS Procedures Guide:  Statistical Procedures."

Using proc corr can be as simple as writing one line, much like proc print.  By default, proc corr calculates correlations for all pairs of numeric variables in the data set.  Some of these may not be sensible (such as those involving an id variable).  The output includes simple statistics (n, mean, standard deviation, sum, min, and max) and a matrix with all the variables listed vertically and horizontally, so that one finds the desired correlation by looking at the intersection of a row and column.  In fact, the layout resembles that of an actual correlation matrix as used in statistical theory, if only the top number of each entry is considered.  The bottom number is a p-value for the hypothesis test whose null hypothesis is that the correlation is zero (there is no correlation), and whose alternative hypothesis is that the correlation is not zero (there is correlation).
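The default call can be sketched as follows.  The data set demo and its values are hypothetical; the id variable is included only to illustrate that every numeric variable, sensible or not, appears in the matrix.

```sas
/* Hypothetical data: id is numeric, so it is correlated along with x and y */
data demo;
  input id x y;
  cards;
1 1 12.4
2 2 15.1
3 3 18.9
4 4 22.7
5 5 29.5
;
run;

proc corr data=demo;  /* all pairs of numeric variables */
run;
```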

If there are variables you do not need to correlate, or if there are too many variables to look at all at once, some modifications can be made.  A var statement can be added to determine which variables will be included.

In addition to the var statement, a with statement can be used to make the output more compact.  The "var variables" are listed horizontally, and the "with variables" are listed vertically.
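The var and with statements might be used as in this sketch, again with a hypothetical data set demo and a hypothetical extra variable z.

```sas
/* Hypothetical data */
data demo;
  input id x y z;
  cards;
1 1 12.4 10.0
2 2 15.1 14.2
3 3 18.9 20.1
4 4 22.7 19.5
5 5 29.5 27.3
;
run;

/* var alone: square matrix for x, y, and z, leaving out id */
proc corr data=demo;
  var x y z;
run;

/* var with with: x across the top, y and z down the side */
proc corr data=demo;
  var x;
  with y z;
run;
```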

 


Exercises:

1.  Using proc chart or proc plot and the used cars data from the previous lessons:

a)  Make a vertical bar chart for the makes.

b)  Make a histogram for price with seven bars (histograms for continuous variables should have no spaces between bars).

c)  Produce a plot that relates year and price, with price on the vertical axis.

2.  The following numeric data are values of the variables x and y respectively. 

cards;
23 35
27 36
24 35
21 32
29 36
28 39
;

a)  Copy this data into the SAS editor and write a data step to read it.  The prediction equation (regression line) relating y to x for this data is given by yhat=20.60811 + 0.58784x, where yhat is the name for the predicted value.  Define this variable and include it in the data set.  Also include the natural log of x and the natural log of y in the data set.

b)  Use proc means to find the means of y and yhat (you might notice something interesting).

c)  Make an overlay plot that shows the y values and the predicted values with two different symbols, plotted against the x variable.  Include a vertical reference line at the mean of y, as found in part b.

d)  Find the correlations between all five variables (default output of proc corr).

e)  Display the two by two correlation table of x and y.

f)  Use var and with statements to display only the correlation of the log of x with the log of y.

Copyright reserved by Dr. Dwight Galster, 2006.  Please request permission for reprints (other than for personal use) from dwight.galster@sdstate.edu.  "SAS" is a registered trade name of SAS Institute, Cary, North Carolina.  All other trade names mentioned are the property of their respective owners.  This document is a work in progress and comments are welcome.  Please send an email if you find it useful or if your site links to it.