Lesson 11: Proc Means and Proc Freq
Summarizing Data with Proc Means
Proc means can provide some of the same information as proc univariate, but it formats its output differently and has different options. For relatively simple reporting of summary statistics, proc means gives more compact output. The example below shows proc means with a by statement and the first section of the output. The by statement works the same way as in proc print or proc univariate, so the data must already be sorted by the by variable.
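As a minimal sketch of by-group processing, the code below uses sashelp.class, a small sample data set that ships with SAS, as a stand-in for the lesson's own data. The data set name work.class_sorted is chosen here only for illustration.

```sas
/* BY-group processing requires the data to be sorted by the BY variable. */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* One table of the default statistics is printed for each value of sex. */
proc means data=work.class_sorted;
    by sex;
    var height weight;
run;
```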
The statistics for the listing are requested as options in the proc statement (see the documentation for a complete list). If none are requested, proc means prints n, mean, standard deviation, minimum, and maximum by default. The next example shows how some others might be requested. Like proc univariate, proc means can also produce an output data set through an output statement. Note that the statistics for the listing and for the output data set need not match.
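One way this might look, again assuming sashelp.class in place of the lesson's data: the listing shows n, mean, median, and standard deviation, while the output data set holds a different set of statistics. The output variable names (mean_height and so on) and the data set name work.class_stats are made up for this sketch.

```sas
/* Statistics for the listing are named as options in the proc statement. */
proc means data=sashelp.class n mean median stddev maxdec=2;
    var height weight;
    /* The output data set can hold different statistics than the listing. */
    /* Each list of names matches the var list in order.                   */
    output out=work.class_stats
           mean=mean_height mean_weight
           min=min_height min_weight
           max=max_height max_weight;
run;

proc print data=work.class_stats;
run;
```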
Proc means allows a shortcut in the output statement when only one statistic is requested and the output variables should keep the names of the original variables: give the statistic keyword followed by an equals sign and no names (for example, mean=). This does not work in proc univariate.
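A short sketch of the shortcut, assuming sashelp.class as the input and work.class_means as an illustrative output name. In the output data set, height and weight hold the means rather than the original values.

```sas
/* noprint suppresses the listing; only the output data set is produced. */
proc means data=sashelp.class noprint;
    var height weight;
    /* "mean=" with no names: output variables keep the names height and weight. */
    output out=work.class_means mean=;
run;

proc print data=work.class_means;
run;
```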
Proc means also has a class statement. This is somewhat like a by statement, but the results for all groups are placed together in one table, and the data do not need to be sorted first.
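For comparison with the by statement, a class statement might be used as sketched here (sashelp.class is assumed as sample data; no sort step is needed):

```sas
/* All levels of sex appear together in a single table. */
proc means data=sashelp.class mean median;
    class sex;
    var height weight;
run;
```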
If you have more than one class variable, there is also a way to get summaries at more than one level of combination of the classes. This is done with the types statement. The example below gives the overall summary, represented by the empty parentheses, a summary by group, and a summary for each combination of group and x. Only part of the output is shown. In the output window, the different levels are placed into different tables.
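A sketch of the types statement follows. Since the lesson's data set is not reproduced here, sex and age from sashelp.class stand in for group and x.

```sas
proc means data=sashelp.class mean;
    class sex age;
    /* ()      : overall summary                  */
    /* sex     : one summary per value of sex     */
    /* sex*age : one summary per sex-age pair     */
    types () sex sex*age;
    var height;
run;
```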
When the class and types statements are used together with an output statement, all the different combinations go into the data set. The _type_ variable indicates which level each row belongs to, such as overall, where _type_=0, group, where _type_=2, etc. Here again, we show only part of the output.
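The same request can be routed to a data set, as in this sketch (sashelp.class again stands in for the lesson's data, with sex and age playing the roles of group and x; work.class_levels and mean_height are illustrative names).

```sas
proc means data=sashelp.class noprint;
    class sex age;
    types () sex sex*age;
    var height;
    output out=work.class_levels mean=mean_height;
run;

/* With "class sex age", _type_ is 0 for the overall row, */
/* 2 for the sex rows, and 3 for the sex*age rows.        */
proc print data=work.class_levels;
run;
```

A where statement such as where _type_=2; in a later step is one way to pick out a single level.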
Summarizing Categorical Data with Proc Freq
Proc freq produces frequency tables for numeric or character variables. The "tables" statement is used to specify which variables to use in the table(s). If no tables statement is given, a one-way table for each variable in the data set will be produced (this is not usually a good idea). Multiple tables can be specified in one tables statement, and multiple tables statements can be given. The data for this example is here.
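A minimal sketch of one-way tables, assuming sashelp.class rather than the lesson's example data:

```sas
/* One tables statement requesting two one-way tables. */
proc freq data=sashelp.class;
    tables sex age;
run;

/* The same request written as two tables statements. */
proc freq data=sashelp.class;
    tables sex;
    tables age;
run;
```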
Several options are available in the tables statement and are listed after a slash if used. The example below shows nocum and nopct options, which suppress the cumulative statistics and percents. The nofreq option suppresses the cell frequencies in two-way tables.
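A sketch of these options on a one-way table, assuming sashelp.class as sample data. The percent option is written out here as nopercent, the spelling given in the SAS documentation.

```sas
proc freq data=sashelp.class;
    /* Options follow a slash: drop cumulative columns and percents. */
    tables age / nocum nopercent;
run;
```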
Two-way tables are requested using an "*" symbol notation, as shown below. The first variable will be listed vertically, forming the rows, and the second forms the columns. The upper left cell gives the key for the numbers in the table. Three-way and higher tables can be requested, if desired. Proc freq then produces a collection of two-way tables from the last two variables, one for each combination of values of the other variable(s).
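As a sketch with sashelp.class standing in for the lesson's data, the request below puts sex in the rows and age in the columns. A request of the form a*b*c would give one b-by-c table for each value of a.

```sas
proc freq data=sashelp.class;
    /* First variable: rows. Second variable: columns. */
    tables sex*age;
run;
```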
Some useful options for two-way tables are norow, nocol, nofreq, and nopct. These are used to suppress each of the four numbers in the table cells, and are especially helpful if the tables are large.
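For instance, to keep only the counts in each cell, the percentages might be suppressed as in this sketch (sashelp.class is assumed as sample data, and nopercent is the documented spelling of the percent option):

```sas
proc freq data=sashelp.class;
    /* Suppress row, column, and overall percents; frequencies remain. */
    tables sex*age / norow nocol nopercent;
run;
```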
If a table is to be built from a continuous variable, proc format can be used to group the values in a suitable way.
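One possible sketch: height in sashelp.class is continuous, so a format groups it into three ranges before tabulating. The format name htgrp and the cut points are invented for this illustration.

```sas
proc format;
    /* "-<" excludes the upper endpoint from the range. */
    value htgrp low -< 60  = 'Under 60'
                60  -< 65  = '60 to under 65'
                65  - high = '65 and over';
run;

proc freq data=sashelp.class;
    tables height;
    /* Proc freq groups by the formatted values. */
    format height htgrp.;
run;
```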
Proc freq has many more capabilities. It can produce output data sets and many statistical tests and measures of association. See the documentation for further information, under "Base SAS/Base SAS Procedures Guide: Statistical Procedures." Here is one example of using proc freq to conduct a chi-square test of independence.
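A sketch of such a test, assuming sashelp.class as a stand-in for the lesson's data. With only 19 observations, many expected counts are small, so SAS is likely to warn that the chi-square approximation may not be valid; the code is meant to show the syntax only.

```sas
proc freq data=sashelp.class;
    /* chisq requests the chi-square test of independence for the table. */
    tables sex*age / chisq;
run;
```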
And if you want the results in an output data set:
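A sketch of two ways to capture results, again assuming sashelp.class. The data set names work.cell_counts and work.chisq_stats are illustrative. The out= option in the tables statement saves the cell counts and percents, while the output statement saves the test statistics.

```sas
proc freq data=sashelp.class noprint;
    /* out= saves the table's counts and percents. */
    tables sex*age / chisq out=work.cell_counts;
    /* The output statement saves the chi-square statistics. */
    output out=work.chisq_stats chisq;
run;

proc print data=work.chisq_stats;
run;
```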
Exercises:
Use your permanent usedcars dataset as the source for the following problems. Use your saved formats whenever you are asked to use your formats from the previous lesson.
1) Using proc means:
a) Display the mean and standard deviation (only these two statistics) for the miles variable in the output window.
b) Display the mean and median of the price, but using color as a class variable.
c) Use your formats for color and miles and display the mean and median prices, with both color and miles as class variables, in their formatted form.
d) Produce an output data set that gives the mean and standard deviation of the miles for each make of car, using a by statement. Print the result.
e) Produce an output data set that gives the mean and standard deviation of the price for all the cars and for each make of car, using class and types statements.
2) Using proc freq:
a) Display a frequency table of the makes.
b) Display a two-way table of color by make, showing only the counts in each cell, and include tests of independence. Align colors vertically and makes horizontally in the table.
c) Use the format for classifying miles that you created in Lesson 9 to make a table of make by mile-groups. Align makes vertically and miles horizontally. Print the counts and row percents (for each make, percent in a mile-group).
Copyright reserved by Dr. Dwight Galster, 2006. Please request permission for reprints (other than for personal use) from dwight.galster@sdstate.edu. "SAS" is a registered trade name of SAS Institute, Cary, North Carolina. All other trade names mentioned are the property of their respective owners. This document is a work in progress and comments are welcome. Please send an email if you find it useful or if your site links to it.