SQL for SAS Programmers - Part 1 - Extracting Data

April 14, 2009 by: Blink 7

The first part of this tutorial deals with using the PROC SQL statement to perform basic data extraction. Screenshots of the code and output are included. Readers who wish to follow along on their own systems or copy the code can download the files provided below (right click and select “save as” or “save link as”):

(download) SAS Code for Tutorial Part 1
(download) SAS Data: Transactions
(download) SAS Data: Payment Types
(download) SAS Data: Staff

Note: This tutorial uses SAS terms and SQL terms interchangeably. The following sets of terms are equivalent:

  • Variable, column
  • Data set, table

Reading a Data Set

SQL defines the SELECT command to extract data from an existing table. The FROM command specifies the source table.

Example: Read and output all available information on the store staff –

SAS PROC SQL: Read an entire data set and output to the screen

Using the * argument with SELECT command automatically selects all the columns available in the table. By default, PROC SQL outputs to the screen, so no additional PROC PRINT is required to view the result.

SAS Output: List of Store Employees

The output is equivalent to the result of using a SET command inside a DATA step without any conditional statements.

Reading Specific Variables from a Data Set

To query specific columns from a source table, list the variables as comma-delimited arguments after the SELECT command

Example: Read and output first and last names of the store staff

SAS PROC SQL: Read specific variables from a data set

In SQL, column names are listed separated by commas, whereas in the DATA step column names (variables) are separated by spaces.

SAS Output: First and Last Names of Store Employees

The output is equivalent to the result of issuing SET and KEEP commands inside a DATA step.

Filtering a Data Set

The WHERE command can be used to filter individual records from a source table. This command is followed by one or more conditional arguments based on the variables available in the source data set.

Example: List staff with the last name “Wilson”

SAS PROC SQL: Filter data results

The simplest conditional statement matches one column to a literal value. Numeric data can also be compared using = but the literal value would not be enclosed in quotes.

SAS Output: Store Employees with the Surname Wilson

The SQL WHERE command functions the same way as the WHERE command used in the DATA step

The WHERE command is not limited to simple comparisons. Multiple columns can be evaluated at a time.

Example: List staff in the media station with the last name “Wilson”

SAS PROC SQL: Filter data results by matching on multiple variables

The Boolean operators AND and OR can be used, as well as parenthesis to create more complicated conditional logic.

SAS Output: Store Employees in the Media station with the Surname Wilson

The output shows the one employee with the last name “Wilson” who also works in the Media station. Once again, the operation of the SQL WHERE command is identical to the WHERE statement used in the DATA step.

The ^ symbol can be used to negate a logical condition within a WHERE command

SAS PROC SQL: Filter data results using negative matching

The ^ symbol also be placed in front of an open parenthesis to negate more complicated conditional logic.

SAS Output: Store Employees that do NOT have the Surname Wilson

The output shows every employee not named “Wilson”

Combining the WHERE command with the IN command makes it possible compare a column to a list of possible values.

Example: List store employees in the Media or Computer stations

SAS PROC SQL: Filter data results using a list of values

The IN command can be used with any valid data type (or even a sub-query producing one column of data) but lists may not mix data types.

SAS Output: Store Employees in the Computer or Media station

The output lists all employees in the aforementioned stations.

Reading Data into a New Data Set

So far, all of the examples have output directly to the screen, which is PROC SQL’s default behaviour (in SAS EG, the output goes to the default ODS destination). In most cases, the desired target will be another table – often a brand new table. To redirect output from an SQL query to a new data set, use the CREATE TABLE / AS syntax at the beginning of the statement:

Example: Create a duplicate of the staff table

SAS PROC SQL: Read an entire data set and output to a new data se

Virtually any SQL statement that produces screen output can be output directly to another table. To avoid system slow-downs, large SQL queries (e.g. queries that produce thousands of lines) should always be output to a new data set before viewing.

SAS Data Set - Replica of Staff table

No output is produced on the screen, but a new table has been created in the WORK directory, containing an exact replica of the data in the source table.

<< Back to Start of Series

Filed under: Development, Tutorials
Tags: ,

Leave a Reply

Captcha
Enter the letters you see above.