SQL for SAS Programmers - Part 1 - Extracting Data
April 14, 2009 by: Blink 7The first part of this tutorial deals with using the PROC SQL statement to perform basic data extraction. Screenshots of the code and output are included. Readers who wish to follow along on their own systems or copy the code can download the files provided below (right click and select “save as” or “save link as”):
(download) SAS Code for Tutorial Part 1
(download) SAS Data: Transactions
(download) SAS Data: Payment Types
(download) SAS Data: Staff
Note: This tutorial uses SAS terms and SQL terms interchangeably. The following sets of terms are equivalent:
- Variable, column
- Data set, table
Reading a Data Set
SQL defines the SELECT command to extract data from an existing table. The FROM command specifies the source table.
Example: Read and output all available information on the store staff –

Using the * argument with SELECT command automatically selects all the columns available in the table. By default, PROC SQL outputs to the screen, so no additional PROC PRINT is required to view the result.

The output is equivalent to the result of using a SET command inside a DATA step without any conditional statements.
Reading Specific Variables from a Data Set
To query specific columns from a source table, list the variables as comma-delimited arguments after the SELECT command
Example: Read and output first and last names of the store staff

In SQL, column names are listed separated by commas, whereas in the DATA step column names (variables) are separated by spaces.

The output is equivalent to the result of issuing SET and KEEP commands inside a DATA step.
Filtering a Data Set
The WHERE command can be used to filter individual records from a source table. This command is followed by one or more conditional arguments based on the variables available in the source data set.
Example: List staff with the last name “Wilson”

The simplest conditional statement matches one column to a literal value. Numeric data can also be compared using = but the literal value would not be enclosed in quotes.

The SQL WHERE command functions the same way as the WHERE command used in the DATA step
The WHERE command is not limited to simple comparisons. Multiple columns can be evaluated at a time.
Example: List staff in the media station with the last name “Wilson”

The Boolean operators AND and OR can be used, as well as parenthesis to create more complicated conditional logic.

The output shows the one employee with the last name “Wilson” who also works in the Media station. Once again, the operation of the SQL WHERE command is identical to the WHERE statement used in the DATA step.
The ^ symbol can be used to negate a logical condition within a WHERE command

The ^ symbol also be placed in front of an open parenthesis to negate more complicated conditional logic.

The output shows every employee not named “Wilson”
Combining the WHERE command with the IN command makes it possible compare a column to a list of possible values.
Example: List store employees in the Media or Computer stations

The IN command can be used with any valid data type (or even a sub-query producing one column of data) but lists may not mix data types.

The output lists all employees in the aforementioned stations.
Reading Data into a New Data Set
So far, all of the examples have output directly to the screen, which is PROC SQL’s default behaviour (in SAS EG, the output goes to the default ODS destination). In most cases, the desired target will be another table – often a brand new table. To redirect output from an SQL query to a new data set, use the CREATE TABLE / AS syntax at the beginning of the statement:
Example: Create a duplicate of the staff table

Virtually any SQL statement that produces screen output can be output directly to another table. To avoid system slow-downs, large SQL queries (e.g. queries that produce thousands of lines) should always be output to a new data set before viewing.

No output is produced on the screen, but a new table has been created in the WORK directory, containing an exact replica of the data in the source table.








