SQL for SAS Programmers - Introduction

April 12, 2009 by: Blink 7

What is SQL?

SQL stands for Structured Query Language and was designed for development and maintenance within a Database Management System (DBMS). A DBMS consists of one or more tables of data, typically joined in a hierarchical fashion, and a series of programs for organizing the data.

Typical tasks performed with SQL code include the following:

  • Retrieve (or query) data from one or more data tables
  • Manipulate data within existing tables
  • Define new tables and create data within new table
  • Alter existing table definitions
  • Set permissions for different users to access existing tables

The original implementation of SQL was developed by IBM to manipulate data within its System R. While the language has since been standardized by ANSI and the ISO, most major DBMS’s extend SQL’s query-based syntax to include procedural language constructs like conditional statements and variable storage. The extensions tend to be specific to the DBMS, resulting in several sub-languages like PL/SQL (Oracle) and T-SQL (Microsoft SQL Server).

SAS implements SQL using the PROC SQL statement. PROC SQL can be used as an alternative to the DATA step for creating and manipulating data sets, where each SAS data set is treated as a table. Most of the commands available within the DATA step can be used in PROC SQL, while the statement can be used to control macro variables and screen output.

Why Learn SQL when SAS has the DATA Step?

Most SAS programmers are already familiar with the DATA step - arguably the most important statement in the SAS language. So, why learn another statement that performs the same tasks? SQL offers many advantages over the DATA step and other statements, including the following:

  • Interoperability. PROC SQL can be used to retrieve data directly from sources that would be otherwise unavailable to the DATA Step. For instance, The Oracle PIPE driver can be used by PROC SQL to issue pass-through statements directly to an Oracle database (via Oracle SQL syntax) and retrieve the results to the SAS environment.
  • Powerful data merging facilities. The SQL programming language does not require data sets to be sorted before merging. The syntax for performing the various merges (Inner Join, Outer Join, Left Join, Right Join) is relatively simple compared to the DATA step
  • Simplified reporting. A single SQL statement can retrieve data from one or more data sets, summarize the data and output the results to the screen. To perform the same task would require at least one DATA Step and a PROC PRINT statement.

Whether PROC SQL is superior to the DATA step or vice-versa is a religious debate that rages on, even among SAS instructors during training classes. In practice, I’ve found that each statement excels in different situations and that most complex projects involve frequent use of both.

What potential Benefits can SQL provide for the SAS Universe?

The PROC SQL command provides an easy entry point for the large number of SQL programmers who are unfamiliar with SAS. While SAS is generally used as a statistical tool, there exist very powerful facilities that would make SAS an attractive platform for other database-related applications such as ETL and main-frame processing.

Similarly, learning SQL can benefit the SAS-only programmer by introducing a language that is almost always in demand outside the SAS universe and for which jobs are readily available. Although exact implementations of SQL differ between the various databases, the core commands remain consistent across nearly all databases.

The Project

The following tutorial will be based on generated data for a fictional electronics store. The main data set / table consists of 10 days of transaction data containing the following pieces of information:

  • Unique transaction ID
  • Date of transaction
  • ID of employee who made the sale
  • Payment method
  • Transaction amount

Keeping with DBMS recommended practices, employee and payment method information are stored in the transaction tables as numeric ID’s. Two more reference tables have been created to provide additional details about each employee and method of payment.

Tutorial Index

The SAS PROC SQL Tutorial consists of the following sections –

Part 1 – Extracting Data
Part 2 – Manipulating Data (coming soon!)
Part 3 – Combining Data (coming soon!)
Part 4 – Advanced Techniques (coming soon!)

Filed under: Development, Tutorials
Tags: ,

Leave a Reply

Captcha
Enter the letters you see above.