Introduction to SAS & Hadoop

Course Code:
SAS-DIACHD

Duration:
2 days
9:00am to 5:00pm
Course Fees:
S$2,500 (excl. GST)
2020 Course Dates
None of the published dates work for you? Speak to our training consultants about arranging private tuition or closed-door training.

Course Overview

This course teaches you how to use SAS programming methods to read, write, and manipulate Hadoop data. The Base SAS methods covered include reading and writing raw data with the DATA step, managing files in the Hadoop Distributed File System (HDFS), and executing MapReduce and Pig code from SAS with the HADOOP procedure. The course also covers SAS/ACCESS Interface to Hadoop methods, which use LIBNAME access and SQL pass-through techniques to read and write Hive table structures. Although they are not covered in detail, the course gives a brief overview of additional SAS and Hadoop technologies, including DS2, high-performance analytics, SAS LASR Server, and In-Memory Statistics, as well as the computing infrastructure and data access methods that support them.
This course is included in the Expert Exchange on Hadoop: Using SAS/ACCESS service offering, which helps you configure SAS/ACCESS Interface to Hadoop or SAS/ACCESS Interface to Impala to work with your Hadoop environment.

Program Objectives

• Read and write Hadoop files with the FILENAME statement

• Execute and use Hadoop commands with PROC HADOOP

• Run MapReduce and Pig programs in Hadoop from within a SAS program

• Access Hadoop distributions using the LIBNAME statement and the SQL pass-through facility

• Create and use SQL procedure pass-through queries

• Use options and efficiency techniques for optimizing data access performance

• Join data using the SQL procedure and the DATA step

• Use Base SAS procedures with Hadoop

• Write programs that create source data for SAS High-Performance Analytics programs, and execute those programs to analyze the data in parallel

• Write SAS programs to start up a SAS LASR Server grid, load data into memory in parallel, and process that data in parallel with the IMSTAT procedure

Course Outline

Module 1: Introduction

Module 2: Accessing HDFS and Invoking Hadoop Applications from SAS

Module 3: Using the SQL Pass-Through Facility

Module 4: Using the SAS/ACCESS LIBNAME Engine

Module 5: Partitioning and Clustering Hive Tables

Module 6: Overview of SAS In-Memory Analytics and the Code Accelerator for Hadoop

