Hadoop for Data Analysts Training Course

Course Language

This course is delivered in English.

Course Code

68737

Duration

14 hours (usually 2 days including breaks)

Course Outline

Hadoop Fundamentals

  • The Motivation for Hadoop
  • Hadoop Overview
  • HDFS
  • MapReduce
  • The Hadoop Ecosystem
  • Lab Scenario Explanation
  • Hands-On Exercise: Data Ingest with Hadoop Tools

Introduction to Pig

  • What Is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig

Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions
  • Hands-On Exercise: Using Pig for ETL Processing

Processing Complex Data with Pig

  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-in Functions for Complex Data
  • Iterating Grouped Data
  • Hands-On Exercise: Analyzing Ad Campaign Data with Pig

Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets
  • Hands-On Exercise: Analyzing Disparate Data Sets with Pig

Extending Pig

  • Adding Flexibility with Parameters
  • Macros and Imports
  • UDFs
  • Contributed Functions
  • Using Other Languages to Process Data with Pig
  • Hands-On Exercise: Extending Pig with Streaming and UDFs

Pig Troubleshooting and Optimization

  • Troubleshooting Pig
  • Logging
  • Using Hadoop’s Web UI
  • Optional Demo: Troubleshooting a Failed Job with the Web UI
  • Data Sampling and Debugging
  • Performance Overview
  • Understanding the Execution Plan
  • Tips for Improving the Performance of Your Pig Jobs

Introduction to Hive

  • What Is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive vs. Pig
  • Hive Use Cases
  • Interacting with Hive

Relational Data Analysis with Hive

  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hands-On Exercise: Running Hive Queries from the Shell, Scripts, and Hue

Hive Data Management

  • Hive Data Formats
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Views
  • Storing Query Results
  • Controlling Access to Data
  • Hands-On Exercise: Data Management with Hive

Text Processing with Hive

  • Overview of Text Processing
  • Important String Functions
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis

Hive Optimization

  • Understanding Query Performance
  • Controlling the Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data

Extending Hive

  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries
  • Hands-On Exercise: Data Transformation with Hive

Introduction to Impala

  • What Is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell

Analyzing Data with Impala

  • Basic Syntax
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Improving Impala Performance
  • Hands-On Exercise: Interactive Analysis with Impala

Choosing the Best Tool for the Job

  • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
  • Which to Choose?

Guaranteed to run even with a single delegate!
Public Classroom
Participants come from multiple organisations. Topics usually cannot be customised.
From $5880
Private Classroom
Participants are from one organisation only; no external participants are allowed. The course is usually customised for a specific group, with topics agreed between the client and the trainer.
From $5880
Private Remote
The instructor and the participants are in different physical locations and communicate via the Internet.
From $3580

The more delegates, the greater the savings per delegate. The table shows the price per delegate and is for illustration only; actual prices may differ.

Number of Delegates    Public Classroom    Private Classroom    Private Remote
1                      $5880               $5880                $3580
2                      $3340               $3265                $2115
3                      $2493               $2393                $1627
4                      $2070               $1958                $1383

Course Discounts

Course                 Venue            Course Date                  Course Price [Remote/Classroom]
Forecasting with R     Remote Course    Tue, Aug 30 2016, 9:30 am    $2450 / N/A
SQL Fundamentals       Remote Course    Fri, Sep 16 2016, 9:30 am    $750 / N/A

Upcoming Courses

Venue                                    Course Date                  Course Price [Remote/Classroom]
ON, Ottawa - Fairmont Chateau Laurier    Mon, Sep 12 2016, 9:30 am    $3580 / $6280
ON, Ottawa - Albert & Metcalfe           Tue, Sep 13 2016, 9:30 am    $3580 / $6180
NL, St. John's West                      Wed, Sep 14 2016, 9:30 am    $3580 / $6580
AB, Edmonton - First Edmonton Place      Wed, Sep 14 2016, 9:30 am    $3580 / $6280
BC, Victoria - The Atrium                Tue, Sep 20 2016, 9:30 am    $3580 / $6280
