Hadoop Training

Hadoop Developer Training Outline

Introduction

Hadoop history and concepts
Ecosystem
Distributions
High level architecture
Hadoop myths
Hadoop challenges (hardware / software)

HDFS

Concepts (horizontal scaling, replication, data locality, rack awareness)
Architecture
Namenode (function, storage, file system meta-data, and block reports)
Secondary namenode
HA Standby namenode
Data node
Communications / heart-beats
Block manager / balancer
Health check / safemode
Read / write path
Navigating HDFS UI
Command-line interaction with HDFS
File system abstractions
WebHDFS
Reading / writing files using Java API
Getting Data into / out of HDFS (Flume, Sqoop)
Getting HDFS stats
Latest in HDFS
Namenode HA and Federation
HDFS roadmap

MapReduce

Parallel computing before MapReduce
MapReduce concepts
Daemons: jobtracker / tasktracker
Phases: driver, mapper, shuffle/sort, and reducer
First MapReduce job
MapReduce UI walk through
Counters
Distributed cache
Combiners
Partitioners
MapReduce configuration
Job config
MR types and formats
Sorting
Job schedulers
MapReduce best practices
MRUnit
Optimizing MapReduce
Foolproofing MR
Thinking in MapReduce
YARN: architecture and use

Pig

Intro: principles and use cases
Pig versus MapReduce

Hive

Intro: principles and use cases
Environment and configuration
Hive tables and metadata
Hive keywords

HBase

History and concepts
Architecture
HBase versus RDBMS
HBase shell
HBase Java API
Splits and compaction
Read path / write path
Schema design

Real world Big Data skills and a hackathon

NoSQL design patterns: going from SQL to NoSQL
Smart Meter data collection with Flume
Sinks into HDFS and HBase
Analyzing smart meter data with Pig and Hive
Smart meter analytics with Mahout
Scheduling complete workflow with Oozie

Hadoop Administration Training Outline

Introduction

Hadoop history and concepts
Ecosystem
Distributions
High level architecture
Hadoop myths
Hadoop challenges (hardware / software)

Planning and installation

Selecting software and Hadoop distributions
Sizing the cluster and planning for growth
Selecting hardware and network
Rack topology
Installation
Multi-tenancy
Directory structure and logs
Benchmarking

HDFS operations

Concepts (horizontal scaling, replication, data locality, rack awareness)
Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
Health monitoring
Command-line and browser-based administration
Adding storage and replacing defective drives

MapReduce operations

Parallel computing before MapReduce: comparing HPC and Hadoop administration
MapReduce cluster loads
Nodes and Daemons (JobTracker, TaskTracker)
MapReduce UI walk through
MapReduce configuration
Job config
Job schedulers
Administrator view of MapReduce best practices
Optimizing MapReduce
Foolproofing MR: what to tell your programmers
YARN: architecture and use

Advanced topics

Hardware monitoring
System software monitoring
Hadoop cluster monitoring
Adding and removing servers and upgrading Hadoop
Backup, recovery, and business continuity planning
Cluster configuration tweaks
Hardware maintenance schedule
Oozie scheduling for administrators
Securing your cluster with Kerberos
The future of Hadoop

Hadoop and SQL Training Outline

Introduction

The Concepts of Hadoop
The Basics of SQL
The WHERE Clause
Distinct, Group By, Limit and Sample
Aggregation
Join Functions
Sub-query Functions
Date Functions
OLAP Functions
Temporary Tables
Strings
Interrogating the Data
View Functions
Creating Databases and Tables
Data Manipulation Language (DML)
Statistical Aggregate Functions
Hadoop EXPLAIN
Conclusion

Address

2292 Walsh Ave
Santa Clara, CA 95050
650 608 5234
info@bitranetinc.com

About BitraNet Inc

BitraNet Inc is a fast-growing, US-based IT software development and staff augmentation firm. We also have development centers in Schaumburg (Chicago), IL, and Hyderabad, India. BitraNet was established in 1996.

© 2025 BitraNet Inc, All Rights Reserved.