Hadoop Training

Hadoop Developer Training Outline

    Introduction
  • Hadoop history and concepts
  • Ecosystem
  • Distributions
  • High-level architecture
  • Hadoop myths
  • Hadoop challenges (hardware / software)
    HDFS
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Architecture
  • Namenode (function, storage, file system metadata, and block reports)
  • Secondary namenode
  • HA Standby namenode
  • Data node
  • Communications / heartbeats
  • Block manager / balancer
  • Health check / safemode
  • Read / write path
  • Navigating HDFS UI
  • Command-line interaction with HDFS
  • File system abstractions
  • WebHDFS
  • Reading / writing files using Java API
  • Getting Data into / out of HDFS (Flume, Sqoop)
  • Getting HDFS stats
  • Latest in HDFS
  • Namenode HA and Federation
  • HDFS roadmap
    MapReduce
  • Parallel computing before MapReduce
  • MapReduce concepts
  • Daemons: jobtracker / tasktracker
  • Phases: driver, mapper, shuffle/sort, and reducer
  • First MapReduce job
  • MapReduce UI walk through
  • Counters
  • Distributed cache
  • Combiners
  • Partitioners
  • MapReduce configuration
  • Job config
  • MR types and formats
  • Sorting
  • Job schedulers
  • MapReduce best practices
  • MRUnit
  • Optimizing MapReduce
  • Foolproofing MR
  • Thinking in MapReduce
  • YARN: architecture and use
    Pig
  • Intro: principles and use cases
  • Pig versus MapReduce
    Hive
  • Intro: principles and use cases
  • Environment and configuration
  • Hive tables and metadata
  • Hive keywords
    HBase
  • History and concepts
  • Architecture
  • HBase versus RDBMS
  • HBase shell
  • HBase Java API
  • Splits and compaction
  • Read path / write path
  • Schema design
    Real world Big Data skills and a hackathon
  • NoSQL design patterns: going from SQL to NoSQL
  • Smart Meter data collection with Flume
  • Sinks into HDFS and HBase
  • Analyzing smart meter data with Pig and Hive
  • Smart meter analytics with Mahout
  • Scheduling complete workflow with Oozie

Hadoop Administration Training Outline

    Introduction
  • Hadoop history and concepts
  • Ecosystem
  • Distributions
  • High-level architecture
  • Hadoop myths
  • Hadoop challenges (hardware / software)
    Planning and installation
  • Selecting software and Hadoop distributions
  • Sizing the cluster and planning for growth
  • Selecting hardware and network
  • Rack topology
  • Installation
  • Multi-tenancy
  • Directory structure and logs
  • Benchmarking
    HDFS operations
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
  • Health monitoring
  • Command-line and browser-based administration
  • Adding storage and replacing defective drives
    MapReduce operations
  • Parallel computing before MapReduce: compare HPC versus Hadoop administration
  • MapReduce cluster loads
  • Nodes and Daemons (JobTracker, TaskTracker)
  • MapReduce UI walk through
  • MapReduce configuration
  • Job config
  • Job schedulers
  • Administrator view of MapReduce best practices
  • Optimizing MapReduce
  • Foolproofing MR: what to tell your programmers
  • YARN: architecture and use

    Advanced topics
  • Hardware monitoring
  • System software monitoring
  • Hadoop cluster monitoring
  • Adding and removing servers and upgrading Hadoop
  • Backup, recovery, and business continuity planning
  • Cluster configuration tweaks
  • Hardware maintenance schedule
  • Oozie scheduling for administrators
  • Securing your cluster with Kerberos
  • The future of Hadoop

Hadoop and SQL Training Outline

    Introduction
  • The Concepts of Hadoop
  • The Basics of SQL
  • The WHERE Clause
  • Distinct, Group By, Limit and Sample
  • Aggregation
  • Join Functions
  • Sub-query Functions
  • Date Functions
  • OLAP Functions
  • Temporary Tables
  • Strings
  • Interrogating the Data
  • View Functions
  • Creating Databases and Tables
  • Data Manipulation Language (DML)
  • Statistical Aggregate Functions
  • Hadoop EXPLAIN
  • Conclusion

About BitraNet Inc

Bitranet Inc is a fast-growing, US-based IT software development and staff augmentation firm. We also have development centers in Schaumburg (Chicago), IL and Hyderabad, India. Bitranet was established in 1996.