Syllabus of Big Data Hadoop Analyst Training
Module 1: Introduction to Big Data and Hadoop
- Understanding Big Data
- Types of Big Data
- Difference between Traditional Data and Big Data
- Introduction to Hadoop
- Distributed data storage in Hadoop: HDFS and HBase
- Hadoop data processing and analysis services: MapReduce, Spark, Hive, Pig and Storm
- Data Integration Tools in Hadoop
- Resource management and cluster management services
Module 2: Big Data Ecosystem
- Need of Hadoop in Big Data
- Understanding Hadoop and its architecture
- The MapReduce Framework
- What is YARN?
- Understanding Big Data Components
- Monitoring, Management and Orchestration Components of Hadoop Ecosystem
- Different Distributions of Hadoop
- Installing Hadoop 3
Module 3: Hadoop Cluster Configuration
- Hortonworks sandbox installation & configuration
- Hadoop Configuration files
- Working with Hadoop services using Ambari
- Hadoop Daemons
- Browsing Hadoop UI consoles
- Basic Hadoop Shell commands
- Eclipse & WinSCP installation & configuration on the VM
Module 4: Big Data Processing with MapReduce
- Running a MapReduce application in MR2
- MapReduce Framework on YARN
- Fault tolerance in YARN
- Map, Reduce & Shuffle phases
- Understanding Mapper, Reducer & Driver classes
- Writing MapReduce WordCount program
- Executing & monitoring a MapReduce job
Module 5: Batch Analytics with Apache Spark
- Spark SQL and DataFrames
- DataFrames and the SQL API
- DataFrame schema
- Datasets and encoders
- Loading and saving data
- Aggregations
- Joins
Module 6: Real Time Analytics with Apache Spark
- A short introduction to streaming
- Spark Streaming
- Discretized Streams
- Stateful and stateless transformations
- Checkpointing
- Operating with other streaming platforms (such as Apache Kafka)
- Structured Streaming
Module 7: Analysis using Pig
- Background of Pig
- Pig architecture
- Pig Latin basics
- Pig execution modes
- Pig processing – loading and transforming data
- Pig built-in functions
- Filtering, grouping, sorting data
- Relational join operators
- Pig Scripting
- Pig UDFs
Module 8: Analysis using Hive Data Warehousing Infrastructure
- Background of Hive
- Hive architecture
- Hive Query Language
- Switching the Hive metastore from Derby to a MySQL database
- Managed & external tables
- Data processing – loading data into tables
- Using Hive built-in functions
- Partitioning data using Hive
- Bucketing data
- Hive Scripting
- Using Hive UDFs
Module 9: Working with HBase
- HBase overview
- Data model
- HBase architecture
- HBase shell
- ZooKeeper & its role in the HBase environment
- HBase Shell environment
- Creating tables
- CLI commands – get, put, delete & scan
- Scan Filter operations
Module 10: Importing and Exporting Data using Sqoop
- Importing data from RDBMS to HDFS
- Exporting data from HDFS to RDBMS
- Importing & exporting data between RDBMS & Hive tables
Module 11: Oozie Workflow Management and Using Flume for Analyzing Streaming Data
- Overview of Oozie
- Oozie Workflow Architecture
- Creating workflows with Oozie
- Introduction to Flume
- Flume Architecture
- Flume Demo
Module 12: Visualizing Big Data
- Introduction
- Tableau
- Chart types
- Data visualization tools
Module 13: Introducing Cloud Computing
- Cloud computing basics
- Concepts and terminology
- Goals and benefits
- Risks and challenges
- Roles and boundaries
- Cloud characteristics
- Cloud delivery models
- Cloud deployment models