Learning Objectives: In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, and how Hadoop solves it. You will study the Hadoop ecosystem components, Hadoop architecture, HDFS, rack awareness, and replication. You will learn about the Hadoop cluster architecture and the important configuration files in a Hadoop cluster. You will also get an introduction to Spark, learn why it is used, and understand the difference between batch processing and real-time processing.
Topics:
- What is Big Data?
- Big Data Customer Scenarios.
- Limitations of the Existing Data Analytics Architecture and Their Solutions: Uber Use Case.
- How Does Hadoop Solve the Big Data Problem?
- What is Hadoop?
- Hadoop’s Key Characteristics.
- Hadoop Ecosystem and HDFS.
- Hadoop Core Components.
- Rack Awareness and Block Replication.
- YARN and its Advantages.
- Hadoop Cluster and its Architecture.
- Hadoop: Different Cluster Modes.
- Big Data Analytics with Batch & Real-Time Processing.
- Why Is Spark Needed?
- What is Spark?
- How Does Spark Differ from its Competitors?
- Spark at eBay.
- Spark’s Place in the Hadoop Ecosystem.