Course Overview

Explore the fundamentals of Apache Kafka including its architecture, client communication, and key features. Learn how Kafka supports real-time data streaming, event processing, and scalable data pipelines to build robust data-intensive applications.

Welcome to this course on Apache Kafka!

What is Kafka?

Apache Kafka is an open-source software platform written in the Scala and Java programming languages. Kafka was created at LinkedIn as a messaging system and open-sourced in 2011; it has since grown into a popular distributed event streaming platform capable of handling trillions of records per day.

Kafka is a distributed system composed of servers and clients that communicate over a TCP network protocol. The system allows us to read, write, store, and process events. We can think of an event as an independent piece of information that needs to be relayed from a producer to a consumer. Relevant examples include Amazon payment transactions, iPhone location updates, FedEx shipping orders, and much more. Kafka is primarily used for building data pipelines and implementing streaming solutions.

Kafka allows us to build apps that can constantly and accurately consume and process multiple streams at very high speeds. It works with streaming data from thousands of different data sources. With Kafka, we can:

  • Process records as they occur

  • Store records accurately and consistently

  • Publish or subscribe to data or event streams

The Kafka publish-subscribe messaging system is extremely popular in the Big Data scene and integrates well with Apache Spark and Apache Storm.
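To make the publish-subscribe idea concrete before we touch Kafka itself, here is a toy in-memory sketch (plain Java, not Kafka code): producers publish events to a named topic, and every consumer subscribed to that topic receives them. The class and topic names are purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ToyPubSub {
    // topic name -> list of subscriber callbacks (a toy stand-in for Kafka topics)
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A consumer registers interest in a topic.
    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // A producer publishes an event; every subscriber of that topic sees it.
    public void publish(String topic, String event) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(event);
        }
    }

    public static void main(String[] args) {
        ToyPubSub bus = new ToyPubSub();
        List<String> received = new ArrayList<>();
        bus.subscribe("payments", received::add);                     // first subscriber collects events
        bus.subscribe("payments", e -> System.out.println("audit: " + e)); // second subscriber reacts independently
        bus.publish("payments", "order-42 charged");
        System.out.println(received);
    }
}
```

Real Kafka adds what this toy lacks: durable storage of events, partitioning for scale, replication for fault tolerance, and consumers that read at their own pace rather than via direct callbacks.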

Kafka use cases

You can use Kafka in many different ways, but here are some examples of different use cases shared on the official Kafka site:

  • Processing financial transactions in real time

  • Tracking and monitoring transportation vehicles in real time

  • Capturing and analyzing sensor data

  • Collecting and reacting to customer interactions

  • Monitoring hospital patients

  • Providing a foundation for data platforms, event-driven architectures, and microservices

  • Performing large-scale messaging

  • Serving as a commit log for distributed systems

  • And much more

Key features of Kafka

Let’s take a look at some of the key features that make Kafka so popular:

  • Scalability: Kafka scales out across producers, consumers, connectors, and stream processors.

  • Fault tolerance: Kafka replicates data across brokers, allowing the cluster to handle broker failures without losing events.

  • Consistency: Kafka can scale across many servers while preserving the ordering of records within each partition.

  • High performance: Kafka offers high throughput and low latency, and remains stable even under large volumes of data.

  • Extensibility: Many different applications have integrations with Kafka.

  • Replication capabilities: Kafka replicates topic partitions across brokers, so events remain available even if individual servers fail.

  • Availability: Kafka can stretch clusters over availability zones or connect different clusters across different regions. Kafka has traditionally used ZooKeeper to manage clusters; newer versions can use the built-in KRaft protocol instead.

  • Connectivity: The Kafka Connect interface allows you to integrate with many different event sources and sinks, such as JMS and Amazon S3.

  • Community: Kafka is one of the most active projects in the Apache Software Foundation. The community holds events like the Kafka Summit by Confluent.

Target audience

This course adopts a hands-on approach to learning Kafka. Along with core Kafka fundamentals, it also covers the ecosystem of projects (Kafka Streams, Kafka Connect, etc.), knowledge of which is critical to building end-to-end solutions.

This is a course for software developers, data engineers, and other data professionals who want to learn Kafka to build data-intensive applications. It will prove helpful for anyone who wants to learn Kafka with a practical, hands-on approach using first-class programming languages like Java, instead of being limited to a CLI.

Prerequisites

Some programming experience with Java is preferable because we will use Java libraries to interact with Kafka.
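As a small taste of what interacting with Kafka from Java looks like, a client is typically configured through a `java.util.Properties` map before any records are sent. The property keys below (`bootstrap.servers`, `key.serializer`, `value.serializer`, `acks`) are real Kafka producer configuration names, but the broker address is a hypothetical local default; this sketch only builds the configuration and does not require a running broker.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Assemble a minimal producer configuration of the kind passed to a Kafka producer client.
    static Properties producerConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for all in-sync replicas to acknowledge a write
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerConfig());
    }
}
```

In the course itself, a configuration like this is handed to the Kafka producer client from the `kafka-clients` library; the same pattern of key-value configuration recurs for consumers, Kafka Streams, and Kafka Connect.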

Course contents

This course covers the following topics that will help build a solid foundation for Kafka:

  • An overview of the Kafka architecture, client libraries, and its ecosystem of projects

  • Hands-on examples of using Kafka Client APIs (Producer, Consumer, Admin), along with key configurations

  • Developing stream processing applications using Kafka Streams (with the Processor and DSL APIs), querying their state using Interactive Queries, and how to test them

  • Using Kafka Connect source and sink connectors to build scalable data pipelines

  • Diving into key Kafka-related projects in addition to the core ecosystem, including the Spring Framework and Schema Registry

  • Best practices for each covered topic

Course structure and demo applications

Each lesson consists of exercises in the form of quizzes and coding challenges to reinforce concepts. The course also has a project assignment to apply the skills you have learned.

Some of the practical demonstrations covered in this course include:

  • Using the Kafka Producer, Consumer, and Admin APIs

  • Using the Kafka Streams DSL and Processor APIs to process real-time data flowing through Kafka topics

  • Using Kafka Connect source and sink connectors to build data pipelines to connect heterogeneous systems

  • Testing Kafka Streams applications

  • Using Kafka with the Spring Framework