Course Overview
Explore the fundamentals of Apache Kafka including its architecture, client communication, and key features. Learn how Kafka supports real-time data streaming, event processing, and scalable data pipelines to build robust data-intensive applications.
Welcome to this course on Apache Kafka!
What is Kafka?
Apache Kafka is an open-source software platform written in Scala and Java. Kafka originated at LinkedIn as an internal messaging system and was open-sourced in 2011; it has since grown into a popular distributed event streaming platform capable of handling trillions of records per day.
Kafka is a distributed system of servers and clients that communicate over a TCP network protocol. The system allows us to read, write, store, and process events. We can think of an event as an independent piece of information that needs to be relayed from a producer to a consumer. Relevant examples include Amazon payment transactions, iPhone location updates, FedEx shipping orders, and much more. Kafka is primarily used for building data pipelines and implementing streaming solutions.
Kafka allows us to build applications that continuously and reliably consume and process multiple streams at very high speed. It works with streaming data from thousands of different data sources. With Kafka, we can:
Process records as they occur
Store records accurately and consistently
Publish or subscribe to data or event streams
The Kafka publish-subscribe messaging system is extremely popular in the Big Data scene and integrates well with Apache Spark and Apache Storm.
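To make the publish-subscribe model above concrete, here is a toy in-memory sketch of the pattern Kafka implements. This is not the Kafka API: the class and method names (MiniTopic, publish, subscribe) are invented for illustration. The key ideas it shows are that events are appended to an ordered log (stored, not merely relayed) and that every subscriber independently receives every event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy in-memory model of a publish-subscribe topic (illustration only,
// not the Kafka API). Producers append events to an ordered log, and
// each subscribed consumer independently receives every event.
public class MiniTopic {
    private final List<String> log = new ArrayList<>();           // ordered, append-only event log
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    public void subscribe(Consumer<String> consumer) {
        subscribers.add(consumer);
    }

    public void publish(String event) {
        log.add(event);                                           // the event is stored, not just relayed
        for (Consumer<String> c : subscribers) c.accept(event);   // every subscriber sees every event
    }

    public int size() {
        return log.size();
    }

    public static void main(String[] args) {
        MiniTopic payments = new MiniTopic();
        List<String> audit = new ArrayList<>();
        List<String> billing = new ArrayList<>();
        payments.subscribe(audit::add);    // two independent consumers
        payments.subscribe(billing::add);
        payments.publish("order-1:paid");
        payments.publish("order-2:paid");
        System.out.println(audit);         // both consumers received both events, in order
        System.out.println(billing);
    }
}
```

In real Kafka, the log is partitioned and replicated across brokers, and consumers pull events at their own pace rather than receiving pushed callbacks, but the storage-plus-fan-out idea is the same.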
Kafka use cases
You can use Kafka in many different ways, but here are some examples of different use cases shared on the official Kafka site:
Processing financial transactions in real-time
Tracking and monitoring transportation vehicles in real-time
Capturing and analyzing sensor data
Collecting and reacting to customer interactions
Monitoring hospital patients
Providing a foundation for data platforms, event-driven architectures, and microservices
Performing large-scale messaging
Serving as a commit log for distributed systems
And much more
Key features of Kafka
Let’s take a look at some of the key features that make Kafka so popular:
Scalability: Kafka scales out horizontally across producers, consumers, connectors, and stream processors.
Fault tolerance: Kafka replicates data across brokers, so the cluster keeps working when individual servers fail.
Consistency: Kafka can scale across many different servers and still maintain the ordering of your data within each partition.
High performance: Kafka offers high throughput and low latency, and remains stable even under heavy data volumes.
Extensibility: Many different applications have integrations with Kafka.
Replication capabilities: Kafka replicates topic partitions across multiple brokers, so events are stored durably on more than one server.
Availability: Kafka can stretch clusters across availability zones or connect clusters in different regions. Kafka uses ZooKeeper to manage cluster metadata.
Connectivity: The Kafka Connect interface allows you to integrate with many different event sources such as JMS and AWS S3.
Community: Kafka is one of the most active projects in the Apache Software Foundation. The community holds events like the Kafka Summit by Confluent.
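Several of the features above (fault tolerance, replication, availability) come down to broker configuration. As a sketch, the fragment below shows real broker settings from Kafka's server.properties; the specific values and hostnames are illustrative placeholders, not recommendations.

```properties
# Sketch of broker settings related to replication and availability
# (server.properties). Values and hostnames are illustrative only.

# Unique id of this broker within the cluster
broker.id=0

# Where this broker stores its partition logs on disk
log.dirs=/var/lib/kafka/data

# Each auto-created topic's partitions are stored on 3 brokers
default.replication.factor=3

# A write is accepted only if at least 2 replicas are in sync
min.insync.replicas=2

# ZooKeeper ensemble used for cluster metadata
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
```

With a replication factor of 3 and min.insync.replicas of 2, the cluster can lose one broker holding a partition and keep serving reads and acknowledged writes for it.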
Target audience
This course adopts a hands-on approach to learning Kafka. Along with core Kafka fundamentals, it will also cover the ecosystem of projects (Kafka Streams, Kafka Connect, etc.) whose knowledge is critical to building end-to-end solutions.
This is a course for software developers, data engineers, and other data professionals who want to learn Kafka to build data-intensive applications. It will prove helpful for anyone who wants to learn Kafka with a practical, hands-on approach using a general-purpose programming language like Java, instead of being limited to the CLI.
Prerequisites
Some programming experience with Java is preferable because we will use Java libraries to interact with Kafka.
Course contents
This course covers the following topics that will help build a solid foundation for Kafka:
An overview of the Kafka architecture, client libraries, and its ecosystem of projects
Hands-on examples of using Kafka Client APIs (Producer, Consumer, Admin), along with key configurations
Developing stream processing applications using Kafka Streams (with both the Processor and DSL APIs), querying their state using Interactive Queries, and testing them
Using Kafka Connect source and sink connectors to build scalable data pipelines
Diving into key Kafka-related projects in addition to the core ecosystem, including the Spring Framework and Schema Registry
Best practices for each covered topic
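As a preview of the "key configurations" the Client API lessons work with, here is a sketch of a typical producer configuration. The configuration keys (bootstrap.servers, key.serializer, value.serializer, acks, retries) are standard Kafka producer settings, and java.util.Properties is what KafkaProducer's constructor accepts; the host, port, and values are placeholders chosen for illustration.

```java
import java.util.Properties;

// Sketch of a typical Kafka producer configuration. A java.util.Properties
// object like this is passed directly to org.apache.kafka.clients.producer
// .KafkaProducer when the Kafka client library is on the classpath.
// Host/port and values here are placeholders for illustration.
public class ProducerConfigSketch {
    public static Properties producerConfig() {
        Properties props = new Properties();
        // Initial brokers the client contacts to discover the cluster
        props.put("bootstrap.servers", "localhost:9092");
        // How record keys and values are turned into bytes on the wire
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for all in-sync replicas to acknowledge each write
        props.put("acks", "all");
        // Retry transient send failures a few times before giving up
        props.put("retries", "3");
        return props;
    }

    public static void main(String[] args) {
        // With kafka-clients on the classpath, this config would be used as:
        //   new KafkaProducer<String, String>(producerConfig())
        System.out.println(producerConfig());
    }
}
```

The course examines these and other settings (such as batching and delivery guarantees) in the Producer, Consumer, and Admin API lessons.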
Course structure and demo applications
Each lesson consists of exercises in the form of quizzes and coding challenges to reinforce concepts. The course also has a project assignment to apply the skills you have learned.
Some of the practical demonstrations covered in this course include:
Using the Kafka Streams DSL and Processor APIs to process real-time data flowing through Kafka topics
Using Kafka Connect source and sink connectors to build data pipelines to connect heterogeneous systems
Testing Kafka Streams applications
Using Kafka with the Spring Framework