Tutorials and Examples
- ksqlDB Examples
- Write Streaming Queries Against Apache Kafka® Using ksqlDB (Docker)
- Clickstream Data Analysis Pipeline Using ksqlDB (Docker)
- ksqlDB with Embedded Connect
- Integrating with PostgreSQL
- Blog post: Building a Materialized Cache with ksqlDB
This tutorial demonstrates a simple workflow using ksqlDB to write streaming queries against messages in Apache Kafka®.
Write Streaming Queries with the ksqlDB CLI
===========================================
=       _              _ ____  ____       =
=      | | _____  __ _| |  _ \| __ )      =
=      | |/ / __|/ _` | | | | |  _ \      =
=      |   <\__ \ (_| | | |_| | |_) |     =
=      |_|\_\___/\__, |_|____/|____/      =
=                   |_|                   =
=  Event Streaming Database purpose-built =
=        for stream processing apps       =
===========================================
Copyright 2017-2019 Confluent Inc.
CLI v0.8.0, Server v0.8.0 located at http://ksqldb-server:8088
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
To get started with the ksqlDB CLI, see Write Streaming Queries Against Apache Kafka® Using ksqlDB (Docker).
Write Streaming Queries with ksqlDB and Confluent Control Center
Get started with ksqlDB and Confluent Control Center:
Stream Processing Cookbook
The Stream Processing Cookbook contains ksqlDB recipes that provide in-depth tutorials and recommended deployment scenarios.
Clickstream Data Analysis Pipeline
Clickstream analysis is the process of collecting, analyzing, and reporting aggregate data about which pages a website visitor visits and in what order. The path the visitor takes through a website is called the clickstream.
This tutorial focuses on building real-time analytics of users to determine:
- General website analytics, such as hit count and visitors
- Bandwidth use
- Mapping user-IP addresses to actual users and their location
- Detection of high-bandwidth user sessions
- Error-code occurrence and enrichment
- Sessionization to track user sessions and understand behavior (such as per-user-session bandwidth and per-user-session hits)
The tutorial uses standard streaming functions (such as min and max), enrichment using child tables, stream-table joins, and different types of windowing functionality.
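To make the sessionization idea concrete, here is a minimal sketch in plain Python rather than ksqlDB. It is not the tutorial's implementation: the event shape `(user, timestamp_seconds, bytes)`, the `Session` class, the `sessionize` function, and the 30-second inactivity gap are all assumptions chosen for illustration. It groups each user's clicks into sessions separated by periods of inactivity, then reports per-session hits and bandwidth.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (user, timestamp_seconds, bytes_sent)
Click = Tuple[str, int, int]


@dataclass
class Session:
    user: str
    start: int        # timestamp of the first click in the session
    end: int          # timestamp of the last click in the session
    hits: int         # number of clicks in the session
    total_bytes: int  # bandwidth used during the session


def sessionize(clicks: Iterable[Click], gap_seconds: int = 30) -> List[Session]:
    """Group clicks into per-user sessions.

    A click joins the user's current session when it arrives no more than
    gap_seconds after the previous click; otherwise it opens a new session.
    The gap value is an assumption of this sketch.
    """
    by_user: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for user, ts, nbytes in clicks:
        by_user[user].append((ts, nbytes))

    sessions: List[Session] = []
    for user in sorted(by_user):
        current = None
        # Process each user's clicks in timestamp order.
        for ts, nbytes in sorted(by_user[user]):
            if current is not None and ts - current.end <= gap_seconds:
                # Still inside the inactivity gap: extend the session.
                current.end = ts
                current.hits += 1
                current.total_bytes += nbytes
            else:
                # Gap exceeded (or first click): close the old session, open a new one.
                if current is not None:
                    sessions.append(current)
                current = Session(user, ts, ts, 1, nbytes)
        if current is not None:
            sessions.append(current)
    return sessions


# Made-up sample data, for illustration only.
sample_clicks = [
    ("alice", 0, 500),
    ("alice", 10, 1500),
    ("alice", 100, 200),
    ("bob", 5, 4000),
]

for session in sessionize(sample_clicks):
    print(session)
```

A streaming engine computes the same kind of grouping incrementally as events arrive, instead of sorting a complete list up front as this batch sketch does.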
Get started now with these instructions:
If you don't have Docker, you can also run an automated version of the Clickstream tutorial designed for local Confluent Platform installs. Running the Clickstream demo locally without Docker requires that you have Confluent Platform installed locally, along with Elasticsearch and Grafana.
ksqlDB with Embedded Connect
ksqlDB has native integration with Connect. While ksqlDB can integrate with a separate Kafka Connect cluster, it can also run Connect embedded within the ksqlDB server, making it unnecessary to run a separate Connect cluster. The embedded Connect tutorial shows how you can configure ksqlDB to run Connect in embedded mode.
These examples demonstrate common ksqlDB operations.
You can configure Java streams applications to deserialize and ingest data in multiple ways, including Kafka console producers, JDBC source connectors, and Java client producers. For full code examples,
ksqlDB in a Kafka Streaming ETL
To learn how to deploy a Kafka streaming ETL using ksqlDB for stream processing, you can run the Confluent Platform demo. All components in the Confluent Platform demo have encryption, authentication, and authorization configured end-to-end.
Level Up Your KSQL Videos
| Video | Description |
|---|---|
| | Intro to Kafka stream processing, with a focus on KSQL. |
| KSQL Use Cases | Describes several KSQL use cases, such as data exploration, arbitrary filtering, streaming ETL, anomaly detection, and real-time monitoring. |
| KSQL and Core Kafka | Describes KSQL's dependency on core Kafka, relates KSQL to clients, and describes how KSQL uses Kafka topics. |
| Installing and Running KSQL | How to get KSQL, configure and start the KSQL server, and syntax basics. |
| KSQL Streams and Tables | Explains the difference between a STREAM and a TABLE, shows a detailed example, and explains how streaming queries are unbounded. |
| Reading Kafka Data from KSQL | How to explore Kafka topic data, create a STREAM or TABLE from a Kafka topic, and identify fields. Also explains metadata like ROWTIME and TIMESTAMP, and covers different formats like Avro, JSON, and Delimited. |
| Streaming and Unbounded Data in KSQL | More detail on streaming queries, how to read topics from the beginning, the differences between persistent and non-persistent queries, and how streaming queries end. |
| Enriching data with KSQL | Scalar functions, changing field types, filtering data, merging data with JOIN, and rekeying streams. |
| Aggregations in KSQL | How to aggregate data with KSQL, the different aggregate functions (such as COUNT, SUM, MAX, MIN, and TOPK), and windowing and late-arriving data. |
| Taking KSQL to Production | How to use KSQL in streaming ETL pipelines, scale query processing, isolate workloads, and secure your entire deployment. |
| | A brief tutorial on how to use INSERT INTO in KSQL by Confluent. |
| Struct (Nested Data) | A brief tutorial on how to use STRUCT in KSQL by Confluent. |
| | A short tutorial on stream-stream joins in KSQL by Confluent. |
| | A short tutorial on table-table joins in KSQL by Confluent. |
| Monitoring KSQL in Confluent Control Center | Monitor performance and end-to-end message delivery of your KSQL queries. |