A stream is a durable, partitioned sequence of immutable events. When a new event is added to a stream, it's appended to the partition that its key belongs to. Streams are useful for modeling a historical sequence of activity. For example, you might use a stream to model a series of customer purchases or a sequence of readings from a sensor. Under the hood, streams are simply stored as Apache Kafka® topics with an enforced schema. You can create a stream from scratch or declare a stream on top of an existing Kafka topic. In both cases, you can specify a variety of configuration options.
Create a stream from scratch
When you create a stream from scratch, a backing Kafka topic is created
automatically. Use the CREATE STREAM statement to create a stream from scratch,
and give it a name, schema, and configuration options. The following statement
registers a publications stream on a topic named publication_events. Events in
the publications stream are distributed over 3 partitions, are keyed on the
author column, and are serialized in the Avro format.
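A statement matching this description can be sketched as follows. The WITH property names (kafka_topic, partitions, key, value_format) are assumed from the ksqlDB CREATE STREAM syntax in use around this page's revision date. Later ksqlDB releases declare the key column inline (author VARCHAR KEY) instead of through a key property, so check the syntax for your version.

```sql
-- Sketch: create the publications stream and its backing topic.
-- Property names are assumptions based on ksqlDB syntax from early 2020.
CREATE STREAM publications (author VARCHAR, title VARCHAR)
    WITH (kafka_topic = 'publication_events',  -- backing topic, created automatically
          partitions = 3,                      -- required when creating from scratch
          key = 'author',                      -- column whose value is the event key
          value_format = 'avro');              -- serialization format of the value
```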
In this example, a new stream named publications is created with two columns:
author and title. Both are of type VARCHAR. ksqlDB automatically creates an
underlying publication_events topic that you can access freely. The topic
has 3 partitions, and any new events that are appended to the stream are hashed
according to the value of the author column. Because Kafka can store
data in a variety of formats, we let ksqlDB know that we want the value portion
of each row stored in the Avro format. You can use a variety of configuration
options in the final WITH clause.
If you create a stream from scratch, you must supply the number of partitions.
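As a small illustration of key-based partitioning, events could be appended with INSERT INTO ... VALUES. The author and title values below are made up for the example. With the default hashing described above, rows that share an author value land in the same partition.

```sql
-- Sketch: append events to the publications stream (sample values are hypothetical).
-- The first two rows share a key, so they are appended to the same partition.
INSERT INTO publications (author, title) VALUES ('C.S. Lewis', 'The Silver Chair');
INSERT INTO publications (author, title) VALUES ('C.S. Lewis', 'Perelandra');
INSERT INTO publications (author, title) VALUES ('George R. R. Martin', 'A Song of Ice and Fire');
```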
Create a stream over an existing Kafka topic
You can also create a stream on top of an existing Kafka topic. Internally, ksqlDB simply registers the topic with the provided schema and doesn't create anything new.
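Declaring the stream over an existing topic might look like the sketch below. It assumes the publication_events topic already exists and holds Avro-serialized values with author and title fields. The partitions and key properties are deliberately left out.

```sql
-- Sketch: register a stream over a topic that already exists.
-- Nothing new is created; ksqlDB only records the schema for the topic.
CREATE STREAM publications (author VARCHAR, title VARCHAR)
    WITH (kafka_topic = 'publication_events',
          value_format = 'avro');
```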
Because the topic already exists, you can't specify the number of partitions. The key shouldn't be set here either, because any data that already exists in the topic already has a key.
If an underlying event in the Kafka topic doesn’t conform to the given stream schema, the event is discarded at read-time.
Page last revised on: 2020-04-29