Streams
A stream is a durable, partitioned sequence of immutable events. When a new event is added to a stream, it's appended to the partition that its key belongs to. Streams are useful for modeling a historical sequence of activity. For example, you might use a stream to model a series of customer purchases or a sequence of readings from a sensor. Under the hood, streams are stored as Apache Kafka® topics with an enforced schema. You can create a stream from scratch or declare a stream on top of an existing Kafka topic. In either case, you can specify a variety of configuration options.
Create a stream from scratch

When you create a stream from scratch, a backing Kafka topic is created automatically. Use the CREATE STREAM statement to create a stream from scratch, and give it a name, schema, and configuration options. The following statement registers a `publications` stream on a topic named `publication_events`. Events in the `publications` stream are distributed over 3 partitions, are keyed on the `author` column, and are serialized in the Avro format.
```sql
CREATE STREAM publications (author VARCHAR, title VARCHAR)
    WITH (kafka_topic = 'publication_events',
          partitions = 3,
          key = 'author',
          value_format = 'avro');
```
In this example, a new stream named `publications` is created with two columns: `author` and `title`. Both are of type `VARCHAR`. ksqlDB automatically creates an underlying `publication_events` topic that you can access freely. The topic has 3 partitions, and any new events that are appended to the stream are hashed according to the value of the `author` column. Because Kafka can store data in a variety of formats, the statement tells ksqlDB to store the value portion of each row in the Avro format. You can use a variety of configuration options in the final `WITH` clause.
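For instance, once the `publications` stream exists, you can append events to it with INSERT INTO statements. This is a sketch; the author and title values are illustrative:

```sql
-- Append two events to the publications stream.
-- Each row is routed to a partition by hashing the author column.
INSERT INTO publications (author, title) VALUES ('C.S. Lewis', 'The Silver Chair');
INSERT INTO publications (author, title) VALUES ('George R. R. Martin', 'A Game of Thrones');
```

Both rows for the same author hash to the same partition, which preserves their relative order within the stream.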
Note
If you create a stream from scratch, you must supply the number of partitions.
Create a stream over an existing Kafka topic
You can also create a stream on top of an existing Kafka topic. Internally, ksqlDB simply registers the topic with the provided schema and doesn't create anything new.
```sql
CREATE STREAM publications (author VARCHAR, title VARCHAR)
    WITH (kafka_topic = 'publication_events',
          value_format = 'avro');
```
Because the topic already exists, you can't specify the number of partitions. You can't set the key either, because any data already in the topic was written with whatever key it has.
If an underlying event in the Kafka topic doesn’t conform to the given stream schema, the event is discarded at read-time.
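You can observe this behavior by reading the stream with a push query: rows whose underlying Kafka records don't match the declared schema simply never appear in the results. A sketch, using the `publications` stream from the examples above:

```sql
-- A push query: continuously emits rows as events arrive.
-- Events that can't be deserialized against the schema are skipped.
SELECT author, title FROM publications EMIT CHANGES;
```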
Page last revised on: 2020-04-29