Streams
A stream is a durable, partitioned sequence of immutable events. When a new event is added to a stream, it's appended to the partition that its key belongs to. Streams are useful for modeling a historical sequence of activity. For example, you might use a stream to model a series of customer purchases or a sequence of readings from a sensor. Under the hood, streams are simply stored as Apache Kafka® topics with an enforced schema. You can create a stream from scratch or declare a stream on top of an existing Kafka topic. In both cases, you can specify a variety of configuration options.
Create a stream from scratch¶
When you create a stream from scratch, a backing Kafka topic is created automatically. Use the `CREATE STREAM` statement to create a stream from scratch, and give it a name, schema, and configuration options. The following statement registers a `publications` stream on a topic named `publication_events`. Events in the `publications` stream are distributed over 3 partitions, are keyed on the `author` column, and are serialized in the Avro format.
```sql
CREATE STREAM publications (
    author VARCHAR KEY,
    title VARCHAR
  ) WITH (
    kafka_topic = 'publication_events',
    partitions = 3,
    value_format = 'avro'
  );
```
In this example, a new stream named `publications` is created with two columns: `author` and `title`. Both are of type `VARCHAR`. ksqlDB automatically creates an underlying `publication_events` topic that you can access freely. The topic has 3 partitions, and any new events that are appended to the stream are hashed according to the value of the `author` column. Because Kafka can store data in a variety of formats, we let ksqlDB know that we want the value portion of each row stored in the Avro format. You can use a variety of configuration options in the final `WITH` clause.
Note
If you create a stream from scratch, you must supply the number of partitions.
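Once the stream exists, you can append events to it with `INSERT INTO` statements. The following is a minimal sketch; the `author` and `title` values shown are hypothetical:

```sql
-- Each row is appended to the partition that its key column (author) hashes to.
INSERT INTO publications (author, title)
  VALUES ('C.S. Lewis', 'The Silver Chair');

INSERT INTO publications (author, title)
  VALUES ('Frank Herbert', 'Dune');
```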
Create a stream over an existing Kafka topic¶
You can also create a stream on top of an existing Kafka topic. Internally, ksqlDB simply registers the topic with the provided schema and doesn't create anything new.
```sql
CREATE STREAM publications (
    author VARCHAR KEY,
    title VARCHAR
  ) WITH (
    kafka_topic = 'publication_events',
    value_format = 'avro'
  );
```
Because the topic already exists, you do not need to specify the number of partitions.
It's important that the columns you define match the data in the existing topic.
In this case, the message would need a `KAFKA`-serialized `VARCHAR` in the message key and an `AVRO`-serialized record containing a `title` field in the message value.
If both the `author` and `title` columns are in the message value, you can write:
```sql
CREATE STREAM publications (
    author VARCHAR,
    title VARCHAR
  ) WITH (
    kafka_topic = 'publication_events',
    value_format = 'avro'
  );
```
Notice the `author` column is no longer marked with the `KEY` keyword, so it is now read from the message value.
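Either way the stream is declared, you can read its events with a push query. A minimal sketch:

```sql
-- Continuously emit rows as new events arrive on the publication_events topic.
SELECT author, title
FROM publications
EMIT CHANGES;
```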
If an underlying event in the Kafka topic doesn’t conform to the given stream schema, the event is discarded at read-time, and an error is added to the processing log.
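If the processing log has been exposed as a stream (for example, with `ksql.logging.processing.stream.auto.create` enabled), you can inspect these errors with a query like the following sketch; `KSQL_PROCESSING_LOG` is the default stream name and may differ in your deployment:

```sql
-- Sketch: stream the errors recorded by ksqlDB's processing log.
SELECT *
FROM KSQL_PROCESSING_LOG
EMIT CHANGES;
```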