Streams

A stream is a durable, partitioned sequence of immutable events. When a new event is added to a stream, it's appended to the partition that its key belongs to. Streams are useful for modeling a historical sequence of activity. For example, you might use a stream to model a series of customer purchases or a sequence of readings from a sensor. Under the hood, streams are stored as Apache Kafka® topics with an enforced schema. You can create a stream from scratch or declare a stream on top of an existing Kafka topic. In either case, you can specify a variety of configuration options.

Create a stream from scratch

When you create a stream from scratch, a backing Kafka topic is created automatically. Use the CREATE STREAM statement to create a stream from scratch, and give it a name, schema, and configuration options. The following statement registers a publications stream on a topic named publication_events. Events in the publications stream are distributed over 3 partitions, are keyed on the author column, and are serialized in the Avro format.

CREATE STREAM publications (author VARCHAR, title VARCHAR)
    WITH (kafka_topic = 'publication_events',
          partitions = 3,
          key = 'author',
          value_format = 'avro');

In this example, a new stream named publications is created with two columns, author and title, both of type VARCHAR. ksqlDB automatically creates the underlying publication_events topic, which you can access freely. The topic has 3 partitions, and any new event appended to the stream is routed to a partition by hashing the value of its author column. Because Kafka can store data in a variety of formats, the value_format property tells ksqlDB to serialize the value portion of each row in the Avro format. The final WITH clause accepts a variety of other configuration options.

Note

If you create a stream from scratch, you must supply the number of partitions.
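
With the stream registered, you can append events to it using INSERT INTO. The following is a minimal sketch; the author names and titles are purely illustrative.

INSERT INTO publications (author, title)
    VALUES ('George R. R. Martin', 'A Song of Ice and Fire');

INSERT INTO publications (author, title)
    VALUES ('George R. R. Martin', 'A Clash of Kings');

Because both rows carry the same author value, they are keyed identically and land in the same partition, so they are read back in the order they were appended.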

Create a stream over an existing Kafka topic

You can also create a stream on top of an existing Kafka topic. Internally, ksqlDB simply registers the topic with the provided schema and doesn't create anything new.

CREATE STREAM publications (author VARCHAR, title VARCHAR)
    WITH (kafka_topic = 'publication_events',
          value_format = 'avro');

Because the topic already exists, you can't specify the number of partitions. You also shouldn't set the key property, because any data already in the topic was keyed when it was produced.
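
To confirm what ksqlDB registered, you can inspect the stream and its underlying topic with DESCRIBE. A quick sketch:

DESCRIBE publications;

-- Include topic-level details such as partitions and replicas
DESCRIBE publications EXTENDED;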

If an underlying event in the Kafka topic doesn’t conform to the given stream schema, the event is discarded at read-time.
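
Reading the stream back is a push query. The sketch below assumes you're running it from the ksqlDB CLI, where setting auto.offset.reset to earliest makes the query start from the beginning of the topic.

-- Start reading from the beginning of the underlying topic
SET 'auto.offset.reset' = 'earliest';

SELECT author, title
    FROM publications
    EMIT CHANGES;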

Last update: 2020-04-29