Skip to content

CREATE STREAM AS SELECT

Synopsis

1
2
3
4
5
6
7
8
CREATE STREAM stream_name
  [WITH ( property_name = expression [, ...] )]
  AS SELECT  select_expr [, ...]
  FROM from_stream
  [ LEFT | FULL | INNER ] JOIN [join_table | join_stream] [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria 
  [ WHERE condition ]
  [PARTITION BY column_name]
  EMIT CHANGES;

Description

Create a new stream along with the corresponding Kafka topic, and continuously write the result of the SELECT query into the stream and its corresponding topic.

If the PARTITION BY clause is present, then the resulting stream will have the specified column as its key. The PARTITION BY clause is applied to the source after any JOIN or WHERE clauses, and before the SELECT clause, in much the same way as GROUP BY. For more information, see Partition Data to Enable Joins.

For joins, the key of the resulting stream will be the value from the column from the left stream that was used in the join criteria. This column will be registered as the key of the resulting stream if included in the selected columns.

For stream-table joins, the column used in the join criteria for the table must be the table key.

For stream-stream joins, you must specify a WITHIN clause for matching records that both occur within a specified time interval. For valid time units, see ksqlDB Time Units.

For more information, see Join Event Streams with ksqlDB.

The WITH clause for the result supports the following properties:

Property Description
KAFKA_TOPIC The name of the Kafka topic that backs this stream. If this property is not set, then the name of the stream in upper case will be used as default.
VALUE_FORMAT Specifies the serialization format of the message value in the topic. Supported formats: JSON, DELIMITED (comma-separated value), AVRO, KAFKA, and PROTOBUF. If this property is not set, the format of the input stream/table is used. For more information, see Serialization Formats.
VALUE_DELIMITER Used when VALUE_FORMAT='DELIMITED'. Supports single character to be a delimiter, defaults to ','. For space and tab delimited values you must use the special values 'SPACE' or 'TAB', not an actual space or tab character.
PARTITIONS The number of partitions in the backing topic. If this property is not set, then the number of partitions of the input stream/table will be used. In join queries, the property values are taken from the left-side stream or table. The ksql.sink.partitions property can be set in the properties file the ksqlDB Server is started with, or by using the SET statement.
REPLICAS The replication factor for the topic. If this property is not set, then the number of replicas of the input stream or table will be used. In join queries, the property values are taken from the left-side stream or table. The ksql.sink.replicas property can be set in the properties file the ksqlDB server is started with, or by using the SET statement.
TIMESTAMP Sets a field within this stream's schema to be used as the default source of ROWTIME for any downstream queries. Downstream queries that use time-based operations, such as windowing, will process records in this stream based on the timestamp in this field. By default, such queries will also use this field to set the timestamp on any records emitted to Kafka. Timestamps have a millisecond accuracy. If not supplied, the ROWTIME of the source stream is used.
Note: This doesn't affect the processing of the query that populates this stream. For example, given the following statement:
CREATE STREAM foo WITH (TIMESTAMP='t2') AS
SELECT * FROM bar
WINDOW TUMBLING (size 10 seconds);
EMIT CHANGES;
The window into which each row of bar is placed is determined by bar's ROWTIME, not t2.
TIMESTAMP_FORMAT Used in conjunction with TIMESTAMP. If not set, ksqlDB assumes that the timestamp field is a bigint. When set, the TIMESTAMP field must be of type varchar and have a format that can be parsed with the Java DateTimeFormatter. If your timestamp format has characters requiring single quotes, you can escape them with two successive single quotes, '', for example: 'yyyy-MM-dd''T''HH:mm:ssX'. For more information on timestamp formats, see DateTimeFormatter.
WRAP_SINGLE_VALUE Controls how values are serialized where the values schema contains only a single field. This setting controls how the query serializes values with a single-field schema.
If set to true, ksqlDB serializes the field as a named field within a record.
If set to false, ksqlDB serializes the field as an anonymous value.
If not supplied, the system default, defined by ksql.persistence.wrap.single.values and defaulting to true, is used.
Note: null values have special meaning in ksqlDB. Care should be taken when dealing with single-field schemas where the value can be null. For more information, see Single field (un)wrapping.
Note: Supplying this property for formats that do not support wrapping, for example DELIMITED, or when the value schema has multiple fields, results in an error.

Note

  • To use Avro, you must have Schema Registry enabled and ksql.schema.registry.url must be set in the ksqlDB server configuration file. See Configure ksqlDB for Avro or Protobuf.
  • Avro field names are not case sensitive in ksqlDB. This matches the ksqlDB column name behavior.

Note

The KEY property is not supported. Use PARTITION BY instead.

Example

TODO: example

Page last revised on: 2020-04-29


Last update: 2020-04-29