Serialization
The term serialization format refers to the manner in which a record's raw bytes are translated to and from information structures that ksqlDB can understand at runtime. ksqlDB offers several mechanisms for controlling serialization and deserialization.
The primary mechanism is choosing the serialization format when you
create a stream or table, by specifying FORMAT, KEY_FORMAT, or VALUE_FORMAT in the WITH
clause.
Serialization Formats
ksqlDB supports these serialization formats:
- NONE: used to indicate that the data should not be deserialized.
- DELIMITED: supports comma-separated values.
- JSON and JSON_SR: support JSON values, without and with Schema Registry integration, respectively.
- AVRO: supports Avro-serialized values.
- KAFKA: supports primitives serialized using the standard Kafka serializers.
- PROTOBUF: supports Protocol Buffers.
With the exception of the NONE format, all formats may be used as both key and value formats.
See individual formats for details.
NONE
| Feature | Supported | 
|---|---|
| As value format | No | 
| As key format | Yes | 
| Multi-Column Keys | N/A | 
| Schema Registry required | No | 
| Schema inference | No | 
| Single field wrapping | No | 
| Single field unwrapping | No | 
The NONE format is a special marker format that indicates ksqlDB should not attempt to
deserialize that part of the Kafka record.
Its main use is as the KEY_FORMAT of key-less streams, especially where a default key format
that supports schema inference has been set via ksql.persistence.default.format.key. If the
key format is not overridden, the server attempts to load the key schema from Schema Registry.
If the schema exists, the key columns are inferred from it, which may not be the intent.
If the schema does not exist, the statement is rejected. In such situations, set the key format
to NONE by specifying KEY_FORMAT='NONE' in the WITH clause.
Any statement that sets the key format to NONE and also defines key columns results in an error.
If a CREATE TABLE AS or CREATE STREAM AS statement has a source with a key format of NONE, but
the newly created table or stream has key columns, you can either define the key format
explicitly in the WITH clause or let ksqlDB use the default key format, as set in
ksql.persistence.default.format.key.
Conversely, a CREATE STREAM AS statement that removes the key columns, for example via
PARTITION BY null, automatically sets the key format to NONE.
DELIMITED
| Feature | Supported | 
|---|---|
| As value format | Yes | 
| As key format | Yes | 
| Multi-Column Keys | Yes | 
| Schema Registry required | No | 
| Schema inference | No | 
| Single field wrapping | No | 
| Single field unwrapping | Yes | 
The DELIMITED format supports comma-separated values. You can use other
delimiter characters by specifying KEY_DELIMITER and/or VALUE_DELIMITER when you use
FORMAT='DELIMITED' in a WITH clause. Only a single character is valid
as a delimiter. The default is the comma character. For space- and
tab-delimited values, use the special values SPACE or TAB, not an actual
space or tab character.
The delimiter is a Unicode character, as defined in java.lang.Character.
For example, a smiley-face character is a valid delimiter.
The serialized object should be a Kafka-serialized string, which is split into columns.
For example, given a stream declared with two key columns, ORGID and ID, and two value columns,
NAME and AGE, ksqlDB splits a key of 120,21 and a value of bob,49 into four fields:
an ORGID key of 120, an ID key of 21, a NAME of bob, and an AGE of 49.
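The splitting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ksqlDB's parser: the helper name `split_delimited` is hypothetical, and the values are left as strings rather than converted to the declared SQL types.

```python
import csv
import io

def split_delimited(text: str, delimiter: str = ",") -> list[str]:
    """Split one delimited string into its column values (all still strings)."""
    # csv.reader is used instead of str.split so that quoted values are handled.
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

# Column names taken from the example in the text.
key_columns = ["ORGID", "ID"]
value_columns = ["NAME", "AGE"]

row = dict(zip(key_columns, split_delimited("120,21")))
row.update(zip(value_columns, split_delimited("bob,49")))
print(row)  # {'ORGID': '120', 'ID': '21', 'NAME': 'bob', 'AGE': '49'}
```

Passing a different single character as `delimiter` mirrors what KEY_DELIMITER and VALUE_DELIMITER do in the WITH clause.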
This data format supports all SQL data types except ARRAY, MAP and STRUCT.
- TIMESTAMP typed data is serialized as a long value indicating the Unix epoch time in milliseconds.
- TIME typed data is serialized as an int value indicating the number of milliseconds since the beginning of the day.
- DATE typed data is serialized as an int value indicating the number of days since the Unix epoch.
- BYTES typed data is serialized as a Base64-encoded string value.
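As a rough illustration of the four encodings listed above, the following Python sketch shows how a producer might prepare such values before joining them into a delimited line. The helper names are hypothetical and the sample values are chosen only for demonstration.

```python
import base64
from datetime import date, datetime, time, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_timestamp(ts: datetime) -> int:
    """TIMESTAMP: milliseconds since the Unix epoch (ts must be timezone-aware)."""
    return (ts - EPOCH) // timedelta(milliseconds=1)

def encode_time(t: time) -> int:
    """TIME: milliseconds since the beginning of the day."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000

def encode_date(d: date) -> int:
    """DATE: days since the Unix epoch."""
    return (d - date(1970, 1, 1)).days

def encode_bytes(b: bytes) -> str:
    """BYTES: Base64-encoded text."""
    return base64.b64encode(b).decode("ascii")

fields = [
    encode_timestamp(datetime(1970, 1, 1, 0, 0, 0, 1000, tzinfo=timezone.utc)),
    encode_time(time(0, 0, 1)),
    encode_date(date(1970, 1, 3)),
    encode_bytes(b"abc"),
]
line = ",".join(str(f) for f in fields)
print(line)  # 1,1000,2,YWJj
```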
JSON
| Feature | Supported | 
|---|---|
| As value format | Yes | 
| As key format | JSON: Yes, JSON_SR: Yes | 
| Multi-Column Keys | Yes | 
| Schema Registry required | JSON: No, JSON_SR: Yes | 
| Schema inference | JSON: No, JSON_SR: Yes | 
| Single field unwrapping | Yes | 
There are two JSON formats, JSON and JSON_SR. Both support serializing and
deserializing JSON data. The latter integrates with Schema Registry,
registering and retrieving JSON schemas, while the former does not. These two
formats are not byte compatible: you cannot use one format to read data produced
by the other.
The JSON formats support all SQL data types.
By itself, JSON doesn't support a map type, so ksqlDB serializes MAP types as
JSON objects. For this reason, the JSON format supports only MAP objects
that have STRING keys.
The serialized object should be a Kafka-serialized string that contains a valid JSON value. The format supports JSON objects and top-level primitives, arrays, and maps.
Important
If you want the sources that you create to store their schemas in
Schema Registry, specify the JSON_SR format.
JSON Objects
Values that are JSON objects are probably the most common.
For example, given a stream whose value schema declares several columns and a JSON object
value with matching properties, ksqlDB deserializes the JSON object's fields into the
corresponding fields of the stream.
Top-level primitives, arrays and maps
The JSON format supports reading and writing top-level primitives, arrays and maps.
For example, given a SQL statement with only a single field in the
value schema and the WRAP_SINGLE_VALUE property set to false, and a JSON value
that is a bare primitive rather than an object, ksqlDB can deserialize the value
into the single value field of the stream.
When serializing data with a single field, ksqlDB can serialize the field
as an anonymous value if WRAP_SINGLE_VALUE is set to false.
Tip
Explicit wrapping and unwrapping is only supported for value columns. For more information, see Single field (un)wrapping.
Decimal Serialization
ksqlDB accepts decimals that are serialized either as JSON numbers or as the text representation of the base-10 equivalent, so it can read a decimal field whose value is either a number or a string containing that number.
Decimals with specified precision and scale are serialized as JSON numbers.
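To illustrate accepting both representations, here is a minimal Python sketch of a reader that treats a JSON number and its text form as the same decimal. The function name `read_decimal` and the field name `value` are assumptions made for this example, and the code is not ksqlDB's deserializer.

```python
import json
from decimal import Decimal

def read_decimal(document: str, field: str) -> Decimal:
    """Read a decimal field that may be a JSON number or a JSON string."""
    # parse_float=Decimal keeps JSON numbers exact instead of routing them through float.
    value = json.loads(document, parse_float=Decimal)[field]
    return Decimal(value)  # Decimal() accepts Decimal, int, and str inputs

as_number = read_decimal('{"value": 1.12345678912345}', "value")
as_text = read_decimal('{"value": "1.12345678912345"}', "value")
print(as_number == as_text)  # True
```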
Timestamp Serialization
Timestamps are serialized as numbers indicating the Unix epoch time in milliseconds. For example,
a timestamp at 1970-01-01T00:00:00.001 is serialized as the number 1.
ksqlDB deserializes a number as a TIMESTAMP if it corresponds to a TIMESTAMP typed field in
the stream.
Time Serialization
Times are serialized as numbers indicating the number of milliseconds since the beginning of the day.
For example, 00:00:01 is serialized as the number 1000.
ksqlDB deserializes a number as a TIME if it corresponds to a TIME typed field in
the stream.
Date Serialization
Dates are serialized as numbers indicating the number of days since the Unix epoch. For example,
the date 1970-01-03 is serialized as the number 2.
ksqlDB deserializes a number as a DATE if it corresponds to a DATE typed field in
the stream.
Bytes Serialization
Bytes are serialized as a Base64-encoded string value. For example, a three-byte
array such as [61, 62, 63] is serialized as a four-character Base64 string.
ksqlDB deserializes a string as BYTES if it corresponds to a BYTES typed field in
the stream.
Field Name Case Sensitivity
The format is case-insensitive when matching a SQL field name with a JSON document's property name. The first case-insensitive match is used.
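The "first case-insensitive match" rule can be pictured with the Python sketch below. It assumes that "first" means first in document order, which the text does not state explicitly, and the helper name `match_field` is hypothetical.

```python
import json

def match_field(column: str, document: dict):
    """Return the value of the first property whose name matches column, ignoring case."""
    wanted = column.casefold()
    for name, value in document.items():  # dicts preserve the document's property order
        if name.casefold() == wanted:
            return value
    return None  # no property matched

doc = json.loads('{"userId": 7, "USERID": 8}')
print(match_field("USERID", doc))  # 7, because "userId" appears first
```

A document with two properties that differ only by case is therefore ambiguous, and it is safer to avoid producing one.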
Avro
| Feature | Supported | 
|---|---|
| As value format | Yes | 
| As key format | Yes | 
| Multi-Column Keys | Yes | 
| Schema Registry required | Yes | 
| Schema inference | Yes | 
| Single field wrapping | Yes | 
| Single field unwrapping | Yes | 
The AVRO format supports Avro binary serialization of all SQL
data types, including records and
top-level primitives, arrays, and maps.
Note
ksqlDB doesn't support creating streams or tables from a topic that has a recursive Avro schema.
The format requires ksqlDB to be configured to store and retrieve the Avro schemas from the Confluent Schema Registry. For more information, see Configure ksqlDB for Avro, Protobuf, and JSON schemas.
Avro Records
Avro records can be deserialized into matching ksqlDB schemas.
For example, given a stream whose value columns match the fields of an Avro record schema,
ksqlDB deserializes the Avro record's fields into the corresponding fields of the stream.
Important
By default, ksqlDB-registered schemas have the same name
  (KsqlDataSourceSchema) and the same namespace
  (io.confluent.ksql.avro_schemas). You can override this behavior by
  providing a VALUE_AVRO_SCHEMA_FULL_NAME property in the WITH clause,
  where you set the VALUE_FORMAT to 'AVRO'. As the name suggests, this
  property overrides the default name/namespace with the provided one.
  For example, com.mycompany.MySchema registers a schema with the
  MySchema name and the com.mycompany namespace.
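The way a full name breaks down into a name and a namespace can be sketched as follows. The helper `split_full_name` is hypothetical and only demonstrates the split on the last dot; the record skeleton it feeds is a generic Avro schema shape with an empty field list, not a schema that ksqlDB is known to emit.

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Split an Avro full name into (namespace, name) at the last dot."""
    namespace, _, name = full_name.rpartition(".")
    return namespace, name

namespace, name = split_full_name("com.mycompany.MySchema")
schema_skeleton = {"type": "record", "name": name, "namespace": namespace, "fields": []}
print(schema_skeleton["namespace"], schema_skeleton["name"])  # com.mycompany MySchema

# The default described in the text splits the same way.
print(split_full_name("io.confluent.ksql.avro_schemas.KsqlDataSourceSchema"))
```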
Top-level primitives, arrays and maps
The Avro format supports reading and writing top-level primitives, arrays and maps.
For example, given a SQL statement with only a single field in the
value schema and the WRAP_SINGLE_VALUE property set to false, and an Avro value
serialized with a primitive schema rather than a record schema, ksqlDB can
deserialize the value into the single value field of the stream.
When serializing data with a single field, ksqlDB can serialize the field
as an anonymous value if WRAP_SINGLE_VALUE is set to false.
Tip
Explicit wrapping and unwrapping is only supported for value columns. For more information, see Single field (un)wrapping.
Field Name Case Sensitivity
The format is case-insensitive when matching a SQL field name with an Avro record's field name. The first case-insensitive match is used.
KAFKA
| Feature | Supported | 
|---|---|
| As value format | Yes | 
| As key format | Yes | 
| Multi-Column Keys | No | 
| Schema Registry required | No | 
| Schema inference | No | 
| Single field wrapping | No | 
| Single field unwrapping | Yes | 
The KAFKA format supports INT, BIGINT, DOUBLE and STRING
primitives that have been serialized using Kafka's standard set of
serializers.
The format is designed primarily to support primitive message keys. It can be used as a value format, though certain operations aren't supported when this is the case.
Unlike some other formats, the KAFKA format does not perform any type
coercion, so it's important to correctly match the field type to the
underlying serialized form to avoid deserialization errors.
The table below details the SQL types the format supports, along with the
Kafka Java Serializer, Deserializer, and Connect Converter classes you
would use to write the key to Kafka, read the key from Kafka, or
configure Kafka Connect to work with the KAFKA format, respectively.
| SQL Field Type | Kafka Type | Kafka Serializer | Kafka Deserializer | Connect Converter | 
|---|---|---|---|---|
| INT / INTEGER | A 32-bit signed integer | org.apache.kafka.common.serialization.IntegerSerializer | org.apache.kafka.common.serialization.IntegerDeserializer | org.apache.kafka.connect.converters.IntegerConverter | 
| BIGINT | A 64-bit signed integer | org.apache.kafka.common.serialization.LongSerializer | org.apache.kafka.common.serialization.LongDeserializer | org.apache.kafka.connect.converters.LongConverter | 
| DOUBLE | A 64-bit floating point number | org.apache.kafka.common.serialization.DoubleSerializer | org.apache.kafka.common.serialization.DoubleDeserializer | org.apache.kafka.connect.converters.DoubleConverter | 
| STRING / VARCHAR | A UTF-8 encoded text string | org.apache.kafka.common.serialization.StringSerializer | org.apache.kafka.common.serialization.StringDeserializer | org.apache.kafka.connect.storage.StringConverter | 
Because the format supports only primitive types, you can only use it when the schema contains a single field.
For example, if your Kafka messages have a long key, you can make
them available to ksqlDB by declaring the key column with the BIGINT type
in your CREATE STREAM or CREATE TABLE statement.
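To make the byte-level expectations concrete, the sketch below mimics in Python what the standard Kafka serializers listed in the table write: big-endian fixed-width numbers and UTF-8 text. The function names are hypothetical stand-ins for the Java classes, and the length check models why a mismatched column type fails instead of being coerced.

```python
import struct

def serialize_int(value: int) -> bytes:
    return struct.pack(">i", value)   # 4 bytes, big-endian, like IntegerSerializer

def serialize_long(value: int) -> bytes:
    return struct.pack(">q", value)   # 8 bytes, big-endian, like LongSerializer

def serialize_double(value: float) -> bytes:
    return struct.pack(">d", value)   # 8 bytes, IEEE 754, like DoubleSerializer

def serialize_string(value: str) -> bytes:
    return value.encode("utf-8")      # like StringSerializer with its default encoding

def deserialize_long(data: bytes) -> int:
    # No coercion: anything other than exactly 8 bytes is rejected.
    if len(data) != 8:
        raise ValueError(f"expected 8 bytes for a long, got {len(data)}")
    return struct.unpack(">q", data)[0]

print(serialize_long(120).hex())             # 0000000000000078
print(deserialize_long(serialize_long(120))) # 120
```

Reading a key written with `serialize_int` through `deserialize_long` raises an error in this sketch, which is the kind of mismatch a BIGINT key column declared over an INT-serialized key would hit.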
If you integrate ksqlDB with Confluent Schema Registry, and your ksqlDB application uses a compatible value format (Avro, JSON_SR, or Protobuf), you can supply only the key column, and ksqlDB loads the value columns from Schema Registry.
The key column must be supplied because the KAFKA format does not support
schema inference, so ksqlDB cannot load the key schema from Schema Registry.
Protobuf
| Feature | Supported | 
|---|---|
| As value format | Yes | 
| As key format | Yes | 
| Multi-Column Keys | Yes | 
| Schema Registry required | Yes | 
| Schema inference | Yes | 
| Single field wrapping | Yes | 
| Single field unwrapping | No | 
Protobuf handles null values differently than AVRO and JSON. Protobuf doesn't
have the concept of a null value, so the conversion between PROTOBUF and Java
(Kafka Connect) objects is undefined. Usually, Protobuf resolves a
"missing field" to the default value of its type.
- String: the default value is the empty string.
- Byte: the default value is empty bytes.
- Bool: the default value is false.
- Numeric type: the default value is zero.
- Enum: the default value is the first defined enum value, which must be zero.
- Message field: the field is not set. Its exact value is language-dependent. See the generated code guide for details.
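One way to picture the effect of these defaults is the following Python model, which uses plain dictionaries rather than the Protobuf library. The names `PROTO_DEFAULTS`, `encode`, and `decode` are hypothetical, and treating a null as an omitted field is an assumption made for the illustration.

```python
# Defaults for scalar field types, as listed above. Message-typed fields have
# no scalar default and are modeled here as None ("not set").
PROTO_DEFAULTS = {
    "string": "", "bytes": b"", "bool": False,
    "int32": 0, "int64": 0, "double": 0.0, "enum": 0,
}

def encode(row: dict) -> dict:
    """Model of writing: a None value has no representation, so the field is omitted."""
    return {name: value for name, value in row.items() if value is not None}

def decode(message: dict, schema: dict) -> dict:
    """Model of reading: a missing field resolves to the default of its type."""
    return {name: message.get(name, PROTO_DEFAULTS.get(ftype))
            for name, ftype in schema.items()}

schema = {"name": "string", "age": "int32", "active": "bool"}
round_tripped = decode(encode({"name": None, "age": 49, "active": None}), schema)
print(round_tripped)  # {'name': '', 'age': 49, 'active': False}
```

The round trip shows why a null written to a PROTOBUF-formatted column may not come back as null.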
Single field (un)wrapping
(De)serialization of single keys
ksqlDB assumes that any single key is unwrapped, which means that it's not contained in an outer
record or object. Conversely, ksqlDB assumes that any key with multiple columns
(for example, CREATE STREAM x (K1 INT KEY, K2 INT KEY, C1 INT)) is wrapped, which means that it is a record
with each column as a field within the key.
To declare a single-column key that's wrapped, specify a STRUCT type
with a single column, for example, K STRUCT<F1 INT> KEY. See the next two sections
on single values for more information about wrapped and unwrapped data.
Controlling deserialization of single values
When ksqlDB deserializes a Kafka message into a row, the key is deserialized into the key field, and the message's value is deserialized into the value fields.
By default, ksqlDB expects any value with a single-field schema to have been serialized as a named field within a record. However, this is not always the case. ksqlDB also supports reading data that has been serialized as an anonymous value.
For example, a value with multiple fields might be a JSON object with one property per field.
If the value had only the id field, ksqlDB would still expect the value
to be serialized as a named field, that is, as a JSON object with a single id property.
If your data contains only a single field, and that field is not wrapped
within a JSON object (or within an Avro record, if you're using the AVRO format),
you can use the WRAP_SINGLE_VALUE property in the WITH clause of
your CREATE TABLE or CREATE STREAM statements. Setting the
property to false tells ksqlDB that the value isn't wrapped, so the
example above would be a bare JSON number.
For example, you can create a table where the values in the underlying topic have been serialized as an anonymous JSON number by setting WRAP_SINGLE_VALUE=false in the table's WITH clause.
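The difference between reading a wrapped and an unwrapped single value can be sketched in Python as follows. The function `read_single_value` is a hypothetical stand-in for the deserializer, with its `wrapped` flag playing the role of WRAP_SINGLE_VALUE; the sample payloads are invented for the illustration.

```python
import json

def read_single_value(raw: bytes, column: str, wrapped: bool = True) -> dict:
    """Turn a JSON message value into a one-column row."""
    parsed = json.loads(raw)
    if wrapped:
        # Wrapped: the payload is an object with the column as a named field.
        return {column: parsed[column]}
    # Unwrapped: the payload itself is the anonymous column value.
    return {column: parsed}

print(read_single_value(b'{"id": 10}', "id"))          # {'id': 10}
print(read_single_value(b'10', "id", wrapped=False))   # {'id': 10}
```

Both calls produce the same row, which is the point: the wrapping setting changes only how the bytes in the topic are interpreted.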
If a statement doesn't set the value wrapping explicitly, ksqlDB uses the
system default, which is defined by ksql.persistence.wrap.single.values.
You can change the system default, if the format supports it. For more information, see
ksql.persistence.wrap.single.values.
Important
ksqlDB treats null keys and values as a special case. We recommend
  avoiding unwrapped single-field schemas if the field can have a null
  value.
A null value in a table's topic is treated as a tombstone, which
indicates that a row has been removed. If a table's source topic has an
unwrapped single-field key schema and the value is null, it's treated
as a tombstone, resulting in any previous value for the key being
removed from the table.
A null key or value in a stream's topic is ignored when the stream is
part of a join. A null value in a table's topic is treated as a
tombstone, and a null key is ignored when the table is part of a join.
When you have an unwrapped single-field schema, ensure that any null
key or value has the desired result.
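The reason for this recommendation becomes clearer with a small model of how a table is materialized from its topic. The Python sketch below is an assumption-laden illustration, not ksqlDB's implementation: `apply_changelog` is a hypothetical helper, records with a null key are simply skipped, and the unwrapped single field is taken to be the value column.

```python
def apply_changelog(records):
    """Build a table (dict) from (key, value) records; a None value is a tombstone."""
    table = {}
    for key, value in records:
        if key is None:
            continue                 # assumption: records without a key are skipped
        if value is None:
            table.pop(key, None)     # tombstone: remove any previous row for the key
        else:
            table[key] = value
    return table

# Unwrapped single field: a legitimately null field is indistinguishable from a tombstone.
print(apply_changelog([("k1", 5), ("k1", None)]))                  # {}
# Wrapped single field: the null stays inside the record, so the row survives.
print(apply_changelog([("k1", {"V": 5}), ("k1", {"V": None})]))    # {'k1': {'V': None}}
```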
Controlling serialization of single values
When ksqlDB serializes a row into a Kafka message, the key field is serialized into the message's key, and any value fields are serialized into the message's value.
By default, if the value has only a single field, ksqlDB serializes the single field as a named field within a record. However, this doesn't always match the requirements of downstream consumers, so ksqlDB allows the value to be serialized as an anonymous value.
For example, consider a pair of statements in which the first creates a source stream
and the second creates a derived stream that selects only one column, f0, from the source.
The derived stream has only a single field in the value, named f0.
By default, when ksqlDB writes out the result to Kafka, it persists the
single field as a named field within a JSON object, or an Avro record if
using the AVRO format.
If you require the value to be serialized as an anonymous value, for example
as a bare JSON number, you can set the WRAP_SINGLE_VALUE property to false
in your statement.
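On the write side, the two output shapes can be sketched in Python like this. The function `write_single_value` is hypothetical, its `wrap` flag stands in for WRAP_SINGLE_VALUE, and the value 10 is an invented sample.

```python
import json

def write_single_value(column: str, value, wrap: bool = True) -> bytes:
    """Serialize a one-column row as a JSON message value."""
    payload = {column: value} if wrap else value
    return json.dumps(payload).encode("utf-8")

print(write_single_value("f0", 10))              # b'{"f0": 10}'
print(write_single_value("f0", 10, wrap=False))  # b'10'
```

A downstream consumer that expects a plain number would need the second form, while one that expects an object with an f0 property would need the first.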
If a statement doesn't set the value wrapping explicitly, ksqlDB uses the
system default, defined by ksql.persistence.wrap.single.values, if the format supports it. 
You can change the system default. For more information, see
ksql.persistence.wrap.single.values.
Important
ksqlDB treats null keys and values as a special case. We recommend
  avoiding unwrapped single-field schemas if the field can have a null
  value.
A null value in a table's topic is treated as a tombstone, which
indicates that a row has been removed. If a table's source topic has an
unwrapped single-field key schema and the value is null, it's treated
as a tombstone, resulting in any previous value for the key being
removed from the table.
A null key or value in a stream's topic is ignored when the stream is
part of a join. A null value in a table's topic is treated as a
tombstone, and a null key is ignored when the table is part of a join.
When you have an unwrapped single-field schema, ensure that any null
key or value has the desired result.
Single-field serialization examples