How to query structured data¶
Context¶
You have events that contain structured data types like structs, maps, and arrays. You want to write them to ksqlDB and read their inner contents with queries. Because ksqlDB represents each event as a row with a flat series of columns, you need a bit of syntax to work with these data types. This is sometimes called "destructuring".
In action¶
1 2 3 4 5 |
|
Data types¶
Structs¶
Structs are an associative data type that map VARCHAR
keys to values of any type. Destructure structs by using arrow syntax (->
).
Begin by telling ksqlDB to start all queries from the earliest point in each topic.
1 |
|
Make a stream s2
with two columns: a
and b
. b
is a struct with VARCHAR
keys c
and d
, whose value data types are VARCHAR
and INT
respectively.
1 2 3 4 5 6 7 8 9 10 11 |
|
Insert some rows into s2
. You can represent a struct literal by using the STRUCT
constructor, which takes a variable number of key/value arguments.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
To access a struct in a query, start with the name of a column and add ->
each time you want to drill into a key. This query selects column a
, b
, the key c
within b
, and the key d
within b
:
1 2 3 4 5 6 |
|
Your output should resemble the following results. Notice that the column names for the last two columns are C
and D
respectively. By default, ksqlDB will give the column the name of the last identifier in the arrow chain. You can override this by aliasing, such as b->c AS x
. If you drill into nested values that finish with the same identifier name, ksqlDB will force you to provide an alias to avoid ambiguity.
1 2 3 4 5 6 |
|
Maps¶
Maps are an associative data type that map keys of any type to values of any type. The types across all keys must be the same. The same rule holds for values. Destructure maps using bracket syntax ([]
).
Begin by telling ksqlDB to start all queries from the earliest point in each topic.
1 |
|
Make a stream s3
with two columns: a
and b
. b
is a map with VARCHAR
keys and INT
values.
1 2 3 4 5 6 7 8 |
|
Insert some rows into s3
. You can represent a MAP literal by using the MAP
constructor, which takes a variable number of key/value arguments. c
and d
are used consistently in this example, but the key names can be heterogeneous in practice.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
To access a map in a query, start with the name of a column and add []
each time you want to drill into a key. This query selects column a
, b
, the key c
within b
, and the key d
within b
:
1 2 3 4 5 6 |
|
This query should return the following results. The last two column names have been aliased. If you elect not to give them a name, ksqlDB will generate names like KSQL_COL_0
for each.
1 2 3 4 5 6 |
|
Arrays¶
Arrays are a collection data type that contain a sequence of values of a single type. Destructure arrays using bracket syntax ([]
).
Begin by telling ksqlDB to start all queries from the earliest point in each topic.
1 |
|
Make a stream s4
with two columns: a
and b
. b
is an array with INT
elements.
1 2 3 4 5 6 7 8 |
|
Insert some rows into s4
. You can represent an array literal by using the ARRAY
constructor, which takes a variable number of elements.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
To access an array in a query, start with the name of a column and add []
each index you want to drill into. This query selects column a
, b
, the first element of b
, the second element of b
, the third element of b
, and the last element of b
:
1 2 3 4 5 6 7 8 |
|
This query should return the following results. Notice that index 1
represents the first element of each array. By contrast to many programming languages which represent the first element of an array as 0
, most databases, like ksqlDB, represent it as 1
. If an element is absent, the result is null
. You can use negative indices to navigate backwards through the array. In this example, -1
retrieves the last element of each array regardless of its length.
1 2 3 4 5 6 |
|
Deeply nested data¶
You may have structured data types that are nested within one another. Each data type's destructuring syntax composes irrespective of how it is nested.
Begin by telling ksqlDB to start all queries from the earliest point in each topic.
1 |
|
Make a stream s4
with two columns: a
and b
. Here is how b
breaks down:
b
is a struct withVARCHAR
keysc
andd
.c
is an array ofINT
elements.d
is a map ofVARCHAR
keys and struct values.- That struct has keys
e
andf
, with values of typeVARCHAR
andBOOLEAN
respectively.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Insert some rows into s4
. Notice how the constructors for each data type readily compose.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
To access nested values, use the destructuring syntax from each data type. Notice how you can chain them together:
1 2 3 4 5 6 7 |
|
This query should return the following results. The rules for how each column name is generated are based on the data type that is at the tail of each selected element.
1 2 3 4 5 6 7 8 9 |
|