Synthetic key columns
Some joins have a synthetic key column in their result. This is a column that does not come from any source. Here's an example to help explain what synthetic key columns are and why they are required:
1 2 |
|
The previous statement seems straightforward enough: create a new table that's the result of
performing a full outer join of two source tables, joining on their ID columns. You might expect
that such a join would result in a table with a compound primary key containing both L.ID
and
R.ID
, and that's what will happen once ksqlDB supports
structured keys.
Unfortunately, until ksqlDB supports structured keys, the join must result in a table with a single
primary key column. Because the previous is a full-outer join, either L.ID
or R.ID
may be
missing (NULL
), or both may have the same value. But, the data produced to Apache Kafka®
always has the message key set to the non-null ID
column, as shown in the following table:
L.ID | R.ID | Kafka message key |
---|---|---|
10 | null | 10 |
null | 7 | 7 |
8 | 8 | 8 |
Clearly, the data stored in the Kafka message's key does not match either of the source ID
columns. Instead, it's a new column: a synthetic column, which means a column that doesn't belong
to either source table.
What joins result in synthetic key columns?¶
Any join where the key column in the result does not match any source column is said to have a synthetic key column.
The following types of joins result in a synthetic key column being added to the result schema:
-
FULL OUTER
joins, for example:sql CREATE TABLE OUTPUT AS SELECT * FROM L FULL OUTER JOIN R ON L.ID = R.ID;
-
Any join where all expressions used in the join
ON
criteria are not simple column references. For example:sql -- join on expressions other than column references: CREATE TABLE OUTPUT AS SELECT * FROM L JOIN R ON ABS(L.ID) = ABS(R.ID);
What name is assigned to a Synthetic key column?¶
The default name of a synthetic key column is ROWKEY
. But, if any sources used in the join
already contain a column named ROWKEY
, the synthetic key column is named ROWKEY_1
, or
ROWKEY_2
if there exists a source column called ROWKEY_1
, etc. For example:
1 2 3 4 5 6 7 8 |
|
Like any other key column, the synthetic key column must be included in the projection of streaming queries. If you projection is missing the synthetic key, then an error like the one below will be returned, indicating the name of the missing key column:
1 2 3 |
|
Optionally, you may provide an alias for the key column in the projection. This is recommended, as system generated names are not guaranteed to remain consistent between versions. For example:
1 2 |
|
Will there always be synthetic keys?¶
No, as stated above, synthetic key columns will no longer be required once ksqlDB supports structured keys.