Synthetic key columns
Some joins have a synthetic key column in their result. This is a column that does not come from any source. Here's an example to help explain what synthetic key columns are and why they are required:
1 2 |
|
The previous statement seems straightforward enough: create a new table that's the result of
performing a full outer join of two source tables, joining on their ID columns. But in a
full-outer join, either L.ID
or R.ID
may be missing (NULL
), or both
may have the same value. Since the data produced to Apache Kafka® should always have a non-null
record key, ksql selects the first non-null key to use:
L.ID | R.ID | Kafka record key |
---|---|---|
10 | null | 10 |
null | 7 | 7 |
8 | 8 | 8 |
The data stored in the Kafka record's key may not match either of the source ID
columns. Instead, it's a new column: a synthetic column, which means a column that doesn't belong
to either source table.
Which joins result in synthetic key columns?¶
Any join where the key column in the result does not match any source column is said to have a synthetic key column.
The following types of joins result in a synthetic key column being added to the result schema:
-
FULL OUTER
joins, for example:sql CREATE TABLE OUTPUT AS SELECT * FROM L FULL OUTER JOIN R ON L.ID = R.ID;
-
Any join where all expressions used in the join
ON
criteria are not simple column references. For example:sql -- join on expressions other than column references: CREATE TABLE OUTPUT AS SELECT * FROM L JOIN R ON ABS(L.ID) = ABS(R.ID);
What name is assigned to a Synthetic key column?¶
The default name of a synthetic key column is ROWKEY
. But, if any sources used in the join
already contain a column named ROWKEY
, the synthetic key column is named ROWKEY_1
, or
ROWKEY_2
if there exists a source column called ROWKEY_1
, etc. For example:
1 2 3 4 5 6 7 8 9 10 11 |
|
Like any other key column, the synthetic key column must be included in the projection of streaming queries. If your projection is missing the synthetic key, then an error like the one below will be returned, indicating the name of the missing key column:
1 2 3 |
|
Optionally, you may provide an alias for the key column in the projection. This is recommended, as system generated names are not guaranteed to remain consistent between versions. For example:
1 2 |
|