Generate test data
Use the ksql-datagen command-line tool to generate test data that complies with a custom schema that you define.
To generate test data, create an Apache Avro schema and pass it to ksql-datagen. This generates random data according to the schema you provide. You can also generate data from a few simple, predefined schemas.
Prerequisites:
- Confluent Platform is installed and running. This installation includes an Apache Kafka® broker, ksqlDB, Control Center, ZooKeeper, Schema Registry, REST Proxy, and Connect.
- If you installed Confluent Platform via TAR or ZIP, navigate to the installation directory. The paths and commands used throughout this tutorial assume that you're in this installation directory.
- Java 1.8 or later. Install the Oracle Java JRE or JDK on your local machine.
The ksql-datagen tool is installed with Confluent Platform by default.
Note

ksqlDB Server doesn't need to be running for ksql-datagen to generate records to a topic. The ksql-datagen tool isn't just for ksqlDB: you can use it to produce data to any Kafka topic that you have write access to.
Usage¶
Use the following command syntax to generate records from an Avro schema:

```bash
ksql-datagen schema=<avro schema file> \
    key-format=<key format> \
    value-format=<value format> \
    topic=<kafka topic name> \
    key=<name of key column>
```
Required Arguments¶
Name | Default | Description
---|---|---
`schema=<avro schema file>` | | Path to an Avro schema file. Requires the `key-format`, `value-format`, `topic`, and `key` options.
`key-format=<key format>` | Kafka | Format of generated record keys: one of `avro`, `json`, `delimited`, or `kafka`. Case-insensitive.
`value-format=<value format>` | JSON | Format of generated record values: one of `avro`, `json`, or `delimited`. Case-insensitive.
`topic=<kafka topic name>` | | Name of the topic that receives generated records.
`key=<name of key column>` | | Field to use as the key for generated records.
`quickstart=<quickstart preset>` | | Generate records from a preset schema: `orders`, `users`, or `pageviews`. Case-insensitive. If `topic` isn't specified, a topic named `<preset>_kafka_topic_json` is created, for example, `users_kafka_topic_json`.
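For example, a complete invocation that combines the required arguments might look like the following sketch. The schema path and key column name are illustrative, not fixed values:

```bash
ksql-datagen schema=~/orders.avro \
    key-format=kafka \
    value-format=json \
    topic=orders_topic \
    key=orderid
```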
Use the following command syntax to generate records from one of the predefined schemas:

```bash
ksql-datagen quickstart=<quickstart preset> [topic=<kafka topic name>]
```
Optional Arguments¶
The following options apply to both the schema
and quickstart
options.
Name | Default | Description
---|---|---
`bootstrap-server=<kafka-server>:<port>` | localhost:9092 | IP address and port of the Kafka broker to connect to.
`key-format=<key format>` | Kafka | Format of generated record keys: `avro`, `json`, `delimited`, or `kafka`. Case-insensitive. Required by the `schema` option.
`value-format=<value format>` | JSON | Format of generated record values: `avro`, `json`, or `delimited`. Case-insensitive. Required by the `schema` option.
`topic=<kafka topic name>` | | Name of the topic that receives generated records. Required by the `schema` option.
`key=<name of key column>` | | Field to use as the key for generated records. Required by the `schema` option.
`iterations=<number of records>` | 1,000,000 | The maximum number of records to generate.
`msgRate=<rate to produce in msgs/second>` | -1 (unlimited, i.e., as fast as possible) | The rate at which to produce messages, in messages per second.
`propertiesFile=<path-to-properties-file>` | `<path-to-confluent>/etc/ksqldb/datagen.properties` | Path to the `ksql-datagen` properties file.
`schemaRegistryUrl` | http://localhost:8081 | URL of Schema Registry when a format is `avro`.
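As a sketch of combining the optional arguments, the following command bounds a quickstart run to 1,000 records and throttles it to 10 messages per second. The topic name is chosen here for illustration:

```bash
ksql-datagen quickstart=users \
    topic=users_test \
    iterations=1000 \
    msgRate=10 \
    bootstrap-server=localhost:9092
```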
Tip

For usage information, enter ksql-datagen help.
Generate Records From a Predefined Schema¶
The ksql-datagen tool provides some simple schemas for generating example orders, users, and pageviews data.
Generate Example Order Records With Structured Data¶
The orders quickstart option produces records that simulate orders, with orderid, itemid, price, and location columns. The location column is a STRUCT with city, state, and zipcode columns. The orderid column is stored in the topic message key.
The following command generates example order records to a Kafka topic named orders_topic:

```bash
ksql-datagen quickstart=orders topic=orders_topic
```
In the ksqlDB CLI or in Control Center, register a stream on orders_topic:

```sql
CREATE STREAM orders_raw (
    orderid INT KEY,
    itemid VARCHAR,
    price DOUBLE,
    location STRUCT<
        city VARCHAR,
        state VARCHAR,
        zipcode BIGINT>)
  WITH (KAFKA_TOPIC='orders_topic',
        KEY_FORMAT='KAFKA',
        VALUE_FORMAT='JSON');
```
Inspect the schema of the orders_raw stream by using the DESCRIBE statement:

```sql
DESCRIBE orders_raw;
```
Your output should resemble:

```
Name                 : ORDERS_RAW
 Field    | Type
---------------------------------------------------------------------------------
 ORDERID  | INTEGER          (key)
 ITEMID   | VARCHAR(STRING)
 PRICE    | DOUBLE
 LOCATION | STRUCT<CITY VARCHAR(STRING), STATE VARCHAR(STRING), ZIPCODE BIGINT>
---------------------------------------------------------------------------------
For runtime statistics and query details run: DESCRIBE <Stream,Table> EXTENDED;
```
For more information, see How to query structured data.
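As a quick sketch of querying the STRUCT column, you can dereference nested fields with the -> operator. This assumes the orders_raw stream registered above:

```sql
SELECT orderid,
       location->city AS city,
       location->state AS state
  FROM orders_raw
  EMIT CHANGES
  LIMIT 5;
```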
Generate Example User Records¶
The users
quickstart option produces records that simulate user data,
with registertime
, gender
, regionid
, and userid
fields. You can
join userid
values with the page view records generated by the
pageviews
quickstart option.
The following command generates example user records:

```bash
ksql-datagen quickstart=users
```
In this example, no topic name is specified, so ksql-datagen creates a topic named users_kafka_topic_json.
In the ksqlDB CLI or in Control Center, register a table on users_kafka_topic_json:

```sql
CREATE TABLE users_original (
    userid VARCHAR PRIMARY KEY,
    registertime BIGINT,
    gender VARCHAR,
    regionid VARCHAR)
  WITH (KAFKA_TOPIC='users_kafka_topic_json',
        KEY_FORMAT='KAFKA',
        VALUE_FORMAT='JSON');
```
Inspect the schema of the users_original table by using the DESCRIBE statement:

```sql
DESCRIBE users_original;
```
Your output should resemble:

```
Name                 : USERS_ORIGINAL
 Field        | Type
------------------------------------------------
 USERID       | VARCHAR(STRING)  (primary key)
 REGISTERTIME | BIGINT
 GENDER       | VARCHAR(STRING)
 REGIONID     | VARCHAR(STRING)
------------------------------------------------
```
Generate Example User Records With Complex Data¶
The users_ quickstart option produces records that simulate user data, with registertime, gender, regionid, userid, interests, and contactInfo columns. The interests column is an ARRAY, and the contactInfo column is a MAP. You can join userid values with the page view records generated by the pageviews quickstart option.
The following command generates example user records that have complex data:

```bash
ksql-datagen quickstart=users_ topic=users_extended
```
In the ksqlDB CLI or in Control Center, register a table on users_extended:

```sql
CREATE TABLE users_extended (
    userid VARCHAR PRIMARY KEY,
    registertime BIGINT,
    gender VARCHAR,
    regionid VARCHAR,
    interests ARRAY<VARCHAR>,
    contactInfo MAP<VARCHAR, VARCHAR>)
  WITH (KAFKA_TOPIC='users_extended',
        KEY_FORMAT='KAFKA',
        VALUE_FORMAT='JSON');
```
Inspect the schema of the users_extended table by using the DESCRIBE statement:

```sql
DESCRIBE users_extended;
```
Your output should resemble:

```
Name                 : USERS_EXTENDED
 Field        | Type
------------------------------------------------
 USERID       | VARCHAR(STRING)  (primary key)
 REGISTERTIME | BIGINT
 GENDER       | VARCHAR(STRING)
 REGIONID     | VARCHAR(STRING)
 INTERESTS    | ARRAY<VARCHAR(STRING)>
 CONTACTINFO  | MAP<STRING, VARCHAR(STRING)>
------------------------------------------------
```
For more information, see How to query structured data.
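A sketch of reading the complex columns follows. Note that ksqlDB arrays are 1-indexed; the 'city' map key is illustrative, since the actual contactInfo keys depend on the generated data:

```sql
SELECT userid,
       interests[1] AS first_interest,
       contactInfo['city'] AS city
  FROM users_extended
  EMIT CHANGES
  LIMIT 5;
```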
Generate Example User Page Views¶
The pageviews
quickstart option produces records that simulate page
views, with viewtime
, userid
, and pageid
columns. You can join
userid
values with the user records generated by the users
quickstart option.
The following command generates example pageview records to a Kafka topic named pageviews:

```bash
ksql-datagen quickstart=pageviews topic=pageviews
```
In the ksqlDB CLI or in Control Center, register a stream on pageviews:

```sql
CREATE STREAM pageviews_original (
    viewtime BIGINT,
    userid VARCHAR,
    pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews',
        VALUE_FORMAT='JSON');
```
Inspect the schema of the pageviews_original stream by using the DESCRIBE statement:

```sql
DESCRIBE pageviews_original;
```
Your output should resemble:

```
Name                 : PAGEVIEWS_ORIGINAL
 Field    | Type
-----------------------------
 VIEWTIME | BIGINT
 USERID   | VARCHAR(STRING)
 PAGEID   | VARCHAR(STRING)
-----------------------------
```
Generate Records From an Avro Schema¶
Define a Custom Schema¶
In this example, you download a custom Avro schema and generate matching test data. The schema is named impressions.avro, and it represents advertisements delivered to users.
Download impressions.avro and copy it to your home directory. It's used by ksql-datagen when you start generating test data.
Generate Test Data¶
When you have a custom schema, you can generate test data that's made up of random values that satisfy the schema requirements. In the impressions schema, advertisement identifiers are two-digit random numbers between 10 and 99, as specified by the regular expression ad_[1-9][0-9].
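ksql-datagen produces values by using the Avro Random Generator, which reads generation hints from arg.properties annotations embedded in the schema. The following is a minimal sketch of a string field constrained by a regular expression; the record and field names are illustrative, not the actual contents of impressions.avro:

```json
{
  "type": "record",
  "name": "impression",
  "fields": [
    {
      "name": "adid",
      "type": {
        "type": "string",
        "arg.properties": { "regex": "ad_[1-9][0-9]" }
      }
    }
  ]
}
```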
Open a new command shell, and in the <path-to-confluent>/bin directory, start generating test values by using the ksql-datagen command. In this example, the schema file, impressions.avro, is in your home directory. The key column name shown here is illustrative and must match a field defined in the schema:

```bash
./ksql-datagen schema=~/impressions.avro value-format=json topic=impressions key=impressionid
```
After a few startup messages, ksql-datagen prints each generated record as it's produced to the topic.
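Optionally, you can confirm that records are arriving by consuming the topic directly with the standard Kafka console consumer, assuming a broker on localhost:9092:

```bash
kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic impressions \
    --from-beginning \
    --max-messages 5
```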
Consume the Test Data Stream¶
In the ksqlDB CLI or in Control Center, register the impressions stream, declaring columns that match the fields defined in impressions.avro:

```sql
CREATE STREAM impressions (viewtime BIGINT, key VARCHAR, userid VARCHAR, adid VARCHAR) WITH (KAFKA_TOPIC='impressions', VALUE_FORMAT='JSON');
```
Create the impressions2 persistent streaming query:

```sql
CREATE STREAM impressions2 AS SELECT * FROM impressions;
```