Relationship to Kafka Streams
ksqlDB is the streaming database for Apache Kafka®. With ksqlDB, you can write event streaming applications by using a lightweight SQL syntax.
Kafka Streams is the Kafka library for writing streaming applications and microservices in Java and Scala.
ksqlDB is built on Kafka Streams, a robust stream processing framework that is part of Kafka.
ksqlDB gives you a query layer for building event streaming applications on Kafka topics. ksqlDB abstracts away much of the complex programming that's required for real-time operations on streams of data, so that one line of SQL can do the work of a dozen lines of Java or Scala.
For example, to implement simple fraud-detection logic on a Kafka topic named `payments`, you could write a single SQL statement.
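A minimal sketch of such a statement follows. It assumes a `payments` stream is already registered in ksqlDB with a numeric `fraud_probability` column; the column name, the 0.8 threshold, and the output stream name `fraudulent_payments` are all hypothetical. ksqlDB also accepts statements over its REST API, so the sketch builds, but does not send, the request that would submit the statement to the `/ksql` endpoint of a server assumed to listen on `localhost:8088`. It uses only the Python standard library.

```python
import json
import urllib.request

# Assumed address of a local ksqlDB server (8088 is the default listener port).
KSQLDB_URL = "http://localhost:8088"

# Hypothetical persistent query: assumes a `payments` stream exists and has a
# numeric `fraud_probability` column. The threshold 0.8 is illustrative.
FRAUD_SQL = (
    "CREATE STREAM fraudulent_payments AS "
    "SELECT * FROM payments "
    "WHERE fraud_probability > 0.8 "
    "EMIT CHANGES;"
)


def build_ksql_request(statement: str, base_url: str = KSQLDB_URL) -> urllib.request.Request:
    """Build, but do not send, a POST request to the ksqlDB /ksql endpoint."""
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=f"{base_url}/ksql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_ksql_request(FRAUD_SQL)
    print(request.get_method(), request.full_url)
    # Uncomment to submit the statement to a running ksqlDB server:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Once submitted, a statement like this runs continuously on the ksqlDB server and writes every qualifying payment to a new stream backed by its own Kafka topic; there is no application code to compile or deploy.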
The equivalent Scala code on Kafka Streams might resemble:
ksqlDB is easier to use, and Kafka Streams is more flexible. Which technology you choose for your real-time streaming applications depends on a number of considerations. Keep in mind that you can use both ksqlDB and Kafka Streams together in your implementations.
Differences Between ksqlDB and Kafka Streams
The following table summarizes some of the differences between ksqlDB and Kafka Streams.
Feature | ksqlDB | Kafka Streams |
---|---|---|
You write | SQL statements | JVM applications |
Graphical UI | Yes, in Confluent Control Center and Confluent Cloud | No |
Console | Yes | No |
Data formats | Avro, Protobuf, JSON, JSON_SR, CSV | Any data format, including Avro, JSON, CSV, Protobuf, XML |
REST API included | Yes | No, but you can implement your own |
Runtime included | Yes, the ksqlDB server | Applications run as standard JVM processes |
Queryable state | No | Yes |
Developer Workflows
There are different workflows for ksqlDB and Kafka Streams when you develop streaming applications.
- ksqlDB: You write SQL queries interactively and view the results in real time, either in the ksqlDB CLI or in Confluent Control Center.
- Kafka Streams: You write code in Java or Scala, compile it, and run and test the application in an IDE such as IntelliJ IDEA. You deploy the application to production as a jar file that runs as a standard JVM process and connects to your Kafka cluster.
ksqlDB and Kafka Streams: Where to Start?
Use the following table to help you decide between ksqlDB and Kafka Streams as a starting point for your real-time streaming application development.
Start with ksqlDB when... | Start with Kafka Streams when... |
---|---|
New to streaming and Kafka | Prefer writing and deploying JVM applications in languages like Java and Scala, for example because of team skills or the existing tech environment |
Want to speed up and broaden the adoption and value of Kafka in your organization | Use case is not naturally expressible through SQL, for example, finite state machines |
Prefer an interactive experience with UI and CLI | Building microservices |
Prefer SQL to writing code in Java or Scala | Must integrate with external services or use third-party libraries (although ksqlDB user-defined functions (UDFs) may help) |
Use cases include enriching data; joining data sources; filtering, transforming, and masking data; identifying anomalous events | To customize or fine-tune a use case, for example, with the Kafka Streams Processor API: custom join variants, or probabilistic counting at very large scale with Count-Min Sketch |
Use case is naturally expressible by using SQL, with optional help from UDFs | Need queryable state, which ksqlDB doesn't support |
Want the power of Kafka Streams but aren't on the JVM: use the ksqlDB REST API from Python, Go, C#, JavaScript, or shell | |
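The Count-Min Sketch mentioned in the table is a compact structure for approximate per-key counting: its estimates can overcount because of hash collisions, but they never undercount. The plain-Python sketch below illustrates only the data structure; it is not Kafka Streams code, and in a Processor API implementation the counters would likely be kept in a custom state store. The width, depth, and BLAKE2b-based hashing are illustrative choices, not values taken from any Kafka Streams example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: estimates may overcount, never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt BLAKE2b with the row number to get a different hash per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever add to a counter, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=256, depth=4)
    for card in ["card-1", "card-2", "card-1", "card-3", "card-1"]:
        cms.add(card)
    print(cms.estimate("card-1"))  # never less than 3
```

Memory stays fixed at `width * depth` counters regardless of how many distinct keys arrive, which is what makes the structure attractive at very large scale and why a fine-tuned implementation tends to need the lower-level control of the Processor API.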
Usually, ksqlDB isn't a good fit for BI reports, ad-hoc querying, or queries with random access patterns, because it's a continuous query system on data streams.
To get started with ksqlDB, try the Tutorials and Examples.
To get started with Kafka Streams, try the Streams Quick Start.