In the last decade, Kafka has become the de facto event streaming solution. It has gained popularity as thousands of products use it to solve simple to complex event distribution and streaming problems. As a software architect, I have also used Kafka in various solutions. During this process, I realized that common patterns have evolved over time, such as sinking events to or sourcing them from Kafka, schema sharing, and monitoring (the Kafka cluster and other related tools).
These requirements are very common, and various tools are available to meet them. In this blog, I am documenting our choice of tools for these common requirements.
The below diagram shows the five essential requirements which we had in our solution -
- Data schema sharing (metadata)
- Data sink & source pattern implementation
- Data access using REST
- Data replication
- Kafka ecosystem observability
Data Sink & Source: Kafka Connect
In most solutions, data needs to move from Kafka to some storage such as a database, disk, or JMS broker, or vice versa. Using the sink and source architecture, Kafka Connect offers a declarative, pluggable data integration framework
for Kafka. It links data sources and sinks to Kafka, enabling the team to focus on actual business logic.
Apache Kafka Connect lets teams use readily available connectors, or develop their own against the standard framework, to process the data. Kafka Connect can run in distributed or standalone mode and scale as needed. In addition, it supports offset handling and a REST interface for management. Readily available plugins cover various sources and sinks such as databases (PostgreSQL, MySQL, Oracle, DynamoDB), S3, and messaging solutions (JMS-compliant brokers).
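As a sketch, a connector is defined declaratively as JSON and submitted to the Connect REST API (by default on port 8083). The example below uses the built-in file sink connector; the connector name, topic, and file path are illustrative:

```json
{
  "name": "orders-file-sink",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "topics": "orders",
    "file": "/tmp/orders.out"
  }
}
```

POSTing this document to `http://<connect-host>:8083/connectors` creates the connector; swapping the connector class and its properties is all that is needed to target a database, S3, or a JMS broker instead.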
Data Schema: Schema Registry
The next common requirement is sharing the data structure between producers and consumers. In addition, the solution should allow teams to evolve the schema and propagate it effortlessly without impacting existing implementations. Here a schema registry helps immensely: it works as a bridge between producers and consumers for sharing a message's schema. The Confluent Schema Registry supports schemas for Apache Avro, JSON Schema, and Protobuf.
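For illustration, producers and consumers might share an Avro schema like the one below (the record and field names are hypothetical). It is registered under a subject via the Schema Registry REST API, and consumers fetch it by subject and version:

```json
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example.events",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "status", "type": ["null", "string"], "default": null}
  ]
}
```

Adding an optional field with a default, like `status` above, is a backward-compatible evolution: the registry accepts the new version and older consumers keep working unchanged.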
Data Replication Between Kafka Clusters: Kafka MirrorMaker
In some cases, it is required to replicate data between two Kafka clusters, which may be due to the following reasons -
- Availability, such as geo-replication that keeps the Kafka clusters on a primary site and a DR site in sync
- Syncing data from production to staging in exceptional cases where application testing requires actual data
- Pushing data from the edge cluster to the central cluster
- Solutions where two or more Kafka clusters are used, for example, an internal Kafka for internal eventing and an external Kafka for outside access
- Hybrid cloud deployments
We have used Kafka MirrorMaker for one such scenario to replicate data between Kafka clusters. The below diagram explains the basic working of Kafka MirrorMaker.
This tool supports features such as:
- Replicates topics (data plus configurations)
- Replicates consumer groups, including offsets to migrate applications between clusters
- Replicates ACLs
- Preserves partitioning
- Automatically detects new topics and partitions
- Provides a wide range of metrics, such as end-to-end replication latency across multiple data centers/clusters
- Fault-tolerant and horizontally scalable operations
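To make this concrete, a minimal MirrorMaker 2 configuration for a primary-to-backup replication flow might look like the sketch below; the cluster names, bootstrap addresses, and replication factor are placeholders:

```properties
# mm2.properties - define the two clusters
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092

# Enable one-way replication from primary to backup
primary->backup.enabled = true

# Replicate all topics and consumer groups
primary->backup.topics = .*
primary->backup.groups = .*

replication.factor = 3
```

This file is then passed to the `connect-mirror-maker.sh` script shipped with Kafka; replicated topics appear on the backup cluster prefixed with the source cluster name (e.g. `primary.orders`) under the default replication policy.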
Data Access Over REST: Kafka REST Proxy
In general, a Kafka implementation uses the streaming, producer, and consumer APIs, but in exceptional cases data needs to be made available over a REST interface. The Kafka REST Proxy is another interesting tool to use with Kafka, as it exposes topics for reading and writing through a REST interface.
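As a sketch, producing a JSON message through the REST Proxy (v2 API, default port 8082) is a plain HTTP POST; the topic name and record payload below are illustrative:

```
POST /topics/orders HTTP/1.1
Host: rest-proxy:8082
Content-Type: application/vnd.kafka.json.v2+json

{"records": [{"key": "order-1", "value": {"status": "created"}}]}
```

The content type tells the proxy how to serialize the records (Avro and binary variants exist as well), which makes the proxy handy for clients that cannot embed a native Kafka client.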
Kafka Cluster Observability: AKHQ
Last but not least is Kafka ecosystem monitoring (the Kafka cluster, Schema Registry, and Kafka Connect). For this, we evaluated a few tools, out of
which AKHQ stood out clearly. We used AKHQ for managing and viewing data inside the Apache Kafka cluster and for monitoring and managing Kafka Connect and the Schema Registry.
AKHQ provides a GUI for managing topics, topic data, consumer groups, the schema registry, Kafka Connect, etc.
Find below some of the useful AKHQ features -
- Supports multiple Kafka cluster configurations
- Monitor Kafka broker nodes
- Topic overview (including consumers, lags, partitions, and replication details)
- Live tail, which allows the user to search for text (similar to the tail command) in a topic
- Supports various kinds of authentication
- Provides a UI for the schema registry to create, view, update, and delete subjects
- Monitors Kafka Connect jobs, including creating new connector definitions
- Most features support search, like searching topics by name, searching consumers, etc.
- UI access can be secured using the auth feature
- The Docker image is available
- Helm chart available for Kubernetes deployment
- Lightweight, uses very minimal resources
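For reference, a minimal AKHQ configuration wiring up a Kafka cluster, a schema registry, and a Connect worker might look like the sketch below; the connection name, hostnames, and ports are placeholders:

```yaml
akhq:
  connections:
    my-cluster:
      properties:
        bootstrap.servers: "kafka-1:9092,kafka-2:9092"
      schema-registry:
        url: "http://schema-registry:8085"
      connect:
        - name: connect-1
          url: "http://connect:8083"
```

With this single file, the same AKHQ instance surfaces topics, consumer groups, schema subjects, and connector status side by side, which is what made it a one-stop console for us.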
As the Kafka ecosystem is changing rapidly, it will be interesting to see how the supporting tools and frameworks evolve in the future. With this, I thank you for your time and look forward to your feedback.