Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder: why would we want anything else? It turns out that, in practice, there are a number of situations where Kafka's partition-level parallelism gets in the way of optimal design.

For example, partition counts may be fixed for reasons beyond your control; you may need to call other databases or microservices, which can take a while to respond; or you may want queue-like semantics, where slow-to-process messages don't hold up faster ones further back in the queue.

These are just a few of the reasons why we wrote the Confluent Parallel Consumer, which provides an alternate approach to parallelism that subdivides the unit of work from a partition down to a key or even a single message.

In essence, the Parallel Consumer is a JVM-based, Apache 2.0-licensed client library that includes everything you'd expect from a regular Kafka consumer (consumer groups, transactions/exactly-once semantics, etc.), plus three new features.

First, the Parallel Consumer makes it easy to process messages with a higher degree of parallelism than the number of partitions for the input data. It does this using a thread pool, with the library handling all the tricky bookkeeping that Kafka requires.

The Parallel Consumer also lets you define parallelism in terms of key-level ordering guarantees, rather than the coarser-grained, partition-level parallelism that comes with Kafka consumer groups. By switching from partition-level to key-level parallelism, you don't have to over-provision topic partitions, or change the ones you have, just to scale your consumer group out.

Second, the Parallel Consumer makes it easy to call out to other services efficiently without stalling your application. For instance, if you need to look up customer details from a database or another service while you are processing messages, you can make these requests in parallel via non-blocking I/O.

Finally, the Parallel Consumer provides features for client-side work queues, including message-level acknowledgment and key-based processing. These are great for implementing low-latency task queues, a problem that isn't well addressed by Apache Kafka today.

In looking at each of these features, we'll discuss the use cases the library applies to and dive into the major technical themes of its implementation.
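To make the key-level approach concrete, here is a minimal sketch of what consuming with the Parallel Consumer looks like, based on the library's documented builder API (`ParallelConsumerOptions`, `ParallelStreamProcessor`). Exact method signatures vary between library versions, and the consumer configuration and `lookupCustomer` call are placeholders, so treat this as illustrative rather than definitive:

```java
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;

public class ParallelConsumerSketch {
    public static void main(String[] args) {
        // An ordinary Kafka consumer, configured as usual
        // (bootstrap servers, group id, deserializers, ...).
        Properties props = new Properties(); // placeholder configuration
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);

        // KEY ordering: records with different keys are processed concurrently,
        // while per-key order is preserved -- independent of the partition count.
        ParallelConsumerOptions<String, String> options =
                ParallelConsumerOptions.<String, String>builder()
                        .ordering(KEY)
                        .maxConcurrency(100) // worker threads; can exceed the partition count
                        .consumer(kafkaConsumer)
                        .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);

        processor.subscribe(Collections.singletonList("input-topic"));

        // The user function runs on the library's thread pool, so a slow
        // call to an external service here doesn't stall the whole pipeline.
        processor.poll(record -> {
            // lookupCustomer(record); -- hypothetical slow external call
        });
    }
}
```

The point of the sketch is the `ordering(KEY)` and `maxConcurrency` settings: concurrency is decoupled from the topic's partition count, which is exactly the shift from partition-level to key-level parallelism described above.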