Saturday, January 16, 2016

Messages delivery pipeline usign Apache Kafka, Solr and Velocity

Messaging delivery is one of the core functionality in an enterprise IT infrastructure. The message can be any format either in json or xml format or binary data. Its content can be email, file and any network request and etc.

There are many messaging frameworks like Websphere MQ, RabbitMQ and Kafka. One of the benefits of Kafka is the messaging persistence for configurable period due to its disk-based message storage. See Kafka website for nice introduction about Kafka.

Recently I work on a high-throughput message delivery pipeline which allows multiple consumers listen to various Kafka topic streams, and filters message based on configurable rules and delivers the messages to various endpoints such as file system, email, web services and etc.

The following diagram illustrates the high-level components in the pipeline. Its core job consists of MessageLisenter: a Kafka consumer listen to a topic, RuleEvaluator: process the polled data stream against velocity template based rules and MessageDeliver: deliver the message to one endpoint. If we define subscription as a kafka topic, rules and an endpoint. Then pipeline is a thread pool which contains many subscription-based process jobs or threads.

A Spring task scheduler is used to pull all of subscriptions from an object store and create thread if necessary. 





The pipeline is a state machine goes through various states such as message_acquired, message_rule_matched, message_delivered and etc. We also use Apache Solr to index the job related data and its status. This allows us to build a job monitoring UI to query Solr and display job status to users.

Another two important features are the message delivery retry and replay capability. Basically when message goes through the steps in the pipeline, it can fail at any points such as failing to deliver or rule engine system failure. In failing to deliver case we will retry in an exponentially increased interval interval until giving up at some point. Then once developer fixes any issue associated with the failure, he/she can send relay message and pipeline can replay it.

Circuit breakers are also implemented to allow us throttle the various network-intensive requests if there are any system failure.  See Martin Fowler's blog for introduction to circuit breaker.

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete