Saturday, January 16, 2016

Messages delivery pipeline usign Apache Kafka, Solr and Velocity

Messaging delivery is one of the core functionality in an enterprise IT infrastructure. The message can be any format either in json or xml format or binary data. Its content can be email, file and any network request and etc.

There are many messaging frameworks like Websphere MQ, RabbitMQ and Kafka. One of the benefits of Kafka is the messaging persistence for configurable period due to its disk-based message storage. See Kafka website for nice introduction about Kafka.

Recently I work on a high-throughput message delivery pipeline which allows multiple consumers listen to various Kafka topic streams, and filters message based on configurable rules and delivers the messages to various endpoints such as file system, email, web services and etc.

The following diagram illustrates the high-level components in the pipeline. Its core job consists of MessageLisenter: a Kafka consumer listen to a topic, RuleEvaluator: process the polled data stream against velocity template based rules and MessageDeliver: deliver the message to one endpoint. If we define subscription as a kafka topic, rules and an endpoint. Then pipeline is a thread pool which contains many subscription-based process jobs or threads.

A Spring task scheduler is used to pull all of subscriptions from an object store and create thread if necessary. 





The pipeline is a state machine goes through various states such as message_acquired, message_rule_matched, message_delivered and etc. We also use Apache Solr to index the job related data and its status. This allows us to build a job monitoring UI to query Solr and display job status to users.

Another two important features are the message delivery retry and replay capability. Basically when message goes through the steps in the pipeline, it can fail at any points such as failing to deliver or rule engine system failure. In failing to deliver case we will retry in an exponentially increased interval interval until giving up at some point. Then once developer fixes any issue associated with the failure, he/she can send relay message and pipeline can replay it.

Circuit breakers are also implemented to allow us throttle the various network-intensive requests if there are any system failure.  See Martin Fowler's blog for introduction to circuit breaker.

Sunday, January 10, 2016

Largest Number - Java8 Stream - LeetCode

This problem is from LeetCode:

Given a list of non negative integers, arrange them such that they form the largest number. For example, given [3, 30, 34, 5, 9], the largest formed number is 9534330. Note: The result may be very large, so you need to return a string instead of an integer.

The first simple solution is to treat each integer as string, sort the strings, then going through the strings in reversed lexicographical order and concatenate them all.

This approach works for some cases like: 99, 87, 34. Result is 998734.
However it doesn't work for [3, 30, 34, 5, 9], since it will generate 9534303 in stead of 9534330. So we need to make sure when we sort the array in a special way that  "3" is larger than numbers like "30", "31", "32". Why? Because 330 > 303, 331>313...

How can we archive that? Actually the above comparison results already give us the hint! We can build a comparator, and concatenate the string in different ways and compared them.
Here is the code:


If we use Java8 stream API, without worrying about the all 0s case, this solution becomes a one-liner!!!