This app provides an API, for the clients to query real-time price statistics of financial instruments received within the last 60 seconds (tumbling time interval), using Kafka Streams and Spring Cloud Streams Framework.
The reason I wanted to provide a Kafka Streams based solution is obvious (besides personal curiosity). That's because, at a larger scale the characteristics of the problem calls for using such a scalable streaming framework. ( Apache Flink would've also been a great candidate) .
Unfortunately the Kafka SlidingWindow feature was not officially coupled with the version of the Spring Cloud Stream framework that came with the starter parent ( 2.3.4.RELEASE) .
According to this here ( apache/kafka#9039 ), the SlidingWindow feature has been accepted into the codebase .
And this one here (https://www.confluent.io/blog/apache-kafka-2-7-features-updates-improvements/) mentions that it has been included in release 2.7 of Apache Kafka .
But it took me a while to find out the latest version of Spring Cloud Stream hasn't packaged it yet, That wasn't clearly reflected in the official documentation . Perhaps a little bit of more time was needed tweaking it
I had to contend with TumblingWindows, instead of SlidingWindows as I would've liked to.
There are three API endpoints:
• POST /ticks : This one is called by the clients to register a Tick. It is also the sole input of this rest API
• GET /statistics : This one returns the statistics based on the ticks of all instruments of the last 60 seconds (tumbling time interval)
• GET /statistics/{instrument_id} : This one returns the statistics based on the ticks of the given instrument of the last 60 seconds (tumbling time interval)
The application has been configured with a default value of 60 seconds for the sliding time interval. In order to change it to another value, please update the constant value TUMBLING_WINDOWS_SECS defined in KafkaStreamsTickerApp.java
I assumed ;
- The /ticks endpoint can get called several times a second.
- The application will not be running behind a load balancer. This implementation is a single node application, that can't know what statistics another instance has accumulated.
The application could possible be improved upon under following topics:
- Making it ready to work behind a load balancer, by using a distributed key-value store instead of the local map data structures.
- Giving the ability to scale even more, by providing a sharding mechanism at the core, to distribute the load based on a certain key.
You need Maven, Git and Java SE Runtime Environment 9 (minimum), docker, docker-compose to run this project.
https://www.oracle.com/java/technologies/javase/javase9-archive-downloads.html
https://maven.apache.org/install.html
https://git-scm.com
https://docker.com
Clone repository to your local machine
git clone git@github.com:cnkuyan/indexer-kafka.git
Go to project folder
cd indexer-kafka
Bring up the kafka and zookeper in your local environment
docker-compose up -d
Run maven to create jar ( Without the last parameter to run the tests at this point)
mvn clean install package -DskipTests
java -jar target/kafka-streams-ticker-0.0.1-SNAPSHOT.jar
The following will be carried using the curl
tool.
curl http://localhost:8080/ticks --header "Content-Type: application/json" --request POST --data '{ "instrument": "IBM", "timestamp": 1617473494140, "price": 130.1 }'
If the Tick's timestamp is older than 1 second, it will be rejected by the endpoint with Http NoContent
response.
curl http://localhost:8080/statistics
curl http://localhost:8080/statistics/{instrument_id}
- SpringBoot - The web framework used for creating APIs
- Spring Cloud Stream - The cloud stream framework that provides the binder for Kafka and others
- Maven - Dependency Management
- Cenk Uyan - github/cnkuyan