Fast Data

Over the last few months we have received many questions about why we used Apache Spark (and Databricks) to implement the API endpoints in the Fexco Central API. Why not use JHipster to build “services” coupled with the consumer? Well… let me explain. The answer is called Fast Data.

It might look like a strange decision to regular developers (less so to technical architects), considering that in most cases APIs are built on simple REST-like architectures (not always RESTful) based exclusively on HTTP/1.x, with macro-services directly exposing HTTP web services. What I see around is a mistaken tendency to build bigger and bigger apps that break the single-responsibility principle. These services usually access the data storage directly.

The limitations of this approach were obvious to us. Let me enumerate the reasons:

  • HTTP-only APIs are no longer enough. We have to cover every possible case, with several types of consumers that implement asynchronous operations. Besides, server-push is needed, and it has to be provided in a uniform, controlled and efficient way (e.g. using AMQP).
  • Event Sourcing is difficult to implement in an architecture like that. The macro-service grows and grows, and it inevitably turns into a Single Point of Failure. Event Sourcing IS distributed by nature.
  • The only direction in which you can scale is upwards, and adding more instances for High Availability is your only way to deal with concurrency issues.
  • You are not ready for Data Analysis, Machine Learning or anything else related to Big Data. To get these capabilities you have to build brand-new components that act as clients of your macro-services.
API consumers, API consumers, API consumers…
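
To make the server-push point a bit more tangible, here is a minimal sketch in Python. It is not our production code: the `Broker` class is a hypothetical in-process stand-in for a real message broker (an AMQP exchange, for instance), and the consumer names and message types are invented for the example. It only shows the shape of the idea: every subscribed consumer gets each message pushed to it, without polling an HTTP endpoint.

```python
import asyncio


class Broker:
    """Hypothetical in-process stand-in for a message broker (e.g. an AMQP exchange)."""

    def __init__(self):
        self._queues = []

    def subscribe(self):
        # Each consumer gets its own queue, so a slow consumer does not block the others.
        queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    async def publish(self, message):
        # Fan the message out to every subscriber: this is the "server-push" part.
        for queue in self._queues:
            await queue.put(message)


async def consumer(name, queue, expected):
    # Wait asynchronously for pushed messages instead of polling for them.
    received = []
    for _ in range(expected):
        message = await queue.get()
        received.append(f"{name}:{message['type']}")
    return received


async def demo():
    broker = Broker()
    # Both consumers subscribe before anything is published, so no message is missed.
    web = asyncio.create_task(consumer("web", broker.subscribe(), 2))
    mobile = asyncio.create_task(consumer("mobile", broker.subscribe(), 2))
    await broker.publish({"type": "rate_updated"})
    await broker.publish({"type": "payment_settled"})
    return await asyncio.gather(web, mobile)


results = asyncio.run(demo())
print(results)
```

With a real broker the queues would live outside the process and survive restarts; the sketch keeps everything in memory just to stay self-contained.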

So the solution came from a paradigm called Fast Data, which the Lightbend guys have packaged as a product. Here, however, we are talking about the architectural approach and strategy rather than about technical details. But what exactly is Fast Data?

Fast Data is the intersection area in the middle! Of course, that hardly needed saying…

The idea is to build an architecture with these requirements:

  • Management of structured, unstructured and multi-structured data
  • Real-time analysis of data
  • In-memory or ultra-low-latency storage for the read model
  • Replay of events, provided automatically in the shape of the read model
  • Distributed, event-oriented write storage
  • Event storage for audit purposes
  • Stream processing of events
  • Capability to run Big Data operations in a seamless way
  • Capability to process and analyze very large data sets
  • Massive parallel processing out of the box (no special development is needed)
Oh no! That thing about reactive programming again!
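
Several of the requirements above (event-oriented write storage, an event store for audit, and replay into a read model) fit together in a way that is easier to see in code. The following is a minimal sketch in Python under loud assumptions: the `Event`, `EventStore` and `BalanceReadModel` names, the event kinds and the account identifiers are all hypothetical, and everything lives in memory. A real setup would use a distributed log and a proper low-latency store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str    # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int  # minor units, e.g. cents


class EventStore:
    """Append-only log: the write side, which doubles as the audit trail."""

    def __init__(self):
        self._log = []

    def append(self, event):
        # Events are only ever appended, never updated or deleted.
        self._log.append(event)

    def replay(self):
        # Yield events in the order they were written.
        yield from self._log


class BalanceReadModel:
    """In-memory read side, derived entirely from the events."""

    def __init__(self):
        self.balances = {}

    def apply(self, event):
        sign = 1 if event.kind == "deposited" else -1
        current = self.balances.get(event.account, 0)
        self.balances[event.account] = current + sign * event.amount


def rebuild(store):
    # Replaying the full log always reproduces the same read model.
    model = BalanceReadModel()
    for event in store.replay():
        model.apply(event)
    return model


store = EventStore()
store.append(Event("acc-1", "deposited", 10_000))
store.append(Event("acc-1", "withdrawn", 2_500))
store.append(Event("acc-2", "deposited", 4_000))

model = rebuild(store)
print(model.balances)
```

The design choice worth noticing is that the read model holds no truth of its own: it can be thrown away and rebuilt from the log at any time, which is what makes replay and audit come almost for free.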

This approach does have certain drawbacks at the beginning:

  • The technical team is not used to these tools, so the learning curve is costly and takes time.
  • An ad-hoc SDLC has to be prepared. The same goes for testing and quality.
  • Many services and tools are needed (stream communications, messaging, transient storage, CQRS, etc.). This is really overwhelming if your team is not very skilled.
  • It is more expensive in terms of infrastructure and communications. You have to carefully analyze the providers, tools and services available to you.

On the other hand, the advantages (over time) are huge!

  • Data Analysis is done seamlessly.
  • Endpoints scale fast, because parallel processing and the underlying clusters scale automatically.
  • Reactive Functional Programming is much more efficient in terms of resource consumption and therefore cheaper in terms of infrastructure.
  • You are able to serve large amounts of data to many more concurrent users than you could imagine.
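
The reason parallelism comes "out of the box" is the programming model: you write a function that works on one partition of the data plus a way to merge partial results, and the engine decides how many partitions and workers to use. Here is a toy sketch of that model in plain Python. It is not Spark code: `partition`, `count_by_currency` and `parallel_count` are hypothetical helpers, the events are invented, and a thread pool on a single machine only illustrates the shape, not the performance, of what a cluster does.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    # Round-robin split into n partitions (some may be empty).
    return [items[i::n] for i in range(n)]


def count_by_currency(events):
    # "Map" step: runs independently on each partition.
    return Counter(event["currency"] for event in events)


def parallel_count(events, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_by_currency, partition(events, workers))
        # "Reduce" step: merge the partial counts into one result.
        return sum(partials, Counter())


events = [{"currency": c} for c in ["EUR", "USD", "EUR", "GBP", "EUR", "USD"]]
totals = parallel_count(events)
print(totals)
```

Changing `workers` changes how the work is split but not the result, and that property is what lets a cluster scale the same logic without special development.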

Conclusion

The Fast Data approach has definitely allowed us to apply Machine Learning models to stored business data. Real-time behavior has improved the UX of API consumer applications, making it much easier for businesses to manage different and smarter strategies (e.g. offline distributed remote shops). Real-time fraud detection and risk management, Anti-Money Laundering compliance, security audit, data governance and data visualization have all improved a lot with the Fast Data approach.

Fast Data is definitely something we recommend for new, high-performance and versatile APIs that go beyond simple, limited REST-over-HTTP-only APIs. We recommend reading the posts on this blog that cover the different aspects of this approach.

You can have a look at the Lightbend Fast Data Platform.

There is also this really interesting conference talk by Dean Wampler.

...and finally, you can get his short book (approx. 50 pages) from here.

Happy journey to Fast Data!

Jesus de Diego

Author: Jesus de Diego

Software Architect and Team Lead
