Migrating Relational DB to Cassandra Modeling (How-to)

One challenge that often arises when breaking down monolithic applications into microservices is reworking the data layer. A common task is migrating the relational database to a more scalable option, such as a NoSQL database. Apache Cassandra is a popular choice, but there are a number of challenges when migrating the model from a SQL database to it.

Is it possible? Yes! So let’s go through the process of migrating a relational database into a NoSQL store. Maybe you will learn something along the way…

What about the other Challenges?

– Have enough understanding of the old application;
– Simplify the old model to mitigate accumulated bad habits and bad design;
– Try to think about all possibilities to get it right the first time (you probably won’t!).

Before we start…

In order to start, two principles need to be clear:

Query first: Before you start modelling tables and columns, the most important thing is to understand how the data is retrieved. This will make performance optimization much easier.

Denormalization: In traditional relational modelling, you create tables and relate them with foreign keys. In this new approach, you create tables to support queries, and those tables have no relationships (no joins allowed).
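As a minimal sketch of both principles, consider a hypothetical application that needs two queries over order data: all orders for a user (newest first), and a single order by its id. In Cassandra, the same data is written to two tables, one per query. All entity and column names below are invented for illustration:

```sql
-- Q1: "show all orders for a given user, newest first"
CREATE TABLE orders_by_user (
    user_id    uuid,
    order_date timestamp,
    order_id   uuid,
    total      decimal,
    PRIMARY KEY ((user_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

-- Q2: "look up a single order by its id"
CREATE TABLE orders_by_id (
    order_id   uuid PRIMARY KEY,
    user_id    uuid,
    order_date timestamp,
    total      decimal
);
```

Because `orders_by_user` partitions on `user_id` and clusters on `order_date`, Q1 is answered by reading a single partition in the desired order, with no join anywhere.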

Let’s do it piece by piece (like Jack the Ripper)

The entire process can be split into two phases. The first phase is a closer look at the monolithic application, in order to identify the application flow and the strong entities. This is a crucial phase for understanding the flow of the data and for identifying what is a key point and what is the result of bad design. It will also give you a bigger picture of what needs to be migrated.

There are many ways of mapping the old application’s queries. One suggestion is to run the existing application with debug mode on and try to identify action points, like buttons and screen initializations (menu or link), and track those actions into executions (selects, inserts, updates, deletes) on the SQL database. For each action point that results in a database execution, you should create a record in a separate document (e.g. a spreadsheet). This document should contain as much information as possible about the function that you mapped, with a description and the queries related to that function (endpoint, action, description, service name, related entities, data queries).
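For illustration, one record in such a document might look like the following (the endpoint, service and entity names are made up):

```
Endpoint:         /customers/search
Action:           "Search" button on the Customer screen
Description:      Finds customers by (partial) name
Service name:     CustomerService.findByName
Related entities: CUSTOMER, ADDRESS
Data queries:     SELECT c.*, a.city FROM customer c
                  JOIN address a ON a.customer_id = c.id
                  WHERE c.name LIKE ?
```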

The second phase uses the output of phase 1 to create the first draft of the application queries. Those queries will drive the creation of keyspaces, tables and columns in the new model. The suggested approach uses the methodology presented by Artem Chebotko from DataStax Inc. You can check the complete paper about it here.
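As a sketch of what the output of phase 2 can look like, suppose the query draft contains “find the products of a category, ordered by price”. That single application query already dictates a table design (the keyspace name, columns and replication settings below are assumptions for illustration, not part of the methodology itself):

```sql
-- Keyspace for the migrated model (replication settings are illustrative).
CREATE KEYSPACE IF NOT EXISTS shop
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- Q: "find the products of a category, ordered by price" drives this table:
CREATE TABLE IF NOT EXISTS shop.products_by_category (
    category   text,
    price      decimal,
    product_id uuid,
    name       text,
    PRIMARY KEY ((category), price, product_id)
);
```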

…in conclusion.

Moving to Cassandra is a complex job, but if you follow this procedure, the migration is a feasible task and easier than you think. One benefit of going down this path is a more scalable solution (at least we hope so!). Before finishing, here is some advice:

Must do:

  • Create tables considering mainly the queries and the columns that need to be filtered. Normally those columns become the partition keys of your tables.
  • Test table performance against realistic data, using automated tools (like JMeter).

Not to do:

  • Try to reproduce SQL features in Cassandra (joins and secondary indexes are not recommended!)
  • Worry too much about data duplication across tables. It will happen frequently.
  • Try to get the model right on the first try. The model should evolve throughout the development process.

2 Replies to “Migrating Relational DB to Cassandra Modeling (How-to)”

  1. A very good article, hope it is as easy as you suggest to migrate a relational database to a NoSql, you make it sound so easy 🙂 !

    Just some questions that come to mind.

    1) How would you suggest tackling the problem of no referential integrity, the lack of ACID properties and the limited querying options for retrieving data that are present with a lot of these NoSql databases?

    2) Why choose Cassandra over any one of the many types of NoSql databases out there?

    1. Thanks for the feedback. Here are the answers to your questions:
      1) There is no easy solution for that, and Cassandra is not a solution for all cases. In the denormalization process, you bring the keys from the tables you refer to into the table you are modelling, as partition keys. So you will end up with different tables depending on what you are searching for, e.g. location_by_date, location_by_user, location_by_name. To join data you can use a map-reduce process or an in-memory database solution like Ignite. Even though Cassandra supports lightweight transactions, the normal solution doesn’t include ACID properties and is based on eventual consistency. Normally you implement a function to automatically undo any changes on your database after a failure, but that function will differ depending on the business process.
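      A sketch of what two of those denormalized tables could look like (the columns are assumed for illustration):

      ```sql
      CREATE TABLE location_by_user (
          user_id     uuid,
          location_id uuid,
          name        text,
          PRIMARY KEY ((user_id), location_id)
      );

      CREATE TABLE location_by_name (
          name        text,
          location_id uuid,
          user_id     uuid,
          PRIMARY KEY ((name), location_id)
      );
      ```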

      2) One main advantage of Cassandra is that it doesn’t have a single point of failure, as it is fully decentralized. Other advantages are that it supports CQL (a SQL-like language that makes it easier to migrate from the relational world), and that it is column-oriented, big-data ready, highly available, and scalable. Those last two are available in other NoSQL solutions like MongoDB as well. Cassandra is used by big players on the Internet, like Facebook and Netflix.
