Low-Latency Data Operations with Apache Ignite as a Cluster

Why am I writing this?

I just finished setting up our whole distributed data environment and, whoa guys, it was hard!

Let’s do a quick summary of what I did and, at the end, I’ll share a useful tool (at least for my purposes) with all of you!

First of all, you should know that we’re using the CQRS pattern, which is easily explained with the following picture:

This means that any Spark application developed by one of our coders must read its data from our Apache Ignite cluster, which in turn gathers the data from Cassandra. If the application wants to save data, it persists it directly in Cassandra, without any interaction with Ignite. In this way, the read and write operations are separated.

CQRS segregates the read and write models, decoupling the component dependencies from the data operations and giving us the flexibility we need to have (real) microservices which do just one thing.
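To make the split concrete, here is a minimal sketch of what a consumer of this architecture could look like (hosts, keyspace and table names are illustrative, not from our real setup; it assumes the Ignite JDBC thin driver plus the DataStax 3.x Java driver): reads hit Ignite, writes go straight to Cassandra.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CqrsSketch {
    public static void main(String[] args) throws Exception {
        // READ side: query the in-memory Ignite cluster over SQL.
        try (Connection ignite = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = ignite.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT name FROM CUSTOMER LIMIT 10")) {
            while (rs.next())
                System.out.println(rs.getString("name"));
        }

        // WRITE side: persist directly in Cassandra, bypassing Ignite completely.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("somekeyspace")) {
            session.execute("INSERT INTO customer (id, name) VALUES (uuid(), 'John')");
        }
    }
}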

As you may have read in other posts by my colleagues (hi, Jesús!), we’re using an Event Sourcing approach, so this pattern fits perfectly in our environment.

Why Ignite?

Following the official documentation, there’s an ‘easy’ way to use Cassandra as a persistent store for Ignite, as the image shows:

The original image shows ‘Write-Through’ too, but we’re doing CQRS, right?!
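For the curious, this is roughly the kind of cache configuration behind that picture; a minimal sketch assuming the ignite-cassandra-store module, where the cache name, contact point and persistence-settings file are placeholders, with read-through on and write-through off to match our CQRS setup:

import org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory;
import org.apache.ignite.cache.store.cassandra.datasource.DataSource;
import org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings;
import org.apache.ignite.configuration.CacheConfiguration;
import org.springframework.core.io.FileSystemResource;

public class ReadThroughCache {
    public static CacheConfiguration<Object, Object> customerCache() {
        // Where Ignite finds Cassandra when a cache miss happens.
        DataSource cassandra = new DataSource();
        cassandra.setContactPoints("127.0.0.1");

        // Key/value <-> table mapping, defined in an external XML file (path is a placeholder).
        CassandraCacheStoreFactory<Object, Object> store = new CassandraCacheStoreFactory<>();
        store.setDataSource(cassandra);
        store.setPersistenceSettings(
            new KeyValuePersistenceSettings(new FileSystemResource("/etc/ignite/persistence.xml")));

        CacheConfiguration<Object, Object> cfg = new CacheConfiguration<>("customer_cache");
        cfg.setCacheStoreFactory(store);
        cfg.setReadThrough(true);   // misses are loaded from Cassandra
        cfg.setWriteThrough(false); // writes never flow through Ignite (CQRS)
        return cfg;
    }
}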

Having all our data inside a cluster that keeps the information in memory sounds pretty nice: working in memory is always fast, and since we’re talking about real-time communication, it brings us ridiculously low latency. But what if all this structure were distributed across the world, in different countries, and served on demand? Sounds even better, right? And lastly, imagine that you could access your NoSQL data (remember that we’re using Cassandra) using SQL, without a huge impact on our infrastructure (yes, joins and all that normalized stuff that everybody likes).

Say goodbye to deadlocks: thanks to the decentralized nature of Ignite, deadlocks are almost nonexistent.

With Ignite, all your database data will be spread across different, highly scalable nodes, just the same way as in Cassandra.

Let’s look at some quick metrics that we gathered by running the following query on both systems, in loops of 20 across 10 different threads, over JDBC:

SELECT * from CUSTOMER_BY_NAME_AND_BIRTH.CUSTOMERBYNAMEANDBIRTH LIMIT 10000;

 

As we can see, Ignite is more than 10x faster than Cassandra for this operation… it doesn’t need further explanation, but 104 milliseconds is about 1/3 of an eye blink 😉
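In case you want to reproduce something similar, here is a minimal sketch of such a harness against Ignite using the JDBC thin driver (the host and the timing logic are my assumptions; the Cassandra side would be analogous with its own JDBC driver):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueryBenchmark {
    private static final String URL = "jdbc:ignite:thin://127.0.0.1";
    private static final String QUERY =
        "SELECT * FROM CUSTOMERBYNAMEANDBIRTH LIMIT 10000";

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        long start = System.nanoTime();
        for (int t = 0; t < 10; t++) {               // 10 threads...
            pool.submit(() -> {
                try (Connection conn = DriverManager.getConnection(URL);
                     Statement stmt = conn.createStatement()) {
                    for (int i = 0; i < 20; i++) {   // ...each running 20 loops
                        try (ResultSet rs = stmt.executeQuery(QUERY)) {
                            while (rs.next()) { /* drain the rows */ }
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        System.out.printf("Total time: %d ms%n",
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
    }
}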

All this official documentation talks about how to configure Ignite to generate all the stuff (tables, etc.) in Cassandra in order to get Ignite working… but what if I already have my own working Cassandra database and I want to use Ignite on top of it? Well, in that case you will need to generate all the configuration/POJOs by yourself, for every Cassandra table, which, to be honest, is a pity.

Save us, Mr. Handsome

And here we go! I coded a little tool in Golang which uses cqlsh to ask Cassandra for a specific keyspace, grabs the structure of every table, and generates all the files needed by Ignite. But (there’s always a but) this tool only works with some primitive data types, which means that your Cassandra tables must not use UDTs or any other exotic types. For the moment it supports the following:

    javaTypeMap["int"] = "Integer"
    javaTypeMap["uuid"] = "UUID"
    javaTypeMap["text"] = "String"
    javaTypeMap["timestamp"] = "Date"
    javaTypeMap["float"] = "Float"
    javaTypeMap["boolean"] = "Boolean"
    javaTypeMap["decimal"] = "BigDecimal"
This is because Ignite does not have codecs for every data type. For example, ‘date’: Ignite will not work if Cassandra has some table with the ‘date’ data type, so you should change it to ‘timestamp’. Taking these considerations into account, this tool will help you create your Ignite node.
The tool is available here.

Example of use

Given the following Cassandra table:

CREATE TABLE somekeyspace.location_by_group (
    group_id uuid,
    location_id uuid,
    active boolean,
    name text,
    PRIMARY KEY (group_id, location_id)
)

And configuring the tool with your own properties:

cmd := exec.Command("/home/user/Develop/git/apache-cassandra-3.11.2/bin/cqlsh", "localhost", "-e", "DESCRIBE somekeyspace")

The first parameter is the path where you have cqlsh installed, ‘localhost’ should be the IP of your Cassandra database, and lastly, the keyspace in the DESCRIBE command should be the one you want to use (‘somekeyspace’ in this case).

Once executed with ‘go run main.go’, we will get the following:

  • LocationByGroup.java
  • PK/LocationByGroupPK.java
  • client/ignite-config.xml
  • server/ignite-config.xml
  • Starter.java

The Java classes are the POJO and the primary key; the client and server ignite-config files are the configuration files for the Ignite client node and the Ignite server node; and the Starter class is just the main class which runs Ignite and loads the generated cache.
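Before looking at the generated files one by one, here is roughly what that Starter could look like; a simplified sketch, not the tool’s literal output, and the cache name is an assumption:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class Starter {
    public static void main(String[] args) {
        // Boot an Ignite server node from the generated XML configuration.
        Ignite ignite = Ignition.start("server/ignite-config.xml");

        // Grab the cache declared in that XML so it gets created and loaded.
        IgniteCache<LocationByGroupPK, LocationByGroup> cache =
            ignite.getOrCreateCache("LocationByGroupCache");

        System.out.println("Cache ready, entries: " + cache.size());
    }
}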

LocationByGroup.java

import java.util.UUID;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class LocationByGroup {
    @QuerySqlField(index = false)
    private UUID groupId;

    @QuerySqlField(index = false)
    private UUID locationId;

    @QuerySqlField(index = false)
    private Boolean active;

    @QuerySqlField(index = false)
    private String name;

    // getters & setters omitted
}

LocationByGroupPK.java

import java.util.UUID;
import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class LocationByGroupPK {
    @AffinityKeyMapped
    private UUID groupId;

    private UUID locationId;
}

We can see how all the columns of the Cassandra table have been translated into Java attributes in these files, keeping the partition/clustering keys and other relevant stuff (note the @AffinityKeyMapped annotation on groupId, which mirrors Cassandra’s partition key so related entries are colocated on the same Ignite node)…
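Once a node is up with these classes, any client can query them with plain SQL; a small usage sketch (the cache name is an assumption, it’s whatever ends up in the generated XML, and it assumes the query entities are declared there):

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class QueryExample {
    public static void main(String[] args) {
        Ignition.setClientMode(true);
        try (Ignite client = Ignition.start("client/ignite-config.xml")) {
            // The SQL table name defaults to the value class name.
            SqlFieldsQuery qry = new SqlFieldsQuery(
                "SELECT name FROM LocationByGroup WHERE active = ?").setArgs(true);
            List<List<?>> rows = client.cache("LocationByGroupCache").query(qry).getAll();
            rows.forEach(row -> System.out.println(row.get(0)));
        }
    }
}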

The configuration file will look like this:

 

It contains all the definitions needed to make this work! Just copy everything into your Ignite Spring project, generate a JAR file, and make it run in Kubernetes!

If you want to run it locally, just remove the discoverySpi property from the XML config.
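For reference, the Kubernetes part of that discovery configuration boils down to something like this; a sketch in Java form, assuming the ignite-kubernetes module (the generated file expresses the same thing in Spring XML, and the service name is a placeholder):

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder;

public class DiscoveryConfig {
    public static IgniteConfiguration forKubernetes() {
        // Nodes find each other through the Kubernetes service exposing the Ignite pods.
        TcpDiscoveryKubernetesIpFinder ipFinder = new TcpDiscoveryKubernetesIpFinder();
        ipFinder.setServiceName("ignite");

        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discovery);
        return cfg;
    }
    // Locally, skip setDiscoverySpi entirely and Ignite falls back to multicast discovery.
}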

2 Replies to “Low-Latency Data Operations with Apache Ignite as a Cluster”

  1. Hi Daniel, I have worked with Cassandra for 3 years and it is one of the best NoSQL databases today. It can achieve high volumes of throughput. It has some read limitations depending on how you model your tables, but we can overcome them. I’m curious to understand why you chose to use an in-memory DB + Cassandra? Another thing: I agree that Cassandra may not work well with this “select *” query, but the thing we should ask is whether we should use Cassandra for that kind of query.

    1. Hi Diego!
      As I explained in the first part of the post, we want to follow a CQRS pattern, so we must decouple reading and writing and keep them separate. By doing this, we make sure that long-running queries won’t block our system; also, as shown in the metrics, an in-memory database brings us very low latencies.

      On the other hand, ‘select *’ was just an example; we’re not even going to run selects against Cassandra because of the CQRS stuff.

      Thanks for reading us!
