In the last article, we saw an overview of the front-end architecture and how the idea of Fog Computing lets us reduce the number of requests to the Cloud, cutting network traffic and costs.
The idea is to process data close to the devices. That’s why we chose Fog Computing: it gives us a “Bridge” layer between the Cloud and the Edge, running on devices like a Raspberry Pi or an Odroid, or on more powerful hardware if the project needs more processing power in the Fog.
Now you’re probably wondering: “But how will you maintain so many layers and different codebases? Is it worth the maintenance effort?”
Good question, my dear Padawan! What we are trying to do is bring small pieces of code from the Cloud to run in the Fog. Same code, same result. So think about it: we can bring pieces of the slow-lane processing to the Fog layer:
Reading this image from left to right, you can see a user accessing a UI (mobile, tablet, computer, PWA). This UI only accesses its own Device Storage, nothing else.
Using bidirectional synchronization, the Device Storage is kept continuously in sync with a CouchDB on the Fog. So it doesn’t matter if the Cloud is off, or even if the Fog is off: each device has its own storage, so it can keep working offline.
The Cloud keeper (like a Fog API) will be a local web server that communicates with the Cloud, keeping the Fog database (CouchDB) up to date according to the project rules. It is also responsible for calling the Apache Livy service to start the Spark jobs and for updating the CouchDB with the result of each job.
When Apache Livy receives a request from the Cloud keeper, it starts a new Spark job. The job uses the CouchDB on the Fog layer to run the business logic and rules, handling data to perform transformations, extractions, etc. The job is packaged as a JAR file, so the same code runs on both the Cloud and the Fog.
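As a rough sketch of what that request could look like, the snippet below builds the body for Livy's `POST /batches` endpoint, which submits a precompiled JAR. The JAR path, class name, CouchDB URL and Livy address are placeholders invented for illustration, and passing the CouchDB URL as a job argument is only one assumed way of pointing the job at the Fog database.

```python
import json
from urllib import request


def livy_batch_payload(jar_path: str, class_name: str, couch_url: str) -> dict:
    """Build the JSON body for Livy's POST /batches endpoint."""
    return {
        "file": jar_path,         # JAR containing the Spark job
        "className": class_name,  # main class inside that JAR
        "args": [couch_url],      # assumption: the job reads the CouchDB URL from its args
    }


def submit_batch(livy_url: str, payload: dict) -> dict:
    """Send the batch to Livy and return its JSON reply (batch id, state, ...)."""
    req = request.Request(
        f"{livy_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical values; nothing is sent until submit_batch() is called.
payload = livy_batch_payload(
    "hdfs:///jobs/slow-lane.jar",
    "com.example.SlowLaneJob",
    "http://localhost:5984/fog",
)
```

The Cloud keeper would then call something like `submit_batch("http://localhost:8998", payload)` (8998 is Livy's default port) and could poll `GET /batches/{id}` to check the job state before writing the result to CouchDB.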
We are going to use the CouchDB as a subset of the cloud data, syncing and merging all the data from the devices. We can replicate this storage as many times as we want, keeping all the replicas synchronized (we will talk about the data structuring soon).
Learning CouchDB should feel natural to almost every developer who has been doing any work in web development. The core is simple but powerful.
It is a document store (like MongoDB) that keeps data as JSON documents, using hierarchy inside each document to avoid the table joins of a relational database. It is not as fast as MongoDB, but the master key of CouchDB is SYNCHRONIZATION.
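To make the “hierarchy instead of joins” point concrete, here is a small sketch of a document as it might be stored. The document shape, field names and values are hypothetical and not taken from the project; only `_id` is a real CouchDB field.

```python
import json

# Hypothetical device document. In a relational schema the location and the
# readings would probably live in separate tables joined by a device id.
device_doc = {
    "_id": "device:rpi-01",
    "type": "device",
    "location": {"building": "A", "room": "101"},
    "readings": [
        {"sensor": "temperature", "value": 21.5},
        {"sensor": "humidity", "value": 40.0},
    ],
}


def reading_for(doc: dict, sensor: str):
    """Look up a reading by walking the nested list, with no join involved."""
    for reading in doc["readings"]:
        if reading["sensor"] == sensor:
            return reading["value"]
    return None


# The whole document round-trips through JSON unchanged.
serialized = json.dumps(device_doc)
```

Everything a screen needs about the device arrives in a single document, which is also the unit that gets synchronized.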
This is beautiful: we can easily replicate across multiple server instances using the Couch Replication Protocol. So yes, it allows us to keep the Fog’s data synchronized with EACH storage on the Edge layer (we are going to talk about the Edge details soon).
The Couch Replication Protocol lets our data flow seamlessly between server clusters, mobile phones and web browsers. It comes with a developer-friendly query language, optional MapReduce, and adapters for migrating to other databases such as SQLite or Mongo, or even custom adapters.
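Between two CouchDB servers, a bidirectional sync is simply two one-way replications. The sketch below builds both request bodies for CouchDB's `POST /_replicate` endpoint. The host names and database name are made up for illustration, and authentication (which recent CouchDB versions require for this endpoint) is left out to keep it short.

```python
import json
from urllib import request


def replication_docs(db_a: str, db_b: str) -> list:
    """Build the two one-way replication requests that together form a two-way sync."""
    return [
        {"source": db_a, "target": db_b, "continuous": True},
        {"source": db_b, "target": db_a, "continuous": True},
    ]


def start_replication(couch_url: str, doc: dict) -> dict:
    """POST one replication request to CouchDB's /_replicate endpoint."""
    req = request.Request(
        f"{couch_url}/_replicate",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical Fog instances; nothing is sent until start_replication() is called.
docs = replication_docs("http://fog-a:5984/app", "http://fog-b:5984/app")
```

With `"continuous": True`, CouchDB keeps each replication running and pushes new changes as they arrive, instead of doing a one-off copy.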
Livy is an open source, Apache-licensed REST web service for managing long-running Spark Contexts and submitting Spark jobs. It is a joint development effort by Cloudera and Microsoft.
Instead of running the Spark Contexts in the Server itself, Livy manages Contexts running on the cluster managed by a Resource Manager like YARN.
Some features included in Livy (from the official documentation):
- Have long running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
- Share cached RDDs or Dataframes across multiple jobs and clients
- Multiple Spark Contexts can be managed simultaneously, and the Spark Contexts run on the cluster (YARN/Mesos) instead of the Livy Server, for good fault tolerance and concurrency
- Jobs can be submitted as precompiled jars, snippets of code or via java/scala client API
- Ensure security via secure authenticated communication
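Besides precompiled JARs, the “snippets of code” option in the list above goes through Livy's interactive sessions: `POST /sessions` opens a long-running Spark Context and `POST /sessions/{id}/statements` runs code in it. The sketch below only describes those two requests as plain tuples. The session id `0` and the Scala snippet are illustrative assumptions, not values from the project.

```python
import json


def create_session_request(kind: str = "spark") -> tuple:
    """Describe the request that opens a long-running Spark Context in Livy."""
    return ("POST", "/sessions", json.dumps({"kind": kind}))


def run_statement_request(session_id: int, code: str) -> tuple:
    """Describe the request that runs a code snippet inside an existing session."""
    path = f"/sessions/{session_id}/statements"
    return ("POST", path, json.dumps({"code": code}))


# Hypothetical usage: open a Scala session, then send it a one-line snippet.
open_ctx = create_session_request()
snippet = run_statement_request(0, "sc.parallelize(1 to 10).sum()")
```

Because the session stays alive between statements, several clients can reuse the same Spark Context instead of paying the start-up cost for every job.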
Usually when we create an app, we use more than one communication channel: one to the database, one using REST to an API, another one to sensors, and so on. What we are proposing is a single channel: only the database. The database itself is responsible for keeping its own data up to date. It is a new paradigm introduced by the concept of “Offline First”.
With this architecture, we can separate the responsibilities of each component. The Fog is only responsible for creating the subset of data that will be used on the Edge devices. With these technologies, we save a lot of effort on security, cluster and job control, synchronization, and data conflict management.
In the next articles we will see how the data structure is organized and how we are going to secure it. We will also see how the Edge layer is organized with the chosen stack.