You don’t just want to know that your system is working, you want to know that your system is working well. Gathering performance metrics is essential for knowing this.
Metrics let you determine whether the changes you make have a positive effect. For example, after optimizing a query at the data source level, you can verify whether your changes have improved that table’s performance. This doesn’t only apply to software: environment changes can easily be tested too. After any change or upgrade to an environment, you can gather metrics again to ensure you are taking a step in the right direction.
Benchmarks can help you identify a creep in latency over time. You may see that part of the system isn’t performing as well as it used to. At the database level, this may indicate a table growing too large. At the logic level, maybe a refactor is needed, or an upgrade to a technology.
The metrics can be used for Quality Assurance. If you automate your benchmark tests, you potentially have a tool that can stop a build should the benchmark exceed the previously recorded metric (plus some contingency to account for variance such as network latency).
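As a minimal sketch of such a build gate: time the operation, then fail the run if it exceeds the recorded baseline plus a contingency. The baseline, tolerance, and `run_query` stand-in below are all hypothetical placeholders, not values from a real system.

```python
import time

BASELINE_MS = 500   # previously recorded benchmark (hypothetical figure)
TOLERANCE_MS = 50   # contingency for variance such as network latency

def run_query() -> None:
    """Stand-in for the real operation under test (hypothetical)."""
    time.sleep(0.01)

def benchmark(op, runs: int = 10) -> float:
    """Return the average elapsed time of `op` in milliseconds over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        op()
    return (time.perf_counter() - start) / runs * 1000

elapsed_ms = benchmark(run_query)

# In CI, a failed assertion like this one stops the build.
assert elapsed_ms <= BASELINE_MS + TOLERANCE_MS, (
    f"Benchmark regression: {elapsed_ms:.1f} ms exceeds "
    f"{BASELINE_MS + TOLERANCE_MS} ms threshold"
)
```

Averaging over several runs smooths out one-off spikes; a single slow run shouldn’t fail a build on its own.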
Performance metrics can help you identify potential bottlenecks in your system. Take the following as an example…
If you have performance metrics captured for each area, you can make a fair assumption as to where a bottleneck may exist. For example, you create a new endpoint and it takes 10 seconds to get a response. You know this isn’t good enough, but where do you start looking for the delay? Start by checking the benchmarks. You see the database normally returns a response in half a second, so you can safely assume the issue is somewhere in the API gateway or the business logic. Your business logic normally executes in 2 seconds. That leaves 7.5 seconds unaccounted for, and the API gateway is the last piece of the puzzle. You can focus on it first to determine what went wrong with your request.
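The attribution above is simple subtraction, and it can be sketched in a few lines. The component names and figures mirror the example and are purely illustrative:

```python
# Typical per-component response times from recorded benchmarks (illustrative).
benchmarks = {
    "database": 0.5,        # seconds
    "business_logic": 2.0,  # seconds
}

total_response = 10.0  # seconds, observed end-to-end

# Whatever the benchmarked components don't account for points at the
# remaining component (here, the API gateway).
unaccounted = total_response - sum(benchmarks.values())
print(f"Unaccounted latency: {unaccounted:.1f} s")
```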
It can be used for comparing technologies. An obvious example here is Ignite versus Cassandra for read speeds. Ignite should outperform Cassandra; if it does not, something is wrong.
Start with a clean slate. Get an early benchmark that you can compare against later. In the instance of a database, this would be the first draft of your schema. Benchmark against an empty table. The response should be mere milliseconds.
Next, test with real data. For a database, dump a realistic load of data into the table. Test again. How did the query perform now? Was there only a slight increase? Or did performance plummet?
You should next look to optimize your initial work. Make changes to the logic/schema. Test again.
Repeat the above until you are satisfied with the results. At the end, you will have your benchmark.
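The measure-change-measure loop above can be driven by a small timing harness. The one below is a sketch; the placeholder operations stand in for your real queries against an empty and then a loaded table:

```python
import statistics
import time

def time_operation(op, runs: int = 5) -> float:
    """Average wall-clock time of `op` in milliseconds across several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)

# Step 1: benchmark the first draft against an empty table
# (a no-op stands in for the real query here).
empty_table_ms = time_operation(lambda: None)

# Step 2: load realistic data, benchmark again, and compare.
# Step 3: optimize the schema/logic, then re-run, until satisfied.
```

Recording the output of each iteration is what gives you the final benchmark to compare against later.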
It is important to be consistent when recording metrics. For example, when testing a read operation in a database, it is important to fetch roughly the same data every time. There is no point in testing a read operation that returns 100 rows today and 100,000 rows next week; this will obviously have a major impact on results. You either have to ensure the integrity of the data beforehand, or run a static query, e.g. select the first n rows.
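A static query is the simpler option. For example, always reading a fixed number of rows keeps runs comparable regardless of how the table grows (the table name is hypothetical, and the exact syntax varies by database):

```python
# A fixed-size read keeps benchmark runs comparable over time.
STATIC_READ_QUERY = "SELECT * FROM orders LIMIT 100"
```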
Where you run the tests from must also be consistent. Many factors can play a part here: the spec of the machine running the tests, and its location (network latency). It is OK to run the tests locally during development, but final benchmarks must be taken from a central location.
Your metrics should be recorded in a manner that matches your data structure. This will make them easy to query later. As you may have seen in previous posts, the following is the structure we will record:
Along with this, we record what we are testing (e.g. Cassandra, Ignite, endpoints) and what type of test it is (e.g. read or write).
In terms of test figures, the response time is ultimately what we are interested in. You should keep a history of the benchmarks. This will allow you to view metrics over time, and to extract valuable information such as the minimum, maximum, and average.
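Deriving those summary statistics from a recorded history is straightforward. The sample history below is illustrative, not real data:

```python
import statistics

# Response times (ms) from past benchmark runs (hypothetical sample data).
history_ms = [480, 495, 510, 502, 489]

print(f"min: {min(history_ms)} ms")
print(f"max: {max(history_ms)} ms")
print(f"avg: {statistics.mean(history_ms):.1f} ms")
```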
This post has delved into the theory behind benchmarking performance metrics: why it is needed, and recommended procedures for gathering them. I will follow up with some posts on the technologies out there that can be used for this, including the Cassandra stress tool and JMeter.
Thanks for reading!