Following on from the benchmarking overview post, this post goes into more detail on the metrics we capture, how we consume them, and how we intend to use them going forward.
How it Works
As described in the previous post, we need to exclude network latency from our benchmarks. To do this, we run the benchmarks locally on each test subject, i.e. the Operational Database (ODB), the Reading Database (RDB), and the Endpoint. This led us to design a Benchmark Client/Agent pair. The Agent resides locally on the test subject. It is a small HTTP server that accepts POST requests; the body of the POST contains the configuration details of the benchmark to run. For example, a POST to the ODB Agent tells it which keyspace and table to query. It is the job of the Benchmark Client to consume the results, which includes writing them to CosmosDB or outputting them to the console or a file.
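The Client/Agent round trip described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the handler, route, and JSON field names (`loops`, `timings`) are assumptions, and the "benchmark" here is simulated rather than a real database query.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical Agent: accepts a POST whose body is the benchmark config."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        config = json.loads(self.rfile.read(length))
        # A real Agent would run the configured benchmark here, e.g. query
        # the keyspace/table named in the config. We simulate the timings.
        timings = [0.001 for _ in range(config.get("loops", 1))]
        body = json.dumps({"timings": timings}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence default request logging
        pass

# Run the Agent on any free local port, in the background.
server = HTTPServer(("localhost", 0), AgentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The Benchmark Client side: POST the config and consume the results.
config = {"keyspace": "newpos", "table": "current_rates_by_location", "loops": 3}
req = Request(
    f"http://localhost:{server.server_port}/odb",
    data=json.dumps(config).encode(),
    headers={"Content-Type": "application/json"},
)
results = json.loads(urlopen(req).read())
print(len(results["timings"]))  # one timing per configured loop
server.shutdown()
```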
The following is an overview of how it looks:
The Agent opens an endpoint for each of the tests, e.g. /odb, /rdb, and /endpoint.
Each benchmark result carries a set of high-level fields: Business, Area, Flow, and a Timestamp. These fields allow us to logically group our metrics for reporting afterwards. For example, we can report on metrics for UK (Business), POS (Area), and rates (Flow). The timestamp records when the benchmark was taken, allowing us to view performance over time, or just the most recent results.
The following are the metrics per test:
- Test – e.g. ODB, RDB, Endpoint
- Type – e.g. Read/Write for database, GET/POST for endpoints
- Resource – e.g. the table name for ODB/RDB. The URL for an endpoint
- Count – the amount of load put on the Test, e.g. 100 requests, 1000 requests.
- Average – the key metric: the average time it took to complete the benchmark.
- Min – the minimum time it took to respond
- Max – the maximum time it took to respond
- Errors – how many of our requests returned an error
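The per-test metrics above could be derived from the raw response times like this. It is a sketch under assumed names: the real Client may compute these differently, and `summarise` and its fields are illustrative.

```python
def summarise(timings_ms, error_count):
    """Reduce raw response times (in ms) to the reported metrics."""
    return {
        "count": len(timings_ms) + error_count,  # total load put on the test
        "average": sum(timings_ms) / len(timings_ms),
        "min": min(timings_ms),
        "max": max(timings_ms),
        "errors": error_count,
    }

# Four successful responses and one error:
metrics = summarise([12.0, 15.0, 90.0, 13.0], error_count=1)
print(metrics["average"])  # 32.5 — a single slow outlier drags the max, not the average
```

Note how the 90 ms outlier dominates the max while the average stays near the min, which is exactly the pattern described in the Power BI drill-down below.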
We can use these metrics to display the results visually using Power BI. The following is an example of what we can show. The first image is an example of drilling down into an ODB test for a read operation. We can see that there are some spikes in terms of the max time it took to respond, but ultimately the average is much closer to the minimum.
The following compares the Read operations from the RDB versus the ODB. In this example, we can see that the RDB greatly outperformed its ODB counterpart. We would expect this, as the RDB reads from memory.
The below snippet is an example of the configuration for the Benchmark Client. This is required to run the tests.
```yaml
benchmark:
  business: "newpos"
  area: "pos"
  flow: "rates"
  persistResults: "true"
odb:
  threads: 1
  loops: 1
  agentUrl: "http://localhost:8081/odb"
  tables:
    - keyspace: newpos
      table: current_rates_by_location
      limit: 100
rdb:
  threads: 1
  loops: 1
  agentUrl: "http://localhost:8081/rdb"
  tables:
    - keyspace: Currencies
      table: currencies
      limit: 100
endpoint:
  threads: 1
  loops: 1
  agentUrl: "http://localhost:8081/endpoint"
  resource: "v1/rates/ratePlanId"
  config:
    url: ""
    headers:
      x-location: "client-location"
    params:
      param1: "client-param-1"
    body:
      "title": "Mr"
      "name": "Client"
    httpMethod: "GET"
    protocol: "http"
    domain: "localhost"
    port: 8081
    path: "v1/rates/abc123"
```
The first category, "benchmark", gives an overview of what you will be benchmarking. The RDB and ODB configurations are similar: both require the table information. Currently, the Client automatically triggers a Read/Write benchmark for the ODB, while the RDB defaults to Read only. The endpoint config takes all the information required to build a request. Headers and Params are optional; the body is required for POST/PUT requests. The endpoint "resource" value is used for reporting afterwards, so it should be a clean reference to the endpoint being tested, i.e. it shouldn't include any path values.
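Assembling the request URL from the endpoint config fields might look like the sketch below. This is an assumption about how the Client combines `protocol`, `domain`, `port`, `path`, and `params`; the actual implementation is not shown in this post.

```python
from urllib.parse import urlencode

# Fields taken from the example endpoint config above.
config = {
    "protocol": "http",
    "domain": "localhost",
    "port": 8081,
    "path": "v1/rates/abc123",
    "params": {"param1": "client-param-1"},  # optional query parameters
}

# protocol://domain:port/path, plus a query string if params are present.
url = f"{config['protocol']}://{config['domain']}:{config['port']}/{config['path']}"
if config.get("params"):
    url += "?" + urlencode(config["params"])

print(url)  # http://localhost:8081/v1/rates/abc123?param1=client-param-1
```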
All of the above require an agentUrl configuration: the URL of the Agent that the Client sends its benchmark requests to.
Developers should run the Benchmark Client as they create or modify endpoints and their resources. Once you are happy with the metrics, record them and pass them into the Endpoint Descriptor for the delivery platform. Remember that you will often be dealing with milliseconds, so give yourself some overhead in the final figures.
The next step is to integrate with the Delivery Platform. As part of Endpoint creation, benchmarks should be run on the resources used by the Endpoint. Should any of these benchmarks report worse than the figures the developer has defined, the build should fail until the latency has been addressed.
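The proposed build gate could be as simple as the following sketch. The function name and the shape of the declared/measured figures are assumptions; the point is only that the build fails when any measured average exceeds what the developer declared in the Endpoint Descriptor.

```python
def check_benchmarks(declared_ms, measured_ms):
    """Return the tests whose measured average exceeds the declared budget."""
    return [test for test, avg in measured_ms.items() if avg > declared_ms[test]]

# Hypothetical figures: the developer declared 20 ms for ODB and 5 ms for RDB.
failures = check_benchmarks(
    {"odb": 20.0, "rdb": 5.0},   # declared in the Endpoint Descriptor
    {"odb": 18.2, "rdb": 7.9},   # averages measured by the Benchmark Client
)
print(failures)  # ['rdb'] — this build should fail until the latency is addressed
```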
We now have a tool that we can use to capture metrics. We have addressed issues such as network latency to make the benchmark results as clean as possible. The tool will really come into its own when fully integrated with the build process. We will generate a database of metrics that will allow us to analyse/identify any bottlenecks/latency in the system. This information will prove pivotal as the system grows.