From other posts you should know that we’re now working with Cassandra: a nice, decentralized way of storing data. But power is nothing without control.
So in order to keep things under control, we want a way to monitor what’s happening under the hood.
– Linux machine
– Working Cassandra Database
– Know what Grafana & Prometheus are
– Just some of your time
– A picture is worth a thousand words:
Cassandra exposes all kinds of information through JMX, and we can access it directly, but it’s a little bit complex… and, well, it’s ugly too. So we opted for an even more complex but beautiful solution.
First let me show you a diagram about what we want to achieve:
In summary, we’re going to send the information that Cassandra exposes through JMX to Prometheus using a Java agent, and we will do the same with the node information (CPU %, memory, network status, etc.), but in that case using a Go program called Node Exporter.
Lastly, Prometheus will expose the information to our dashboard, in this case Grafana.
Exposing Cassandra info
Careful here: we had problems depending on Cassandra’s version. For this step, the version must not be 3.0.9; I tested with 3.10 & 3.11 and it worked in both.
Those steps are easy:
- Download jmx_exporter
- $wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar
- Download exporter default config
- $wget https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/cassandra.yml
- Add the java agent to cassandra’s start
- $echo 'JVM_OPTS="$JVM_OPTS -javaagent:'$PWD'/jmx_prometheus_javaagent-0.3.0.jar=8080:'$PWD'/cassandra.yml"' >> pathtocassandra/cassandra-env.sh
- Check that everything is working
- $curl localhost:8080 (it will show a bunch of raw data)
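The quoting in that echo one-liner is easy to get wrong, so here is a sketch of what it should end up appending, with the variables spelled out. The /tmp file is just for illustration; in a real install you would append to your actual cassandra-env.sh:

```shell
# Sketch: build the -javaagent flag and append it to a scratch copy of
# cassandra-env.sh (the /tmp path is illustrative, not a real Cassandra install)
AGENT_JAR="$PWD/jmx_prometheus_javaagent-0.3.0.jar"
AGENT_CFG="$PWD/cassandra.yml"
ENV_FILE=/tmp/cassandra-env.sh

: > "$ENV_FILE"  # start from an empty scratch file for the demo
echo "JVM_OPTS=\"\$JVM_OPTS -javaagent:$AGENT_JAR=8080:$AGENT_CFG\"" >> "$ENV_FILE"

# The agent flag should appear exactly once
grep -c javaagent "$ENV_FILE"   # prints 1
```

After a Cassandra restart, the agent serves metrics on port 8080, which is what the curl check above confirms.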
Exposing Node info
Careful again here: we don’t want the latest version of Node Exporter. There were some breaking changes in version 0.16 that will break our Grafana dashboards, so I strongly recommend using version 0.15.2.
- Download & untar node_exporter
- $wget https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz && tar -xzf node_exporter-0.15.2.linux-amd64.tar.gz
- Run node exporter
- $ ./node_exporter &
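To smoke-test node_exporter, you can pull a couple of metrics from its default port (9100). The metric names below are the 0.15.x ones; 0.16 renamed them, which is exactly why we pinned the version:

```shell
# node_exporter 0.15.x serves metrics on :9100 by default
curl -s localhost:9100/metrics | grep -m1 '^node_cpu'              # CPU counters
curl -s localhost:9100/metrics | grep -m1 '^node_memory_MemFree'   # free memory
```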
Gathering with Prometheus
- Download & untar Prometheus
- Configure Prometheus to scrape our data
- $ vi prometheus.yml #and fill it with this info:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration: all three jobs live under a single scrape_configs key.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cassandra-node1'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']
        labels:
          group: 'Cassandra'

  - job_name: 'nodeExporter'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9100']
        labels:
          group: 'Node'
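YAML indentation errors here are a classic time sink. The Prometheus tarball ships a promtool binary next to the prometheus one, and (at least on Prometheus 2.x) it can validate the file before you start the server:

```shell
# promtool comes in the same tarball as the prometheus binary
./promtool check config prometheus.yml
# Prints SUCCESS for a valid file, or points at the offending line otherwise
```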
We are assuming that you’re doing all of this on the same computer (exporters + Prometheus + Grafana); if not, just change localhost where needed.
- Run prometheus
- $./prometheus &
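With Prometheus running you can confirm it is actually scraping both exporters, either in its web UI (Status → Targets on port 9090) or through its HTTP API:

```shell
# List scrape target health via the Prometheus API; every job should report "up"
curl -s localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'
```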
- Download & install Grafana
- $ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_5.1.0_amd64.deb && sudo dpkg -i grafana_5.1.0_amd64.deb
- Start the service
- $ sudo service grafana-server start
- Configure the Data Source
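You can add the data source in the UI (Configuration → Data Sources → Add data source → Prometheus, URL http://localhost:9090), or script it against Grafana’s HTTP API. This sketch assumes the default admin:admin credentials of a fresh install:

```shell
# Create a Prometheus data source through Grafana's API (default admin:admin login)
curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://localhost:9090",
        "access": "proxy",
        "isDefault": true
      }'
```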
- Adding Dashboards
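Dashboards can be imported from the UI (the + menu → Import, pasting either a dashboard ID from grafana.com or raw JSON), or pushed through the same HTTP API. Here dashboard.json is a placeholder for whatever dashboard JSON you exported or downloaded:

```shell
# Push a dashboard definition to Grafana (dashboard.json is a placeholder file)
curl -s -X POST http://admin:admin@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true}"
```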
This is one of the best ways to view the info of our architecture from a single point. We can check how much CPU a single query takes, or even extend the dashboards with our own custom metrics. So, indeed, taking a look at this is worth your time.