

An introductory view into Prometheus

Posted on July 15th, 2021 in Miscellanea by George Bochileanu Parfenie

 

Prometheus is open-source monitoring and alerting software originally developed at SoundCloud, commonly used to monitor production servers.

To run this software, go to the official repository and choose among several installation options: precompiled binaries, Docker images, building from source, or building the Docker image. For a more in-depth installation procedure, consider following the steps described in this link.

This software has the following parts:

  • Metrics collection & querying. Prometheus stores metric values over time in a database and lets you query that information. The database used is a Time Series Database, a database optimized for time series (i.e. timestamped) data. Time series data are essentially events or measurements tracked, monitored, and aggregated over time.

  • Exporters. These are applications running on nodes that ingest data from several sources and transform them into metrics that Prometheus can collect and process.

  • AlertManager. A component that watches metric values automatically and can notify you when they go beyond thresholds you define.

  • Client libraries. These provide an easy way to instrument an application; by using these libraries, you can export events to Prometheus from your code, giving you insight into which portion of the code is executing at a given moment.

We’ll go into more detail in the following sections:

 

Metrics collection & querying

At this point, you may wonder what exactly a metric is. A metric is a quantifiable measure used to track and assess the status of a specific process.

You can find the different metric types described in detail here.

Some examples of metrics could be CPU usage or RAM usage. Metrics take a value at points in time and get stored in the Time Series Database.

To do queries on the collected data, you will have to use Prometheus’s Query Language, also known as PromQL. You can have a look at its documentation to find more about it.

You can easily perform these queries from the Prometheus server’s web interface. However, when you design experiments, you will need to run these queries from your code. Prometheus has an HTTP API for that, yet there is scarce information about official clients designed for that API. To solve this problem, in our Bachelor thesis we created a Java client that can carry out this task. You can find the code in this GitHub repository.
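
To give an idea of what a raw call to that API looks like, here is a minimal sketch that hits the instant-query endpoint (/api/v1/query) with the JDK’s built-in HTTP client. The host dtim:9090 and the query are the ones used in the running example below, and a real client would parse the JSON response instead of just printing it:

package com.example.prometheus;

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Minimal sketch: query Prometheus's HTTP API (/api/v1/query) directly.
// The server address and the PromQL expression are placeholders.
public class InstantQueryExample {
    public static void main(String[] args) throws Exception {
        String prometheusUrl = "http://dtim:9090";
        String query = "sum by (instance) (node_cpu_seconds_total)";
        String uri = prometheusUrl + "/api/v1/query?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(uri)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response is JSON; here we just print it instead of parsing it.
        System.out.println(response.body());
    }
}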

 

Running example

To illustrate the capabilities of Prometheus, we will use a CPU metric as a running example. If we want to see the CPU time series, we head to the Prometheus server web page, usually http://dtim:9090 if we are running Prometheus on the dtim node, and type an expression in the search bar to show our metric:

sum by (instance) (node_cpu_seconds_total)

This expression shows, for every node in the cluster, the total number of seconds the CPUs have been running since the node started. If we want to know how long a program has been running, we can evaluate this expression before and after running the target program; the difference obtained reflects the amount of time the target program has been running. Of course, going to the web page is a manual step, so the way to automate this process is to use the Prometheus querying API mentioned before. An example of using that API follows:

package com.testbed.interactors.monitors;

import com.testbed.entities.invocations.InvocationPlan;
import lombok.RequiredArgsConstructor;
import java.util.concurrent.Callable;

@RequiredArgsConstructor
public class CPUTotalTimeMonitor implements Monitor {
    private static final String MONITOR_NAME_PREFIX = "node";
    private static final String MONITOR_NAME_SUFFIX = "CpuTimeTotalInSeconds";
    private static final String QUERY = "sum by (instance) (node_cpu_seconds_total)";

    private final InstantMetricsDifferencesCalculator instantMetricsDifferencesCalculator;

    @Override
    public MonitoringInformation monitor(final Callable<MonitoringInformation> callable,
                                         final InvocationPlan invocationPlan) {
        // Evaluates QUERY before and after running the callable and reports the differences.
        return instantMetricsDifferencesCalculator.calculate(InstantMetricsDifferencesCalculatorParameters.builder()
                .monitorNameParameters(MonitorNameParameters.builder()
                        .monitorNamePrefix(MONITOR_NAME_PREFIX)
                        .monitorNameSuffix(MONITOR_NAME_SUFFIX)
                        .build())
                .callable(callable)
                .query(QUERY)
                .build());
    }
}

This gist is from this repository, and it shows how to do the same query programmatically using the library we created in our Bachelor thesis.

If we now try to execute this query on Prometheus, we will not get any time series data, because we have not yet set up a way for Prometheus to collect data. To solve that, we can use exporters, which we explain in more detail in the next section.

 

Exporters

Exporters are small programs that take data and export it to Prometheus, hence the name. There is a plethora of exporters available here, ranging from node exporters (CPU usage, network usage, disk read/write byte counts) to framework exporters for Spark or MapReduce.

 

How do they work?

Let’s suppose we have a CPU temperature sensor installed in a node, and we want to measure its value over time. Writing an exporter for it would consist of a loop that collects the sensor’s value, stores it, and then exposes it on an HTTP endpoint. A GET request to that endpoint should return something similar to this:

# HELP oracledb_context_no_label_value_1 Simple example returning always 1.
# TYPE oracledb_context_no_label_value_1 gauge
oracledb_context_no_label_value_1 1
# HELP oracledb_context_no_label_value_2 Same but returning always 2.
# TYPE oracledb_context_no_label_value_2 gauge
oracledb_context_no_label_value_2 2
# HELP oracledb_context_with_labels_value_1 Simple example returning always 1.
# TYPE oracledb_context_with_labels_value_1 gauge
oracledb_context_with_labels_value_1{label_1="First label",label_2="Second label"} 1
# HELP oracledb_context_with_labels_value_2 Same but returning always 2.
# TYPE oracledb_context_with_labels_value_2 gauge
oracledb_context_with_labels_value_2{label_1="First label",label_2="Second label"} 2
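
As a rough sketch of what such an exporter could look like, the snippet below uses the official Java client library (covered later in the Client libraries section) to expose a gauge over HTTP; the port, the metric name, and the readCpuTemperature() helper are made up for illustration:

package com.example.exporters;

import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.HTTPServer;

// Minimal exporter sketch: expose a CPU temperature gauge on port 9200.
// readCpuTemperature() is a hypothetical stand-in for the real sensor read.
public class CpuTemperatureExporter {
    public static void main(String[] args) throws Exception {
        Gauge cpuTemperature = Gauge.build()
                .name("cpu_temperature_celsius")
                .help("Current CPU temperature in degrees Celsius.")
                .register();
        // Starts an HTTP server that serves all registered metrics on /metrics.
        HTTPServer server = new HTTPServer(9200);
        while (true) {
            cpuTemperature.set(readCpuTemperature());
            Thread.sleep(1000);
        }
    }

    private static double readCpuTemperature() {
        return 42.0; // placeholder value for the hypothetical sensor
    }
}

A GET request to the /metrics path of this server would then return the gauge in the same text format as the example above.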

To make this exporter visible to Prometheus, you have to edit the prometheus.yml file as follows:

global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

This configuration file first states the scrape interval, i.e. the period between consecutive pulls of data by Prometheus from the GET endpoint of your exporter. For low scrape intervals, keep an eye on the impact that metrics collection has on the Prometheus server’s resource consumption; this matters, for example, if you are collecting CPU or RAM usage. So set a scrape interval of a few seconds or, if you want more fine-grained metrics collection, bear the aforementioned advice in mind.

Each exporter is considered a “job” by Prometheus, so for each exporter you will have to declare a job in the same fashion as the “prometheus” job. In the static_configs property, you define the target. Notice that this address is composed of a host and a port; first off, ensure that the host has that port open if you have a firewall activated. Notice also that the Prometheus server and the exporter nodes don’t necessarily have to run on the same machine, so you might have a distribution as in the following diagram:

Prometheus distribution diagram

As we can see, there is a Prometheus server in charge of polling the jobs. You can have a dedicated server whose only job is to run the Prometheus server, or you can reuse the master node of a cluster as this server, as we did in our investigation. As mentioned before, bear in mind the impact of low scraping intervals and decide accordingly which option suits you better.

Once the configuration is stored, all you have to do is restart the Prometheus service (if you run it as a service) or restart the server manually; Prometheus will detect the new exporters on the next run.

 

Running example

Following our previous example, we have to do a couple of things: first, run an instance of the node_exporter on each node that we want to monitor; second, configure the Prometheus server so it recognizes the node_exporter. We can do it as described before, using the following prometheus.yml configuration file:

global:
  scrape_interval: 1s

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['dtim:9100', 'dtim1:9100', 'dtim2:9100', 'dtim3:9100']

Now we can know the number of seconds that the CPUs have been running on all servers. If we know that our execution shouldn’t take more than 15 minutes of CPU time, we can check either manually or programmatically that this requirement is satisfied. If we want to automate this process, we can leverage the AlertManager component, which we describe further in the next section.

 

AlertManager

You can use this component to keep an eye on a specific metric while your experiments are running. A typical use case is when your experiments run over a long time and you don’t want to spend that time checking that every metric stays within some range. For example, if you don’t want the cluster running out of disk space to ruin your experiments, you can set up a threshold, and the Prometheus server can send an email to alert you if the metric value goes beyond it.

You can find more detailed information about this feature in its official documentation.

 

Running example

Following our previous example, to make Prometheus take care of alerts, we first have to download the AlertManager from here. Its alertmanager.yml configuration file should have the following contents:

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: '<URL>'

Where <URL> is the webhook URL used by the AlertManager to send an email. We will not get into that in this article, since it is not the main focus. Nonetheless, if you want to know more about the various alerting options, check this video, which shows a usage example.

Once we run the AlertManager, for example on dtim, we can see its web page by visiting http://dtim:9093. To integrate the AlertManager with Prometheus, we have to add the following to the prometheus.yml configuration file:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'dtim:9093'

rule_files:
  - alert_rules.yml

Where alert_rules.yml contains the rules used for alerting; these rules should look as follows:

groups:
- name: alert_on_cpu_time_running_program
  rules:
  - alert: CPUTimeRunningProgramGreaterThan15Minutes
    expr: cpu_time_running_program_in_minutes > 15
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: CPU time running program is greater than 15 minutes    

This example is an adaptation from this video. You can check it for a hands-on tutorial on how to use the AlertManager.

This alert could help save resources by emailing the operator when there is a problem executing an experiment. There is more information on alerting rules here.

You may wonder why we did not directly use the node_cpu_seconds_total metric emitted by the node_exporter. That is because the exporters alone cannot always cover our needs; we might need to emit metrics that are aware of the start and end events of the target program from within our own code. For that, we introduce client libraries in the next section.

 

Client libraries

Once you have created an application, you can emit specific events from it using these libraries. They help you know at a glance, for example, which phase your experimentation tool is in, instead of having to scroll through logs. In short, these libraries offer an easy way to instrument your application.

The libraries are available in several programming languages and have official support. You can find more information on these here.
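
As a small illustration of the phase-tracking idea mentioned above, a sketch with the official Java client library could look like the following; the metric name, the "phase" label, and the enterPhase method are hypothetical, not part of our tool:

package com.example.monitors;

import io.prometheus.client.Gauge;

// Sketch: expose which phase the experimentation tool is currently in,
// using one gauge with a "phase" label (1 = active).
// The metric and phase names are hypothetical.
public class PhaseMonitor {
    private static final Gauge CURRENT_PHASE = Gauge.build()
            .name("experiment_current_phase")
            .help("Set to 1 for the phase that is currently executing.")
            .labelNames("phase")
            .register();

    public void enterPhase(String phase) {
        // Remove previously set phases, then mark the new one as active.
        CURRENT_PHASE.clear();
        CURRENT_PHASE.labels(phase).set(1);
    }
}

As with the running example below, this metric still needs to be exposed to Prometheus through an exporter endpoint or the PushGateway.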

 

Running example

Finally, to complete our running example, we need to export the metric used by the AlertManager rule. We could use a Gauge metric, which allows us to increase and decrease its value. We can define it from within our experimentation tool as follows:

package com.example.monitors;

import java.time.Duration;
import io.prometheus.client.Gauge;
import lombok.RequiredArgsConstructor;

@RequiredArgsConstructor
class ExecutionTimeMonitor {
  private static final Duration HALF_MINUTE = Duration.ofSeconds(30);

  private final ProgramStats programStats;

  void monitor() throws InterruptedException {
    Gauge cpuTimeRunningProgram = Gauge.build()
        .name("cpu_time_running_program_in_minutes")
        .help("CPU time running program in minutes.")
        .register();
    Duration executionTime = Duration.ZERO;
    // Update the gauge every half minute while the target program is still running.
    while (programStats.isProgramRunning()) {
      cpuTimeRunningProgram.set(executionTime.toMinutes());
      Thread.sleep(HALF_MINUTE.toMillis());
      executionTime = executionTime.plus(HALF_MINUTE);
    }
  }
}

This monitor should execute in a separate thread, and the target program statistics (programStats) are injected when instantiating the class. The monitor method is in charge of updating the monitor state. So far, we have an updated monitor, but this example won’t emit anything to Prometheus yet.
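
A minimal sketch of how this could be wired up (just illustrative; it assumes a programStats instance is available from the surrounding application):

// Illustrative only: run the monitor on its own thread while the experiment executes.
ExecutionTimeMonitor executionTimeMonitor = new ExecutionTimeMonitor(programStats);
Thread monitorThread = new Thread(() -> {
    try {
        executionTimeMonitor.monitor();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop monitoring when interrupted
    }
});
monitorThread.start();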

To make this code export data to Prometheus, we have to use the PushGateway. It is one way, among others, to get data into Prometheus; check this for a complete list of methods. If we suppose that we have an instance of the PushGateway running on dtim, we can complete our code as follows:

package com.example.monitors;

import io.prometheus.client.CollectorRegistry;                   // Added
import java.time.Duration;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;                // Added
import lombok.RequiredArgsConstructor;

@RequiredArgsConstructor
class ExecutionTimeMonitor {
  private static final Duration HALF_MINUTE = Duration.ofSeconds(30);

  private final ProgramStats programStats;

  void monitor() throws Exception {
    CollectorRegistry registry = new CollectorRegistry();         // Added
    Gauge cpuTimeRunningProgram = Gauge.build()
        .name("cpu_time_running_program_in_minutes")
        .help("CPU time running program in minutes.")
        .register(registry);                                      // Changed: register in our own registry
    Duration executionTime = Duration.ZERO;
    PushGateway pushGateway = new PushGateway("dtim:9091");       // Added
    while (programStats.isProgramRunning()) {
      cpuTimeRunningProgram.set(executionTime.toMinutes());
      pushGateway.pushAdd(registry, "client_library");            // Added: push under job "client_library"
      Thread.sleep(HALF_MINUTE.toMillis());
      executionTime = executionTime.plus(HALF_MINUTE);
    }
  }
}

To make Prometheus collect data from the PushGateway, we should edit the prometheus.yml configuration file again, adding the following lines under the scrape_configs section.

- job_name: 'push_gateway'
  static_configs:
    - targets: ['dtim:9091']

Now we should be able to see the exported information on the Prometheus web page, visiting http://dtim:9090 and typing the following expression in the search bar:

cpu_time_running_program_in_minutes

 

Additional information

  • If you want an introductory hands-on tutorial on how to set up Prometheus, you can have a look at this YouTube video.
  • When experimenting, it might be more convenient to use Prometheus managed by AWS, since the cost of keeping a node powered on can be higher than the cost of using a managed service.
  • If you want an example of how we used Prometheus in our experimentation, consider checking this GitHub repository.
  • If you want to monitor the state of the cluster in real time, the Prometheus server provides a rather basic way to graph metrics; if you would like more detailed graphs, consider using Grafana.