Metrics Concepts
Definition
Metrics are quantitative measurements that provide insights into the performance and behavior of a system, application, or infrastructure. In the context of observability, metrics typically represent numerical data points collected over time.
These data points can include various types of information, such as resource utilization (CPU, memory), request counts, error rates, and latency. Metrics are fundamental for monitoring and analyzing the health and efficiency of a system, helping teams identify issues, track trends, and make informed decisions to optimize performance and reliability.
In short: metrics are the backbone of any monitoring system. In Prometheus, monitoring data is represented as time series: streams of timestamped values that describe specific aspects of system behavior.
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud, where development began in 2012. Today, it is a standalone open-source project maintained independently of any company.
Official Prometheus Documentation
Prometheus has well-maintained official documentation that covers installation, configuration, querying, and more. Refer to it for a deeper understanding of the tool and how to use it.
Core Principles of Prometheus
Prometheus's design revolves around several key principles:
- Multi-dimensional data model: Prometheus stores all data as time series, i.e., streams of timestamped values belonging to the same metric and the same set of labelled dimensions, which enables a wide variety of queries.
- Powerful query language: PromQL lets you select and aggregate time-series data in real time.
- Pull-based metric collection: rather than having applications push monitoring data, Prometheus collects or “scrapes” metric data at regular intervals over HTTP from endpoints that the applications expose.
- Graphing and dashboarding: metrics can be visualized in the built-in expression browser, and multiple modes of graphing and dashboarding are supported.
- Integrated alerting: flexible alert rules are evaluated against the stored data to generate notifications.
- Reliability: each Prometheus server acts autonomously and does not depend on distributed storage.
Architecture of Prometheus
Prometheus architecture is relatively simple. Applications expose an HTTP endpoint (often /metrics), and the Prometheus server scrapes these endpoints at a regular interval, storing the information as time-series data.
This data can be queried via an API and visualized with a UI like Grafana or the built-in expression browser. Alerting rules are evaluated by the Prometheus server, and the resulting alerts are passed to the Alertmanager, which handles grouping, silencing, and routing of notifications.
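As a rough illustration of the scrape model, the sketch below renders two metrics in the Prometheus text exposition format and serves them on /metrics using only Python's standard library. The metric values, the active_requests name, the handler class, and the port are assumptions made for this example; in practice you would normally use an official client library rather than hand-rolling the endpoint.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory metrics: name -> (type, help text, current value).
METRICS = {
    "http_requests_total": ("counter", "Total HTTP requests served.", 123),
    "active_requests": ("gauge", "Requests currently in flight.", 4),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (metric_type, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the current metric values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted. The default port is arbitrary."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; a Prometheus server configured with this host and port as a scrape target would then collect the two values at each scrape interval.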
Finding More Information About Prometheus
Given its widespread usage, there's a wealth of documentation available for Prometheus. The official Prometheus documentation is an excellent place to start, providing an overview of concepts, detailed guides, and best practices. For advanced topics, there are numerous blogs, tutorials, and talks available online.
The Prometheus GitHub repository is also a good place to visit for those interested in Prometheus' latest development updates or those looking into contributing to the project.
Remember, understanding and optimizing the use of Prometheus is an ongoing process. Keep seeking out and researching information, and don't hesitate to consult the community or professional literature if you hit any roadblocks.
Existing tooling
For many existing systems like database servers, web servers and application stacks, there are ready-made tools to export metrics. The official Prometheus documentation contains a (non-exhaustive) list of available exporters.
If no exporter exists for your system, check that system's own documentation: Prometheus support may be built in and only need to be enabled in the configuration.
When using existing tooling, take a moment to check which metrics and how many series are generated in your setup. More configuration may be required if the number of metrics is too low or too high.
Types of Metrics
Prometheus defines four core metric types. Each of them is covered extensively in the official Prometheus documentation.
Counter
A counter is a cumulative metric that represents a single, monotonically increasing count or sum. Its value can only increase, or be reset to zero when the process restarts. Examples include the number of requests served, tasks completed, or errors produced.
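Because a counter can drop back to zero on a restart, anything that computes an increase from raw samples has to account for resets. The sketch below shows the basic idea under simplifying assumptions: the function name is hypothetical, timestamps are ignored, and no extrapolation is applied, so it is not a reimplementation of any PromQL function.

```python
def counter_increase(samples):
    """Return the total increase across successive counter samples.

    A drop in value is treated as a reset to zero, so the entire
    post-reset value is counted as new increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter restarted from zero between these two samples.
            total += current
    return total
```

For the samples 10, 15, 20, 3, 8 the counter restarted between 20 and 3, so the increase is 5 + 5 + 3 + 5 = 18 rather than the naive 8 - 10 = -2.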
Gauge
A gauge represents a single numerical value that can arbitrarily go up or down; think of it as taking a snapshot of a system state. Examples include current memory usage, the temperature of a server room, or the number of active requests.
Histogram
A histogram samples observations (like request durations or response sizes) and counts them in configurable buckets. The per-bucket counts let you analyze the distribution of your data. A histogram also records the total count and the sum of all observed values.
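Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final +Inf bucket counts everything. The following sketch illustrates that bookkeeping; the function name and the bucket bounds used in the example are assumptions, and real client libraries update these counters incrementally rather than from a list.

```python
import math


def bucket_counts(observations, upper_bounds):
    """Return (cumulative bucket counts, observation count, observation sum)."""
    bounds = sorted(upper_bounds) + [math.inf]
    buckets = {bound: 0 for bound in bounds}
    for value in observations:
        for bound in bounds:
            # Cumulative: a value lands in every bucket whose bound covers it.
            if value <= bound:
                buckets[bound] += 1
    return buckets, len(observations), sum(observations)
```

With observations 0.05, 0.2, 0.7 and 3.0 and upper bounds 0.1, 0.5 and 1.0, the cumulative counts are 1, 2, 3 and, for the +Inf bucket, 4.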
Summary
Similar to a histogram, a summary samples observations such as request durations or response sizes, and it also provides the total count and sum of the observations. Unlike a histogram, it calculates configurable quantiles over a sliding time window on the client side.
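To make the sliding-window idea concrete, here is a deliberately simplified sketch. The class name, the explicit now parameter, and the nearest-rank quantile method are all assumptions chosen for readability; client libraries that support summary quantiles generally use more memory-efficient estimators.

```python
import math
from collections import deque


class SlidingWindowSummary:
    """Keeps a running count and sum, plus recent samples for quantiles."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.count = 0          # total observations, never decreases
        self.total = 0.0        # sum of all observations, never windowed

    def observe(self, value, now):
        self.samples.append((now, value))
        self.count += 1
        self.total += value

    def quantile(self, q, now):
        """Nearest-rank quantile over samples still inside the window."""
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
        values = sorted(value for _, value in self.samples)
        if not values:
            return math.nan
        rank = max(math.ceil(q * len(values)) - 1, 0)
        return values[rank]
```

Note that only the quantiles are windowed: the count and sum keep accumulating, which is why they behave like counters.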
Naming metrics
When creating Prometheus metrics, it's essential to adhere to certain naming conventions to ensure consistency and compatibility.
Remember that clear and consistent metric naming is crucial for effective monitoring and querying in Prometheus. Well-named metrics make it easier to write queries and create informative dashboards and alerts.
Prometheus provides a set of best practices for naming your metrics.
If you mostly use existing exporters and instrumentation, metrics have already been named for you. The only choice you may need to make is whether to prefix a custom namespace to distinguish your metrics from other metrics with the same name.
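A few of the naming conventions are simple enough to check mechanically. The sketch below assumes the traditional metric name character set [a-zA-Z_:][a-zA-Z0-9_:]* and two common conventions (colons are left to recording rules, counters end in _total); the function name is hypothetical and the checks are far from a complete linter.

```python
import re

# Traditional Prometheus metric name pattern (an assumption for this sketch).
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def naming_issues(name, metric_type):
    """Return a list of human-readable naming problems (empty if none found)."""
    issues = []
    if not METRIC_NAME_RE.fullmatch(name):
        issues.append("name does not match [a-zA-Z_:][a-zA-Z0-9_:]*")
    if ":" in name:
        issues.append("colons are conventionally reserved for recording rules")
    if metric_type == "counter" and not name.endswith("_total"):
        issues.append("counter names conventionally end in _total")
    return issues
```

For example, naming_issues("http_requests_total", "counter") finds nothing to report, while naming_issues("http-requests", "counter") flags both the hyphen and the missing _total suffix.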
Labels
Metrics are identified by their name, and labels add dimensions and context to their values. Labels are optional, but they make it possible to categorize a metric by endpoint, status code, or other dimensions, enabling more precise analysis and alerting.
For instance:
http_requests_total{app="ourapp"} 123
tells you that this value is specific to the app named "ourapp".
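The sample line above follows the text exposition format: the metric name, an optional set of label pairs in braces, and the value. A minimal sketch of producing such a line, with hypothetical function names and labels sorted only to make the output deterministic, could look like this:

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Format one sample line; labels may be empty."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"
```

Here format_sample("http_requests_total", {"app": "ourapp"}, 123) reproduces the line shown above, and passing an empty dict yields a sample with no labels at all.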
Cardinality
Every unique combination of a metric name and its label values is stored on our platform as a separate series. Take this example:
http_requests_total{app="ourapp", server="app1", status="200", handler="/api/v2/foo"} 509
If we store this metric for four apps, six servers, four statuses and twenty handlers, and every combination occurs, the end result is 4 × 6 × 4 × 20 = 1920 series. This number is not unusual, but make sure that the number of series for a single metric is intentional, as each series actively receiving data counts towards your monthly usage.
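The series count is simply the product of the number of distinct values per label, which is easy to estimate before shipping a new metric. The label values below are invented placeholders that match the counts in the example; only the arithmetic is the point.

```python
from itertools import product
from math import prod

# Hypothetical label values: four apps, six servers, four statuses, twenty handlers.
LABEL_VALUES = {
    "app": [f"app{i}" for i in range(1, 5)],
    "server": [f"server{i}" for i in range(1, 7)],
    "status": ["200", "404", "500", "503"],
    "handler": [f"/api/v2/handler{i}" for i in range(1, 21)],
}


def series_count(label_values):
    """Upper bound on series: the product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


def enumerate_series(label_values):
    """List every label combination (only sensible for small label sets)."""
    keys = list(label_values)
    return [dict(zip(keys, combo)) for combo in product(*label_values.values())]
```

Adding one more label with, say, ten values would multiply the total by ten, which is why high-cardinality labels such as user IDs are usually avoided.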
Making metrics usable
Each system or application monitored by Prometheus exposes an HTTP endpoint (usually /metrics) that provides the current value of all its metrics. Prometheus servers scrape this data at regular intervals and store it as time-series data for later analysis. You can then use Prometheus's query language, PromQL, to filter and aggregate this data in order to create alert rules or visualizations.
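Queries can also be issued programmatically through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below only builds the request URL; the base URL, the helper name, and the example expression are assumptions, and your platform may expose its query endpoint at a different address or require authentication.

```python
from urllib.parse import urlencode

# Example PromQL: per-status request rate over the last five minutes.
EXAMPLE_QUERY = 'sum by (status) (rate(http_requests_total{app="ourapp"}[5m]))'


def instant_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string
```

For example, instant_query_url("http://localhost:9090", EXAMPLE_QUERY) yields a URL that can be fetched with any HTTP client; the response is JSON containing the matching series and their current values.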
Further Reading
Metrics play a vital role in any monitoring or observability strategy, and a thorough understanding of how they work in Prometheus can greatly improve your monitoring. For more in-depth details, refer to the official Prometheus documentation. It covers everything from getting started and core concepts to detailed guides for each metric type.
If you aim to become proficient in using Prometheus metrics to monitor your applications, spending time with the Prometheus Querying documentation is invaluable. It gives a complete rundown of how to use PromQL to filter, calculate, and aggregate the time-series data you collect.
When creating and exporting your own metrics, see the Prometheus Instrumentation documentation for the client libraries available for your favorite programming language.