
Red Hat STF(Service Telemetry Framework)

ByoungHee Lee 2021. 10. 7. 21:18

STF overview

Red Hat STF (Service Telemetry Framework) is a system monitoring and alerting framework. It collects monitoring data, such as metrics and events, generated by the infrastructure and VM resources running on Red Hat OpenStack Platform or on third-party nodes. STF first became generally available (GA) with Red Hat OpenStack Platform (OSP) 16.

Necessity of STF

As the workload of the monitored systems and applications grows, service continuity becomes more important, and so does the need for a monitoring solution that places little load on the systems it observes. If the monitoring server runs inside the infrastructure it monitors, it competes with the production services for resources. This calls for an architecture that separates the client side from the server side and delivers data over a fast, stable message bus.

Features

  • Store or archive the monitoring data for historical information
  • View the monitoring data on the dashboard
  • Use the monitoring data to trigger alerts or warnings

Monitoring data

There are two types of monitoring data collected from applications and systems:

  • Metrics: numeric measurements recorded as time-series data, identified by a metric name and key/value pairs
  • Events: irregular, discrete occurrences in a system, which can trigger alerts or warnings to notify the administrator
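The difference between the two data types can be sketched in a few lines of Python. This is only an illustration, not STF's actual data model: the class names, field names, and the sample values are assumptions made for this example. The metric is rendered as a name followed by its key/value pairs, in the style Prometheus uses for series names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    """A numeric sample identified by a metric name and key/value pairs."""
    name: str
    value: float
    timestamp: float  # seconds since the epoch
    labels: dict = field(default_factory=dict)

    def series_id(self) -> str:
        """Render the name and its sorted key/value pairs as one identifier."""
        if not self.labels:
            return self.name
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.name}{{{pairs}}}"


@dataclass(frozen=True)
class Event:
    """A discrete occurrence: it carries a message and severity, not a value."""
    source: str
    severity: str
    message: str
    timestamp: float


# Hypothetical sample data, for illustration only.
sample = Metric("cpu_percent", 42.5, 1633609080.0,
                {"host": "compute-0", "cloud": "cloud1"})
alert = Event("compute-0", "warning", "link down on eth1", 1633609085.0)

print(sample.series_id())
print(f"[{alert.severity}] {alert.source}: {alert.message}")
```

Every sample with the same name and the same key/value pairs belongs to the same time series, while each event stands on its own.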

STF Architecture

The following diagram shows how metric and event data are generated in OpenStack, flow into OpenShift, and are stored there.

Figure 1. Service Telemetry Framework Architecture Overview

 

The following diagram shows multiple Red Hat OpenStack Platform clouds that are monitored by a single STF (Service Telemetry Framework) instance, which collects the data from all of them.

Figure 2. Two Red Hat OpenStack Platform clouds connect to STF

On the client side, Collectd plugins deliver infrastructure metrics and events, and Ceilometer agents deliver metrics and events of projects and user workloads, to AMQ Interconnect running on Red Hat OpenStack Platform.
On the server side, Smart Gateway takes the data from AMQ Interconnect and delivers the metrics to Prometheus and the events to the ElasticSearch datastore, both running on OpenShift Platform.
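The server-side routing can be sketched as follows. This is a toy model under stated assumptions, not the Smart Gateway implementation: the `route` function, the `kind` field, and the message contents are hypothetical, a `deque` stands in for the message bus, and plain lists stand in for Prometheus and ElasticSearch.

```python
from collections import deque


def route(messages, metric_store, event_store):
    """Drain the bus and deliver each message to the matching datastore."""
    while messages:
        msg = messages.popleft()
        if msg["kind"] == "metric":
            metric_store.append(msg)   # stand-in for Prometheus
        elif msg["kind"] == "event":
            event_store.append(msg)    # stand-in for ElasticSearch
        else:
            raise ValueError(f"unknown message kind: {msg['kind']!r}")


# Hypothetical messages, as if sent by Collectd and Ceilometer.
bus = deque([
    {"kind": "metric", "source": "collectd", "name": "cpu_percent", "value": 42.5},
    {"kind": "event", "source": "collectd", "message": "link down on eth1"},
    {"kind": "metric", "source": "ceilometer", "name": "vcpus", "value": 4},
])

prometheus, elasticsearch = [], []
route(bus, prometheus, elasticsearch)
print(len(prometheus), "metrics,", len(elasticsearch), "events")
```

The point of the sketch is the separation of concerns: the clients only publish to the bus, and only the gateway knows which datastore each kind of data belongs in.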

Components

STF (Service Telemetry Framework) consists of the following components:

  • AMQ Interconnect: an AMQP message bus that shuttles metrics and events to STF
  • Smart Gateway: takes metrics and events from AMQ Interconnect and delivers metrics to Prometheus and events to ElasticSearch
  • Prometheus: stores metrics as timestamped time-series data
  • ElasticSearch: stores events data
  • Collectd: collects infrastructure metrics and events on the client-side
  • Ceilometer: collects Red Hat OpenStack Platform metrics and events


STF on OpenShift

Administrators can use Operators to run the STF core components and objects on OpenShift. The following OpenShift console view shows the installed Operators.

  • Core components managed by Operators: Prometheus and Alertmanager, ElasticSearch, Smart Gateway, AMQ Interconnect, Grafana

Figure 3. Installed Operators for STF

Dashboard

STF provides dashboards that can be imported into Grafana and customized as needed. Administrators can integrate third-party dashboards such as Grafana with STF on OpenShift Platform, and they can modify the dashboard YAML templates to show the data they need. By importing customized YAML templates, Grafana provides a variety of views, such as the infrastructure view, VM view, and cloud view.

Infrastructure View:
provides monitoring of the system nodes of OpenStack Platform.

Figure 4. Infrastructure View of monitoring OpenStack Platform

It also shows the alerts that the administrator configures through alert rules in Prometheus and alert routes in Alertmanager.

Figure 5. Alerts in Prometheus
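The division of labor between alert rules and alert routes can be illustrated with a small Python sketch. It is a simplified model, not how Prometheus or Alertmanager are configured: real alert rules are written as PromQL expressions in YAML with a `for` duration, and Alertmanager routes form a tree of matchers. The class, function, receiver names, and sample values below are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Toy alert rule: fire once enough consecutive samples breach a threshold."""
    name: str
    threshold: float
    for_samples: int  # consecutive breaching samples required (at least 1)

    def is_firing(self, samples):
        """Return True if the last `for_samples` values all exceed the threshold."""
        if len(samples) < self.for_samples:
            return False
        return all(v > self.threshold for v in samples[-self.for_samples:])


def pick_receiver(labels, routes, default="default"):
    """Return the receiver of the first route whose matchers all equal the labels."""
    for matchers, receiver in routes:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return default


# Hypothetical rule, samples, and routes.
rule = ThresholdRule("HighCpu", threshold=90.0, for_samples=3)
cpu_samples = [50.0, 95.0, 97.0, 99.0]
routes = [({"severity": "critical"}, "pager"), ({"cloud": "cloud1"}, "ops-mail")]

if rule.is_firing(cpu_samples):
    print(rule.name, "->", pick_receiver({"severity": "warning", "cloud": "cloud1"}, routes))
```

The rule decides whether an alert exists at all; the route only decides who is notified about it, which is why the two are configured separately.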

VM(Virtual Machine) View:
provides monitoring of VM resources of the project running on OpenStack Platform.

Figure 6. VM View for each project 

Conclusion

STF is an effective solution for monitoring multi-cloud environments such as OpenStack Platform at scale. With proper design, planning, and experience with STF, operations administrators can monitor multi-cloud OpenStack Platform deployments in the way that best suits their needs.

 

Introduction to STF in 5 minutes

Check out this YouTube video: