Observability with FastAPI, OpenTelemetry, Grafana, Loki, Tempo, and Prometheus
This documentation explains how observability is set up for a FastAPI application, using OpenTelemetry for distributed tracing, logging, and metrics collection. The collected data is exported to the following backends for monitoring and visualization:
Logs are exported to Loki.
Traces are sent to Tempo.
Metrics are collected by Prometheus.
These backends are all integrated with Grafana for visualization and querying.
Overview of the Flow
FastAPI Application: The FastAPI app generates logs, traces, and metrics.
OpenTelemetry Collector: Receives logs, traces, and metrics from the application and exports them to the appropriate backends.
Backends:
Loki: Collects and stores logs.
Tempo: Collects and stores traces.
Prometheus: Collects and stores time-series metrics.
Grafana: Connects to all of the above systems and provides a unified dashboard for visualization and querying.
How It Works
Logs: FastAPI uses the OpenTelemetry SDK to capture logs and sends them to the OpenTelemetry Collector, which forwards them to Loki for storage.
Traces: OpenTelemetry also collects trace information from the FastAPI app and forwards it to Tempo for storage. The traces are then visualized in Grafana.
Metrics: Metrics are generated by FastAPI and collected by Prometheus. These metrics can be customized by modifying the metrics.py file.
Grafana Dashboard and Integration
Grafana is used to visualize all of the observability data (logs, traces, and metrics). It connects to the following backends:
Loki: For logs.
Tempo: For traces.
Prometheus: For metrics.
Queries can be created within Grafana to visualize the data, track performance, troubleshoot errors, and analyze metrics over time.
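A PromQL expression used in a Grafana panel can also be sent straight to Prometheus, which helps when checking whether a panel or the data itself is at fault. The sketch below assumes the Prometheus instance at http://localhost:9090 listed in this document and uses the standard `/api/v1/query` instant-query route. The helper names are illustrative and not part of the project.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # address taken from this document


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def extract_values(payload: dict) -> dict[str, float]:
    """Flatten an instant-vector response into {label set: value}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for sample in payload["data"]["result"]:
        # Join the labels into one readable key, sorted for stable output.
        labels = ",".join(f"{k}={v}" for k, v in sorted(sample["metric"].items()))
        _timestamp, value = sample["value"]  # the value arrives as a string
        values[labels] = float(value)
    return values


def run_query(expr: str) -> dict[str, float]:
    """Send the query to Prometheus and return the flattened result."""
    with urllib.request.urlopen(build_query_url(expr), timeout=5) as response:
        return extract_values(json.load(response))
```

With the stack running, `run_query("up")` returns one entry per scrape target, since Prometheus records the `up` series for every target it scrapes.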
Useful Links for Debugging and Access
Below are the important endpoints that can be accessed for debugging and visualizing data:
Grafana Dashboard:
http://localhost:3000: Access the Grafana dashboard where logs, traces, and metrics can be visualized.
Prometheus Time-Series Query:
http://localhost:9090/query: Directly query Prometheus for time-series data.
Metrics Endpoints:
http://localhost:9091/metrics: Metrics from the Lomas server.
http://localhost:9090/metrics: Other metrics exposed by Prometheus.
Tempo Trace Debugging:
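http://localhost:55679/debug/tracez: Inspect trace data on its way to Tempo. This page is served by the OpenTelemetry Collector, which is why the same URL also appears under it below.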
Loki Direct Access:
http://localhost:3100/ready: Check the readiness of the Loki service.
http://localhost:3100/config: View the current Loki configuration.
OpenTelemetry Collector:
http://localhost:13133/health: Health check endpoint for the OpenTelemetry Collector.
http://localhost:1777/debug/pprof/: Profiling and debugging endpoint for performance analysis.
http://localhost:55679/debug/tracez: Trace information for debugging traces.
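When something in the stack looks wrong, a quick first step is to probe several of these endpoints at once. The following is a minimal sketch using only the standard library. The URLs are copied from the list above, while the function names and the selection of endpoints are illustrative.

```python
import urllib.error
import urllib.request

# URLs copied from the list above; adjust hosts and ports to your deployment.
ENDPOINTS = {
    "grafana": "http://localhost:3000",
    "prometheus": "http://localhost:9090/metrics",
    "lomas_metrics": "http://localhost:9091/metrics",
    "loki": "http://localhost:3100/ready",
    "otel_collector": "http://localhost:13133/health",
}


def probe(url: str, timeout: float = 2.0) -> str:
    """Return 'up (<status>)' when the URL answers, else 'down (<reason>)'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"up ({response.status})"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return f"down (HTTP {exc.code})"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout, and similar errors.
        return f"down ({exc})"


def probe_all(endpoints: dict[str, str]) -> dict[str, str]:
    """Probe every endpoint and map its name to a status string."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `probe_all(ENDPOINTS)` returns a name-to-status mapping that can be printed or logged. A service reported as down is a good place to start looking at container logs.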
Configuration Files and Customization
Prometheus Metrics Configuration
Some custom metrics for Prometheus are defined in the lomas/server/lomas_server/utils/metrics.py file. These metrics can be modified or new ones can be added as per the application’s requirements. This allows tracking of specific application-level metrics in addition to the default ones.
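After adding or changing a metric, it helps to confirm that it actually shows up on the metrics endpoint. The sketch below parses the Prometheus text exposition format into a dictionary. It is a simplified parser for quick checks only: label sets are kept as raw strings and timestamps are ignored. The function name is illustrative.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text: str) -> dict[str, list[tuple[str, float]]]:
    """Map each metric name to its (labels, value) samples."""
    samples: dict[str, list[tuple[str, float]]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.setdefault(name, []).append((labels or "", float(value)))
    return samples
```

Fetching http://localhost:9091/metrics (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_metrics` makes the check a simple membership test on the name of the new metric.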
Logging and Tracing Middleware
The LoggingAndTracingMiddleware in lomas/server/lomas_server/routes/utils.py is responsible for logging incoming requests and adding the username (if available) as a span attribute. This helps trace and log user-specific activities, making it easier to monitor the actions of individual users across services.
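The general shape of such a middleware can be sketched as a plain ASGI wrapper. This is not the project's implementation: the class name, the logger name, and the `x-user-name` header are hypothetical, and the span call is replaced by a `set_attribute` callback so that the example runs without OpenTelemetry installed. In an OpenTelemetry setup, that callback would typically be `trace.get_current_span().set_attribute`.

```python
import logging
from typing import Any, Awaitable, Callable, Optional

logger = logging.getLogger("request_logger")  # hypothetical logger name

Scope = dict[str, Any]
Receive = Callable[[], Awaitable[dict[str, Any]]]
Send = Callable[[dict[str, Any]], Awaitable[None]]
App = Callable[[Scope, Receive, Send], Awaitable[None]]


def username_from_scope(scope: Scope) -> Optional[str]:
    """Read the username from a hypothetical 'x-user-name' header, if present."""
    for key, value in scope.get("headers", []):
        if key.lower() == b"x-user-name":
            return value.decode("latin-1")
    return None


class RequestLoggingMiddleware:
    """Pure-ASGI stand-in for a logging-and-tracing middleware."""

    def __init__(self, app: App, set_attribute: Callable[[str, str], None]):
        self.app = app
        # Placeholder for the current span's set_attribute method.
        self.set_attribute = set_attribute

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "http":
            logger.info("Incoming request: %s %s", scope["method"], scope["path"])
            username = username_from_scope(scope)
            if username is not None:
                # Tag the trace so it can be filtered by user later.
                self.set_attribute("user.name", username)
        # Always hand the request on to the wrapped application.
        await self.app(scope, receive, send)
```

Because the attribute is only set when a username is found, anonymous requests are still logged but carry no user tag on their span.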
Configuration Files for Observability
Examples of configuration files for observability, including settings for OpenTelemetry, Grafana, Loki, Tempo, and Prometheus, can be found in: lomas/server/configs/observability/.
These configuration files include the necessary parameters for connecting the FastAPI application to the respective observability systems.
Grafana Dashboard Configuration
To have Grafana load a dashboard, export it (or create it) as a dashboard JSON file and save it as lomas/server/configs/observability/grafana/example_dashboard_config.json.
Grafana then imports this configuration automatically and uses it for visualization.
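Before replacing the dashboard file, a quick sanity check can catch a truncated or malformed export. The sketch below only verifies that the content is a JSON object with a `title` and a `panels` list. Treating those two keys as the minimum is an assumption made for this example, not a Grafana requirement stated in this document.

```python
import json

# Top-level keys this check expects in a dashboard export (an assumption).
EXPECTED_KEYS = ("title", "panels")


def check_dashboard(raw: str) -> list[str]:
    """Return the problems found in a dashboard JSON string (empty if none)."""
    try:
        dashboard = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(dashboard, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in dashboard]
    if not isinstance(dashboard.get("panels", []), list):
        problems.append("'panels' must be a list")
    return problems
```

Reading the exported file with `pathlib.Path(...).read_text()` and passing the result to `check_dashboard` yields an empty list when the file passes these basic checks.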
Modifying Backends (Loki, Prometheus, Tempo)
Although Loki, Prometheus, and Tempo are the default backends, other systems for logging, metrics, and tracing can be used. No code modification will be required, but the configuration files in the lomas/server/configs/observability/ directory need to be modified to integrate new backends. For example, Loki can be replaced with Elasticsearch for logs or a different metrics exporter could be used.
Summary
This setup provides a robust observability pipeline for the FastAPI application, integrating logs, metrics, and traces into a centralized system for monitoring and debugging. Using Grafana, users can query and visualize data from Loki, Tempo, and Prometheus.
Key Points:
Logs, traces, and metrics are exported using OpenTelemetry.
Loki handles logs.
Tempo manages traces.
Prometheus collects time-series metrics.
Grafana provides a unified view and querying capability for these data sources.
Configuration for observability can be customized in the lomas/server/configs/observability/ directory.
Dashboard configurations are exported as JSON and placed in lomas/server/configs/observability/grafana/example_dashboard_config.json.