===============================================================================
Observability with FastAPI, OpenTelemetry, Grafana, Loki, Tempo, and Prometheus
===============================================================================

This documentation explains the observability setup for a FastAPI application, which uses OpenTelemetry for distributed tracing, logging, and metrics collection. The collected data is exported to the following backends for monitoring and visualization:

- **Logs** are exported to Loki.
- **Traces** are sent to Tempo.
- **Metrics** are collected by Prometheus.

These backends are all integrated with **Grafana** for visualization and querying.

Overview of the Flow
--------------------

1. **FastAPI Application**: The FastAPI app generates logs, traces, and metrics.
2. **OpenTelemetry Collector**: The Collector receives this telemetry and exports logs, traces, and metrics to the appropriate backends.
3. **Backends**:

   - **Loki**: Collects and stores logs.
   - **Tempo**: Collects and stores traces.
   - **Prometheus**: Collects and stores time-series metrics.

4. **Grafana**: Connects to all of the above systems and provides a unified dashboard for visualization and querying.

How It Works
-------------

- **Logs**: FastAPI uses the OpenTelemetry SDK to capture logs and sends them to the OpenTelemetry Collector, which forwards them to **Loki** for storage.
- **Traces**: OpenTelemetry also collects trace information from the FastAPI app, which is forwarded to **Tempo** for storage and querying.
- **Metrics**: Metrics are generated by FastAPI and collected by **Prometheus**. These metrics can be customized by modifying the `metrics.py` file.

Grafana Dashboard and Integration
----------------------------------

Grafana is used to visualize all of the observability data (logs, traces, and metrics). It connects to the following backends:

- **Loki**: For logs.
- **Tempo**: For traces.
- **Prometheus**: For metrics.

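The three connections listed above could be described to Grafana roughly as follows. This is only a sketch of the shape of a Grafana data source provisioning file, not the project's actual configuration: the Loki and Prometheus URLs reuse the `localhost` ports mentioned in this document, the Tempo URL (port 3200) is an assumption, and in a containerized deployment the hostnames would likely be service names instead. Grafana reads provisioning files as YAML; the structure is printed as JSON here purely for readability.

```python
import json

# Loki and Prometheus URLs reuse ports listed in this document.
# The Tempo URL is an ASSUMPTION (3200 is Tempo's usual HTTP port).
BACKENDS = {
    "Loki": ("loki", "http://localhost:3100"),
    "Tempo": ("tempo", "http://localhost:3200"),
    "Prometheus": ("prometheus", "http://localhost:9090"),
}


def build_datasources(backends):
    """Return a dict shaped like a Grafana data source provisioning file."""
    return {
        "apiVersion": 1,
        "datasources": [
            {"name": name, "type": ds_type, "access": "proxy", "url": url}
            for name, (ds_type, url) in backends.items()
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_datasources(BACKENDS), indent=2))
```
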
Queries can be created within Grafana to visualize the data, track performance, troubleshoot errors, and analyze metrics over time.

Useful Links for Debugging and Access
--------------------------------------

The following endpoints can be accessed for debugging and visualizing data:

- **Grafana Dashboard**: `http://localhost:3000`: Access the Grafana dashboard where logs, traces, and metrics can be visualized.
- **Prometheus Time-Series Query**: `http://localhost:9090/query`: Directly query Prometheus for time-series data.
- **Metrics Endpoints**:

  - `http://localhost:9091/metrics`: Metrics from the Lomas server.
  - `http://localhost:9090/metrics`: Other metrics exposed by Prometheus.

- **Tempo Trace Debugging**: `http://localhost:55679/debug/tracez`: Debug trace data from Tempo.
- **Loki Direct Access**:

  - `http://localhost:3100/ready`: Check the readiness of the Loki service.
  - `http://localhost:3100/config`: View the current Loki configuration.

- **OpenTelemetry Collector**:

  - `http://localhost:13133/health`: Health check endpoint for the OpenTelemetry Collector.
  - `http://localhost:1777/debug/pprof/`: Profiling and debugging endpoint for performance analysis.
  - `http://localhost:55679/debug/tracez`: Trace information for debugging traces.

Configuration Files and Customization
-------------------------------------

- Prometheus Metrics Configuration

  Some custom metrics for Prometheus are defined in the `lomas/server/lomas_server/utils/metrics.py` file. These metrics can be modified, and new ones added, according to the application's requirements. This allows tracking of specific application-level metrics in addition to the default ones.

- Logging and Tracing Middleware

  The **LoggingAndTracingMiddleware** in `lomas/server/lomas_server/routes/utils.py` logs incoming requests and adds the username (if available) as a span attribute. This makes it possible to trace and log user-specific activity and to monitor the actions of individual users across services.

- Configuration Files for Observability

  Example configuration files for observability, including settings for OpenTelemetry, Grafana, Loki, Tempo, and Prometheus, can be found in `lomas/server/configs/observability/`. These files contain the parameters needed to connect the FastAPI application to the respective observability systems.

- Grafana Dashboard Configuration

  To have Grafana load a dashboard, export (or create) it as a dashboard **JSON file** and save it as `lomas/server/configs/observability/grafana/example_dashboard_config.json`. Grafana then imports this configuration automatically and uses it for visualization.

- Modifying Backends (Loki, Prometheus, Tempo)

  Although **Loki**, **Prometheus**, and **Tempo** are the default backends, other systems for logging, metrics, and tracing can be used. No code changes are required, but the configuration files in the `lomas/server/configs/observability/` directory must be modified to integrate the new backends. For example, **Loki** can be replaced with **Elasticsearch** for logs, or a different metrics exporter can be used.

Summary
-------

This setup provides an observability pipeline for the FastAPI application that integrates logs, metrics, and traces into a centralized system for monitoring and debugging. Using **Grafana**, users can query and visualize data from **Loki**, **Tempo**, and **Prometheus**.

Key Points:

- Logs, traces, and metrics are exported using OpenTelemetry.
- **Loki** handles logs.
- **Tempo** manages traces.
- **Prometheus** collects time-series metrics.
- **Grafana** provides a unified view and querying capability for these data sources.
- Configuration for observability can be customized in the `lomas/server/configs/observability/` directory.
- Dashboard configurations are exported as JSON and placed in `lomas/server/configs/observability/grafana/example_dashboard_config.json`.
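
As a quick way to confirm that the services behind the links in "Useful Links for Debugging and Access" are reachable, a small probe script can help. The sketch below is illustrative and not part of the Lomas code base: the URLs are copied from this document, while the names `ENDPOINTS`, `check`, and `check_all` are hypothetical. It assumes the stack runs locally with the ports shown above and uses only the Python standard library.

```python
import urllib.error
import urllib.request

# Endpoints copied from the "Useful Links" section of this document.
ENDPOINTS = {
    "grafana": "http://localhost:3000",
    "prometheus": "http://localhost:9090/metrics",
    "lomas_server_metrics": "http://localhost:9091/metrics",
    "loki": "http://localhost:3100/ready",
    "otel_collector": "http://localhost:13133/health",
}


def check(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def check_all(endpoints, opener=urllib.request.urlopen):
    """Probe every endpoint and return a name -> reachable mapping."""
    return {name: check(url, opener=opener) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_all(ENDPOINTS).items():
        print(f"{name:22s} {'up' if ok else 'DOWN'}")
```

The `opener` parameter exists only so the probe can be exercised without a running stack, for example by passing a stub in a unit test.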