The AI observability stack, on watch.
A living index of the tools that watch AI in production — LLM & agent tracing, monitoring & cost analytics, online evaluation, and ML-drift detection — ranked by momentum, not marketing.
About the Observability Index
The Observability Index is a living, self-updating directory of the open-source tools that watch AI in production — distributed tracing for LLM and agent calls, cost and latency monitoring, online evaluation and scoring, agent observability, and ML / data-drift detection. It tracks the active control-room layer of the AI stack — not offline benchmark frameworks or general infrastructure monitoring — and ranks every entry by momentum, recomputed daily from live GitHub signals. It is one of The Living Indexes, a fleet built and operated end-to-end by Kymata Labs' AI agents.
What is LLM observability?
LLM observability is the practice of instrumenting AI apps to capture traces, costs, latencies, prompts and quality signals from production, so that teams can debug, monitor and evaluate model behaviour after it ships. It extends the familiar logs, metrics and traces with LLM-specific signals: token usage, prompt–response pairs, tool and agent calls, and evaluation scores.
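As a rough illustration of those signals, here is a minimal Python sketch of one traced model call. Every name in it (`LLMSpan`, `traced_call`, `fake_model`, `usd_per_1k_tokens`) is hypothetical and not taken from any tool listed in the index. It assumes the model function reports its own token counts.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class LLMSpan:
    """One traced model call, holding the LLM-specific signals named above."""
    name: str
    prompt: str
    response: str = ""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    latency_ms: float = 0.0
    cost_usd: float = 0.0
    tool_calls: List[str] = field(default_factory=list)
    eval_scores: Dict[str, float] = field(default_factory=dict)


def traced_call(
    name: str,
    prompt: str,
    model_fn: Callable[[str], Tuple[str, int, int]],
    usd_per_1k_tokens: float = 0.0,  # hypothetical flat price, for illustration only
) -> LLMSpan:
    """Run model_fn and record latency, token usage and an estimated cost."""
    span = LLMSpan(name=name, prompt=prompt)
    start = time.perf_counter()
    # Assumption: model_fn returns (response, prompt_tokens, completion_tokens).
    response, prompt_tokens, completion_tokens = model_fn(prompt)
    span.latency_ms = (time.perf_counter() - start) * 1000
    span.response = response
    span.prompt_tokens = prompt_tokens
    span.completion_tokens = completion_tokens
    span.cost_usd = (prompt_tokens + completion_tokens) / 1000 * usd_per_1k_tokens
    return span


def fake_model(prompt: str) -> Tuple[str, int, int]:
    """Stand-in for a real model client; counts whitespace-separated words as tokens."""
    return "pong", len(prompt.split()), 1


span = traced_call("chat", "ping the model", fake_model, usd_per_1k_tokens=2.0)
span.eval_scores["helpfulness"] = 0.9  # an online evaluation score attached afterwards
print(span)
```

A real tracing tool would export records like this to a backend and link them into parent and child spans for agent and tool calls. The sketch only shows the shape of the data.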
How is momentum scored?
A 0–100 score blending log-scaled stars (55%), push-recency (32%, decaying to zero by ~180 days), and rising-newness (13%). A tool that shipped this week outranks a bigger tool that's gone quiet.
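The blend can be sketched in a few lines of Python. Only the three weights, the 0–100 range, the log scaling of stars and the roughly 180-day zero point come from the description above. The star ceiling, the linear decay shapes, the 365-day newness window and the reading of "rising-newness" as repository age are assumptions made for illustration, so this is not the index's actual formula.

```python
import math

# Weights stated in the description above.
W_STARS, W_RECENCY, W_NEWNESS = 0.55, 0.32, 0.13

# Assumptions, not published by the index:
STAR_CEILING = 100_000        # star count that maps to a full star component
RECENCY_HORIZON_DAYS = 180    # recency reaches zero here (the text says "~180 days")
NEWNESS_WINDOW_DAYS = 365     # hypothetical window in which a repo counts as "new"


def momentum(stars: int, days_since_push: float, repo_age_days: float) -> float:
    """Return a 0-100 momentum score under the assumptions listed above."""
    # Log-scaled stars, normalised to [0, 1] against the assumed ceiling.
    star_part = min(1.0, math.log10(1 + max(stars, 0)) / math.log10(1 + STAR_CEILING))
    # Assumed linear decay: 1.0 for a push today, 0.0 at the horizon and beyond.
    recency_part = min(1.0, max(0.0, 1.0 - days_since_push / RECENCY_HORIZON_DAYS))
    # Assumed linear decay over the newness window, based on repository age.
    newness_part = min(1.0, max(0.0, 1.0 - repo_age_days / NEWNESS_WINDOW_DAYS))
    score = 100 * (
        W_STARS * star_part + W_RECENCY * recency_part + W_NEWNESS * newness_part
    )
    return round(score, 1)


# A smaller tool pushed three days ago vs. a larger one with no push in 400 days.
active_small = momentum(stars=2_000, days_since_push=3, repo_age_days=900)
quiet_large = momentum(stars=20_000, days_since_push=400, repo_age_days=2_000)
print(active_small > quiet_large)
```

Under these assumptions the recently pushed tool scores higher even with a tenth of the stars, because the recency term carries almost a third of the total weight.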
What's included?
Six categories — Tracing & Spans, Monitoring & Analytics, Online Evaluation, Agent Observability, Drift & ML Monitoring, and LLMOps Platforms — covering the production-AI control room end to end.
Part of The Living Indexes
A fleet of self-updating maps of the AI-builder ecosystem — from RAG and diffusion to evaluation and fine-tuning. Explore them all at indexes.kymatalabs.com.