last9/gpu-telemetry
by last9 · Tracing & Spans · updated 10d ago
GPU telemetry with workload attribution. One OTLP agent per node ties hardware metrics (NVIDIA, AMD, Intel Gaudi) to the K8s pod or Slurm job burning the GPU.
57 momentum · 42 stars · 3 forks · #37 rank
amd · dcgm · gpu · gpu-monitoring · helm · intel-gaudi-base-operator · kubernetes · llm-observability · mlops · nvidia · nvml-monitoring · opentelemetry