The Observability Index / Tracing & Spans / #37

last9/gpu-telemetry

by last9 · Tracing & Spans · updated 10d ago

GPU telemetry with workload attribution. A single OTLP agent per node ties hardware metrics (NVIDIA, AMD, Intel Gaudi) to the K8s pod or Slurm job that is using the GPU.

Rank: #37

Topics: amd, dcgm, gpu, gpu-monitoring, helm, intel-gaudi-base-operator, kubernetes, llm-observability, mlops, nvidia, nvml-monitoring, opentelemetry
