Why Monitoring?

When developing and running your LLM application, it helps to keep track of production data such as:

  • Pipeline performance (latency/token count/throughput of various stages)

  • Resource usage (LLM/embedding inference cost, CPU/GPU utilization)

  • Evaluation metrics (accuracy, precision, recall, qualitative evaluation, and drift)

  • Pipeline versioning (which versions of sub-components, e.g. the LLM and embedding model, and of artifacts, e.g. prompts, were used in the pipeline at a given time)
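
To make the list above concrete, here is a minimal sketch of how per-stage latency, token counts, and version information could be recorded with only the Python standard library. The names `PipelineMonitor`, `StageRecord`, and `track`, as well as the version strings and token numbers, are hypothetical and chosen for illustration only; they are not part of any particular library, and a real setup would likely export these records to a dedicated monitoring backend.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class StageRecord:
    """One observation of a pipeline stage (hypothetical schema)."""
    stage: str
    latency_s: float
    tokens: int
    versions: dict


@dataclass
class PipelineMonitor:
    """Collects per-stage records in memory (illustrative only)."""
    versions: dict                      # e.g. which LLM / prompt version is in use
    records: list = field(default_factory=list)

    @contextmanager
    def track(self, stage):
        # The caller fills in usage["tokens"] inside the with-block.
        usage = {"tokens": 0}
        start = time.perf_counter()
        try:
            yield usage
        finally:
            # Record the stage even if it raised, so failures are still timed.
            self.records.append(
                StageRecord(
                    stage=stage,
                    latency_s=time.perf_counter() - start,
                    tokens=usage["tokens"],
                    versions=dict(self.versions),
                )
            )

    def summary(self):
        # Aggregate call count, total latency, and total tokens per stage.
        totals = {}
        for rec in self.records:
            entry = totals.setdefault(
                rec.stage, {"calls": 0, "latency_s": 0.0, "tokens": 0}
            )
            entry["calls"] += 1
            entry["latency_s"] += rec.latency_s
            entry["tokens"] += rec.tokens
        return totals


# Example usage with placeholder version names and token counts.
monitor = PipelineMonitor(versions={"llm": "example-llm-v1", "prompt": "qa-prompt-v3"})

with monitor.track("retrieval") as usage:
    usage["tokens"] = 128   # stand-in for tokens embedded during retrieval

with monitor.track("generation") as usage:
    usage["tokens"] = 512   # stand-in for tokens produced by the LLM

print(monitor.summary())
```

Storing the version dictionary alongside every record is one simple way to answer "which components were in use at this time" later on; evaluation metrics and cost could be attached to the same records in a similar fashion.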

We will share more on how to set up monitoring in the future.