When developing your LLM application, it can be helpful to keep track of production data such as:
- Pipeline performance: latency, token count, and throughput of each stage
- Resource usage: LLM/embedding inference cost, CPU/GPU utilization
- Evaluation metrics: accuracy, precision, recall, qualitative evaluation, and drift
- Pipeline versioning: which versions of sub-components (e.g. the LLM or embedding model) and artifacts (e.g. prompts) were used in the pipeline at a given time
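As a rough illustration of the first two categories, here is a minimal sketch of per-stage tracking. The `StageMetrics` class, its `track` method, and the stage names are all hypothetical, not part of any particular library; a production setup would typically ship these records to a monitoring backend rather than keep them in memory.

```python
import time
from dataclasses import dataclass, field

@dataclass
class StageMetrics:
    """Hypothetical in-memory tracker for per-stage latency and token counts."""
    records: list = field(default_factory=list)

    def track(self, stage: str, fn, *args, tokens: int = 0, **kwargs):
        """Run `fn`, recording wall-clock latency and a caller-supplied token count."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        latency = time.perf_counter() - start
        self.records.append({"stage": stage, "latency_s": latency, "tokens": tokens})
        return result

metrics = StageMetrics()
# Stand-in for a real embedding or LLM call:
text = metrics.track("embedding", lambda q: q.lower(), "Hello World", tokens=2)
print(metrics.records[0])
```

The same record dictionaries could later carry cost estimates (tokens times a per-token price) or version tags for the prompt and model used, covering the remaining categories above.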
We will share more on how to set up monitoring in the future.