Step 3: Monitoring AI Systems — Trace Behavior, Latency, and Cost

Monitoring in AI systems is crucial for understanding the performance and behavior of the application.

It involves tracking multiple aspects beyond user input and output, such as the models used, request latencies, and system costs. Tools like Link Views provide detailed insights, including information on each step of the process, such as tool choice, execution, and the activation of guardrails.

For effective monitoring, it is important to trace the entire flow of interactions, as opposed to just testing one input-output cycle. This is particularly important in conversational AI systems, where persistent user queries can break the system and change its role. Monitoring allows for the detection of such issues by providing a comprehensive view of each request and response, making it easier to identify potential problems.

Detailed traces help detect where the system can be optimized. For example, if an agent takes too long to respond, the trace will reveal the delay, helping engineers pinpoint where optimization is needed.

Monitoring also helps validate guardrails. If guardrails are too strict, useful requests are blocked. If they are too lenient, unsafe behavior slips through. Monitoring the full system behavior makes it possible to tune these trade-offs.

What to monitor (practical checklist)

1) Product + user experience

Success rate for user tasks (completion, escalation, fallback).
User feedback signals (thumbs up/down, retry loops, abandonment).

2) Quality + safety

Hallucination and grounding failures (e.g., missing citations, weak retrieval evidence).
Guardrail outcomes: pass, warn, block, escalate.
Policy and compliance signals (PII detected, sensitive topics, jailbreak attempts).

3) Reliability

Error rates by component (LLM API, retriever/vector DB, tools, database, network).
Timeout rate and retry rate.

4) Latency

End-to-end latency (p50, p95, p99).
Step latency (planner, retrieval, tool calls, post-processing, guardrails).