Monitoring LLMs
I designed an interface for monitoring multiple language models side by side and tracking how much money is spent on each of them.
  • Role
    Solo designer, working with a Product Owner
  • Duration
    2 weeks
  • Platform
    Web, B2B SaaS
The problem
When teams started integrating LLMs into their products, they immediately hit a visibility gap. There was no standard way to track what a model was doing, how it was performing, or what it was costing.

The challenge was designing a monitoring interface for a new category of software: one where the conventions hadn't been established yet.
Decisions
  • Cost as a first-class metric
    Unlike traditional infrastructure components, LLMs have a direct per-request cost tied to token usage. I treated cost tracking not as a secondary detail but as a primary dashboard metric alongside latency and error rates. For teams running multiple models, understanding spend per model is as important as understanding performance.
  • Two views for chat history
    Chat history serves two different needs. When an engineer is investigating a specific conversation, a dialogue view is far more readable. When they need to scan across many interactions, compare patterns, or find a specific exchange, a dense table view is faster. Both views exist, and the user chooses based on what they're trying to do.
  • Multi-model
    The dashboard was designed to monitor multiple language models simultaneously, not just one. This was a deliberate structural choice: a monitoring tool that requires switching between separate views per model creates blind spots. Comparison across models is built into the default experience.
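To make the cost and multi-model decisions concrete, here is a minimal sketch of the kind of per-model rollup such a dashboard could display. It is an illustration only, not the product's actual data model: the model names, the prices per 1,000 tokens, and the Request fields are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; not real vendor pricing.
PRICES_PER_1K = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 0.10, "output": 0.30},
}


@dataclass
class Request:
    """One logged LLM call (assumed fields, for illustration)."""
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    ok: bool


def request_cost(req: Request) -> float:
    """Cost of a single request, derived from its token usage."""
    price = PRICES_PER_1K[req.model]
    return (req.input_tokens * price["input"]
            + req.output_tokens * price["output"]) / 1000


def summarize(requests: list[Request]) -> dict[str, dict[str, float]]:
    """Group requests by model and compute cost, latency, and error rate together."""
    grouped: dict[str, list[Request]] = defaultdict(list)
    for req in requests:
        grouped[req.model].append(req)

    summary = {}
    for model, reqs in grouped.items():
        summary[model] = {
            "requests": len(reqs),
            "cost_usd": round(sum(request_cost(r) for r in reqs), 6),
            "avg_latency_ms": sum(r.latency_ms for r in reqs) / len(reqs),
            "error_rate": sum(not r.ok for r in reqs) / len(reqs),
        }
    return summary


if __name__ == "__main__":
    log = [
        Request("model-a", 1000, 2000, 400.0, True),
        Request("model-a", 2000, 0, 600.0, False),
        Request("model-b", 1000, 1000, 200.0, True),
    ]
    for model, row in summarize(log).items():
        print(model, row)
```

Returning cost in the same row as latency and error rate mirrors the decision to treat spend as a primary metric, and keying the summary by model keeps cross-model comparison in a single view.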
Chat history
Chat history as table