vLLora Changelog

0.1.20

vLLora now includes a CLI tool that brings trace inspection and debugging capabilities directly to your terminal. The CLI enables fast iteration, automation workflows, and local reproduction of LLM traces.

The CLI provides commands to:

  • Search and filter traces by status, time range, model, and operation type
  • Get run overviews with span trees and LLM call summaries
  • Inspect individual LLM call payloads and responses
  • Monitor system health with aggregated statistics

Learn more in the vLLora CLI documentation.

This release also introduces Custom Providers and Models, allowing you to register your own API endpoints and model identifiers. Connect to self-hosted inference engines (like Ollama or LocalAI), private enterprise gateways, or any OpenAI-compatible service, and reference their models using a namespaced format (provider/model-id). Configure providers and models through Settings or the Chat Model Selector.

Learn more in the Custom Providers and Models documentation.
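
For a rough sketch of what the namespaced format looks like in a request, the snippet below targets a model registered under a custom provider, reusing the vllora_llm client introduced in 0.1.14. The provider name ollama and model id llama3 are placeholders for whatever you configure in Settings.

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::gateway::{ChatCompletionRequest, ChatCompletionMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "ollama/llama3" follows the provider/model-id namespace; both parts are
    // placeholders for a provider and model registered through Settings.
    let request = ChatCompletionRequest {
        model: "ollama/llama3".to_string(),
        messages: vec![
            ChatCompletionMessage::new_text("user".to_string(), "Say hello!".to_string()),
        ],
        ..Default::default()
    };

    let client = VlloraLLMClient::new();
    let _response = client.completions().create(request).await?;
    Ok(())
}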

0.1.19

vLLora now includes an MCP Server that exposes trace and run inspection as tools for coding agents. Debug, fix, and monitor your AI agents directly from your terminal or IDE by connecting Claude Desktop, Cursor, or any MCP-capable client to vLLora's MCP endpoint.

The MCP server provides tools to:

  • Search and filter traces by status, time range, model, and more
  • Get run overviews with span trees and error breadcrumbs
  • Inspect individual LLM call payloads and responses
  • Monitor system health with aggregated statistics

Learn more in the MCP Server documentation.

0.1.18

vLLora now supports Custom Endpoints, allowing you to use your own API endpoint for any provider. Simply provide your endpoint URL and API key through the Provider Keys UI, and vLLora will route requests to your custom endpoint instead of the provider's default endpoint.

Custom Endpoint UI

This feature enables you to:

  • Use custom API gateways and proxies
  • Connect to self-hosted models
  • Route through OpenAI-compatible endpoints

Learn more in the Custom Endpoints documentation.

0.1.16

This release fixes bugs affecting UI stability, cost calculations, and API endpoint behavior. Key improvements include fixes for debug mode span display, visual diagram interactions, and pagination handling.

0.1.15

Introducing Debug Mode, an interactive debugging feature that lets you pause LLM requests before they're sent to the model. With debug mode enabled, you can inspect the full request payload, edit messages, parameters, and tool schemas in real time, then continue execution with your modifications—all without changing your application code.

Debug Mode in action

Debug mode is perfect for debugging agent prompts, verifying model selection, inspecting tool schemas, and tuning parameters on the fly. Simply enable the breakpoint toggle in the Traces view, and every outgoing LLM request will pause for inspection and editing.

With debug mode you can:

  • Inspect the model, messages, parameters, and tool schemas
  • Continue with the original request
  • Modify the request and send your edited version instead

Learn more in the Debug Mode documentation.

0.1.14

Introducing the vllora_llm crate, a standalone Rust library that provides a unified interface for interacting with multiple LLM providers through the vLLora AI Gateway. The crate enables seamless chat completions across OpenAI-compatible, Anthropic, Gemini, and Bedrock providers, with built-in streaming support and telemetry integration.

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::gateway::{ChatCompletionRequest, ChatCompletionMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a chat completion request; unspecified fields fall back to defaults.
    let request = ChatCompletionRequest {
        model: "gpt-4.1-mini".to_string(),
        messages: vec![
            ChatCompletionMessage::new_text("user".to_string(), "Say hello!".to_string()),
        ],
        ..Default::default()
    };

    // Send the request through the vLLora AI Gateway; `response` holds the completion.
    let client = VlloraLLMClient::new();
    let response = client.completions().create(request).await?;
    Ok(())
}

Other improvements in this release:

  • Enhanced breakpoint management with GlobalBreakpointStateEvent integration
  • Improved error handling for multiple provider scenarios
  • Asynchronous improvements to intercept functionality

0.1.13

General bug fixes and improvements.

0.1.12

Introducing Clone Request & Experiments, a powerful new feature that enables you to A/B test prompts, compare models, and iterate on LLM requests directly from your traces. Clone any finished trace into an isolated experiment where you can safely tweak parameters, switch models, or modify prompts without affecting the original request.

The Experiment feature provides two editing modes: a Visual Editor for intuitive prompt tweaking and a JSON Editor for precise parameter control. Edit system and user messages, switch models on the fly, adjust temperature and other parameters, and run experiments with side-by-side comparison of tokens, costs, and outputs—all without ever touching your original trace.

Experimenting with visual editor

Perfect for prompt engineering, model comparison, parameter tuning, and iterative debugging. Learn more in the Clone and Experiment documentation.

Other improvements in this release:

  • Enhanced breakpoint management with global intercept and resume capabilities
  • Improved tracing with better parent span ID handling and status recording
  • Stream implementation refactoring for better LLM provider support
  • Better output recording and model finish reason handling

0.1.11

  • Enhanced error handling for port conflicts during startup.
  • Stream processing improvements with better tracing instrumentation for streaming responses.

0.1.10

  • Added thought signature support for Gemini 3 Pro models.
  • General bug fixes and improvements.

0.1.9

  • General bug fixes and improvements.

0.1.8

  • General bug fixes and improvements.

0.1.7

Introducing MCP Support, enabling seamless integration with Model Context Protocol servers. Connect your AI models to external tools, APIs, databases, and services through HTTP, SSE, or WebSocket transports. vLLora automatically discovers MCP tools, executes tool calls on your behalf, and traces all interactions—making it easy to extend your models with dynamic capabilities.

MCP Configuration in Settings

Other improvements in this release:

  • Full MCP server support with HTTP, SSE, and WebSocket transports, enabling dynamic tool execution and external system integration
  • Embedding model support for Bedrock, Gemini, and OpenAI with comprehensive tracing and cost tracking
  • Enhanced routing with conditional strategies, fallbacks, and maximum depth limits for complex request flows
  • Improved cost tracking with cached input token pricing and enhanced usage monitoring across all providers
  • Response caching improvements for better performance and cost optimization
  • Thread handling with service integration and middleware support for conversation management
  • Multi-tenant OpenTelemetry tracing with tenant-aware span management
  • Claude Sonnet 4.5 model support
  • Virtual model versioning via model@version syntax for flexible model selection (see the sketch after this list)
  • Variables support in chat completions for dynamic prompt templating
  • Enhanced model metadata with service level, release date, license, and knowledge cutoff information
  • Google Vertex AI model fetching and integration
  • Improved span management with RunSpanBuffer for efficient trace processing
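
As a minimal sketch of the model@version syntax mentioned above, the snippet below builds a request pinned to a specific version of a virtual model, reusing the ChatCompletionRequest type from the vllora_llm example in the 0.1.14 entry. The virtual model name my-virtual-model and the version tag v2 are hypothetical placeholders.

use vllora_llm::types::gateway::ChatCompletionRequest;

// Pin the request to version "v2" of a virtual model. Both the model name and
// the version tag are hypothetical placeholders for your own definitions.
fn pinned_request() -> ChatCompletionRequest {
    ChatCompletionRequest {
        model: "my-virtual-model@v2".to_string(),
        ..Default::default()
    }
}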

0.1.6

  • General bug fixes and improvements.

0.1.5

  • General bug fixes and improvements.

0.1.4

  • General bug fixes and improvements.

0.1.0

  • General bug fixes and improvements.