
0.1.7

Introducing MCP Support, enabling seamless integration with Model Context Protocol servers. Connect your AI models to external tools, APIs, databases, and services through HTTP, SSE, or WebSocket transports. vLLora automatically discovers MCP tools, executes tool calls on your behalf, and traces all interactions—making it easy to extend your models with dynamic capabilities.

MCP Configuration in Settings
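The exact settings schema is not reproduced in these notes, so the sketch below is only an illustration of the kind of entry an MCP server registration might need: a label, one of the three supported transports (HTTP, SSE, or WebSocket), and an endpoint URL. Every field name here is an assumption, not vLLora's documented configuration format.

```python
import json

# Illustrative only: these field names are hypothetical and not taken from
# vLLora's settings schema. Each entry sketches what registering an MCP
# server might require: a label, a transport, and an endpoint URL.
mcp_servers = [
    {"name": "internal-tools", "transport": "http", "url": "http://localhost:3001/mcp"},
    {"name": "web-search", "transport": "sse", "url": "https://example.com/mcp/sse"},
    {"name": "ops-events", "transport": "websocket", "url": "wss://example.com/mcp/ws"},
]

print(json.dumps({"mcp_servers": mcp_servers}, indent=2))
```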

Other improvements in this release

  • Full MCP server support with HTTP, SSE, and WebSocket transports, enabling dynamic tool execution and external system integration
  • Embedding model support for Bedrock, Gemini, and OpenAI with comprehensive tracing and cost tracking (see the embeddings sketch after this list)
  • Enhanced routing with conditional strategies, fallbacks, and maximum depth limits for complex request flows
  • Improved cost tracking with cached input token pricing and enhanced usage monitoring across all providers
  • Response caching improvements for better performance and cost optimization
  • Thread handling with service integration and middleware support for conversation management
  • Multi-tenant OpenTelemetry tracing with tenant-aware span management
  • Claude Sonnet 4.5 model support
  • Virtual model versioning via model@version syntax for flexible model selection
  • Variables support in chat completions for dynamic prompt templating (see the chat completions sketch after this list, which also shows model@version)
  • Enhanced model metadata with service level, release date, license, and knowledge cutoff information
  • Google Vertex AI model fetching and integration
  • Improved span management with RunSpanBuffer for efficient trace processing
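To make the virtual model versioning and variables items concrete, here is a minimal request sketch that combines the model@version syntax with the new variables field. It assumes vLLora exposes an OpenAI-style chat completions endpoint on a local port; the port, endpoint path, virtual model name, variables payload shape, and template syntax are all assumptions rather than documented behavior.

```python
import requests

# Hypothetical local gateway address; adjust to wherever vLLora is listening.
BASE_URL = "http://localhost:8080/v1"

payload = {
    # model@version syntax from this release; "support-agent" and "v2" are
    # made-up names for a virtual model and one of its versions.
    "model": "support-agent@v2",
    "messages": [
        {"role": "user", "content": "Summarize the latest ticket from {{customer_name}}."}
    ],
    # This release adds a variables field to chat completions for prompt
    # templating; the field name, nesting, and template syntax used here
    # are assumptions, not taken from vLLora documentation.
    "variables": {"customer_name": "Ada"},
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```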
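The embedding support can be sketched the same way. Base64-encoded embeddings are also listed among this release's features, so the example below requests them; it again assumes an OpenAI-style endpoint and port, and the model name and field names are examples rather than confirmed values.

```python
import requests

# Hypothetical local gateway address; adjust to wherever vLLora is listening.
BASE_URL = "http://localhost:8080/v1"

resp = requests.post(
    f"{BASE_URL}/embeddings",
    json={
        # Any embedding model routed through the gateway; this name is an example.
        "model": "text-embedding-3-small",
        "input": ["vLLora 0.1.7 release notes"],
        # Base64 responses are mentioned in this release; the field name here
        # follows the OpenAI convention and is an assumption for vLLora.
        "encoding_format": "base64",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["data"][0]["embedding"][:80], "...")
```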

Features

  • feat: Add JSON configuration with 8MB limit and clean up unused imports in http.rs (@karolisg) (4689128)
  • feat: Introduce RunSpanBuffer for efficient span management and update trace service to utilize it (@karolisg) (483148c)
  • feat: Update provider credentials handling to include GatewayTenant in update_provider_key function (@karolisg) (60e4fee)
  • feat: Introduce thread handling module with service and middleware integration (@karolisg) (a5a9ccb)
  • feat: add option to control UI opening on startup and update configuration handling (@karolisg) (e5ecdf4)
  • #175 Feat: Tracing MCP server (@karolisg)
  • feat: Add Slack notification for new brew releases (@karolisg) (a1632b2)
  • feat: add Slack notification job to GitHub Actions workflow for release updates (@karolisg) (cf7b148)
  • feat: Support claude sonnet 4.5 (@karolisg) (931dd56)
  • feat: add cost field to span model for enhanced tracking capabilities (@karolisg) (64f3a7b)
  • feat: add message_id field to span model and track elapsed time for processing streams across models (@karolisg) (d3dda6d)
  • feat: add model and inference model names to tracing fields in TracedEmbedding (@karolisg) (6c096ca)
  • feat: add LLMStartEvent logging for Bedrock, Gemini, and OpenAI embedding models (@karolisg) (f24fea3)
  • feat: capture spans for Bedrock, Gemini, and OpenAI embedding models (@karolisg) (7197ae6)
  • feat: Replace custom error handling with specific ModelError variants (@karolisg) (7839211)
  • feat: Support base64 encoding in embeddings (@karolisg) (def4f90)
  • feat: Add methods for token pricing in ModelPrice enum (@karolisg) (631664a)
  • feat: Enhance OpenAI embeddings support with Azure integration and improve error handling (@karolisg) (16b401d)
  • feat: Add Bedrock embeddings support and enhance error handling (@karolisg) (4d7858b)
  • feat: Introduce Gemini embeddings model and enhance provider error handling (@karolisg) (785371a)
  • feat: Add is_private field to model metadata for enhanced privacy control (@karolisg) (217ee73)
  • feat: Add new embedding models and enhance model handling (@karolisg) (9ffa27a)
  • #129 feat: Fetch models from Google Vertex (@karolisg)
  • feat: Add async method to retrieve top model metadata by ranking (@karolisg) (6f45d3d)
  • feat: Add model metadata support to chat completion execution (@karolisg) (e4d1b8f)
  • feat: Add support for roles in ClickhouseHttp URL construction (@karolisg) (dabdfdd)
  • feat: Extend ChatCompletionMessage struct to include optional fields for tool calls, refusal, tool call ID, and cache control (@karolisg) (529d4ea)
  • feat: Add build_response method to AnthropicModel for constructing MessagesResponseBody from stream data (@karolisg) (1c42ef5)
  • feat: Enhance GenerateContentResponse structure to include model_version and response_id (@karolisg) (3d6d572)
  • feat: Implement build_response method to construct CreateChatCompletionResponse from stream data for tracing purposes (@karolisg) (e305a0b)
  • feat: Extend ModelMetadata with new fields for service level, release date, license, and knowledge cutoff date (@karolisg) (797aff5)
  • feat: Add serde alias for InterceptorType Guardrail to support legacy "guard" identifier (@karolisg) (9dc6c43)
  • #116 feat: Implement conditional routing strategy (@karolisg)
  • feat: Integrate cache control logic into message content handling in MessageMapper (@karolisg) (416cb1d)
  • feat: Update langdb_clust to version 0.9.4 and enhance token usage tracking in cost calculations (@karolisg) (489a4a9)
  • feat: Add benchmark_info field to ModelMetadata (@karolisg) (1b7904f)
  • feat: Introduce CacheControl struct and integrate it into message mapping for content types (@karolisg) (cc715da)
  • feat: Add support for cached input token pricing in cost calculations and update related structures (@karolisg) (f1d4077)
  • feat: Return template directly if no variables are provided in render function (@karolisg) (6adaf12)
  • feat: Enhance logging by recording request payloads in Gemini client (@karolisg) (86a9875)
  • feat: Add API_CALLS_BY_IP constant for enhanced rate limiting functionality (@karolisg) (5e5958c)
  • feat: Add optional user_email field to RequestUser struct (@karolisg) (982c240)
  • feat: Implement maximum depth limit for request routing in RoutedExecutor (@karolisg) (357ffca)
  • feat: Handle max retries in request (@karolisg) (31d9d41)
  • feat: add custom event for model events (@karolisg) (7299233)
  • feat: Support project traces channels (@karolisg) (ce0efef)
  • feat: add run lifecycle events and fix model usage tracking (@karolisg) (31875af)
  • feat: implement tenant-aware OpenTelemetry trace (@karolisg) (ee48ae3)
  • #92 feat: Basic responses support (@karolisg)
  • #86 feat: Support http streamable transport (@karolisg)
  • feat: Add options struct for prompt caching (@karolisg) (61181bc)
  • feat: add description and keywords fields to thread (@karolisg) (11411ec)
  • feat: Add key generation for transport type (@karolisg) (3238632)
  • feat: add version support for virtual model retrieval via model@version syntax (@karolisg) (b1f4045)
  • #73 feat: Add variables field to chat completions (@karolisg)
  • #72 feat: Enhanced support for MCP servers (@karolisg)
  • feat: Support azure url parsing and usage in client (@karolisg) (cb4c665)
  • feat: Store tools results in spans (@karolisg) (4f8deff)
  • feat: Store openai partner moderations guard metadata (@karolisg) (0415257)
  • feat: Support openai moderation guardrails (@karolisg) (0285528)
  • feat: Return 446 error on guard rejection (@karolisg) (e0dd668)
  • #54 feat: Support custom endpoint for openai client (@karolisg)
  • #46 feat: Implement guardrails system (@VG)
  • feat: Support multiple identifiers in cost control (@karolisg) (302a84c)
  • feat: Support all anthropic properties (@karolisg) (59dec4b)
  • feat: Add support of anthropic thinking (@karolisg) (ff9815a)
  • feat: Add extra to request (@karolisg) (a9d3c32)
  • #29 feat: Support search in memory mcp tool (@VG)
  • #28 feat: Use time windows for metrics (@karolisg)
  • feat: Refactor targets usage for percentage router (@karolisg) (0445101)
  • #21 feat: Support langdb key (@karolisg)
  • #20 feat: Integrate routed execution with fallbacks (@karolisg)
  • feat: Add missing gemini parameters (@karolisg) (5f1d15e)
  • #15 feat: Improve UI (@karolisg)
  • feat: Add model name and provider name to embeddings API (@karolisg) (452cb91)
  • feat: Print provider and model name in logs (@karolisg) (3403968)
  • #4 feat: Implement tui (@VG)
  • #3 feat: Build for ubuntu and docker images (@VG)
  • feat: Support .env variables for config (@karolisg) (65561e9)
  • feat: Use in memory storage (@karolisg) (75bf2a1)
  • feat: implement mcp support (@VG) (a97bc68)
  • feat: Add rate limiting (@karolisg) (7066d87)
  • feat: Add cost control and limit checker (@karolisg) (1300266)
  • feat: Use user in openai requests (@karolisg) (96213dc)
  • feat: Add api_invoke spans (@karolisg) (329ace4)
  • feat: Enable otel when clickhouse config provided (@karolisg) (edbd16c)
  • feat: Add database span writer (@karolisg) (72bd326)
  • feat: Add clickhouse dependency (@karolisg) (78fe8ca)

Code Refactoring

  • refactor: Remove mcp_server module and associated TavilySearch implementation (@karolisg) (ae7dbb1)
  • refactor: Remove unnecessary logging in thread service middleware (@karolisg) (f775dc9)
  • refactor: Migrate codebase to vllora functionality (@karolisg) (78575b3)
  • refactor: Enhance embedding handling and introduce new model structure (@karolisg) (013b214)
  • refactor: Remove unused EngineType variants from the engine module (@karolisg) (69e12a7)
  • refactor: Clean up code formatting and remove unused dependency (@karolisg) (7b19a1b)
  • refactor: Update model handling and enhance Azure OpenAI integration (@karolisg) (868ffef)
  • refactor: Consolidate CredentialsIdent usage across modules and enhance cost calculation (@karolisg) (8cd96e8)
  • refactor: Integrate price and credentials identification into model handling (@karolisg) (88fbae0)
  • refactor: Update ModelIOFormats enum to include PartialEq derive and remove Bedrock model file (@karolisg) (65152c3)
  • refactor: Introduce BedrockCredentials type and update AWS credential handling (@karolisg) (fa28214)
  • refactor: Enhance metric routing to include default metrics for missing models (@karolisg) (54c7e2c)
  • refactor: Update default context size in Bedrock model provider to zero (@karolisg) (1211217)
  • refactor: Enhance Bedrock model ID formatting with region prefix (@karolisg) (8f80b92)
  • refactor: Update Anthropic model to handle optional system messages (@karolisg) (b3859a8)
  • refactor: Simplify message sending in stream_chunks function (@karolisg) (00b657e)
  • refactor: Enhance Bedrock model ID handling with version replacement (@karolisg) (ec5ab40)
  • refactor: Update Bedrock model provider to skip specific model ARNs and use model ARN for metadata (@karolisg) (d4034c1)
  • refactor: Simplify Bedrock model ID handling and remove unused model ARN assignment (@karolisg) (968ba0a)
  • refactor: Improve AWS region configuration handling in get_user_shared_config (@karolisg) (36c62c9)
  • refactor: Add warning logs for Bedrock model name during conversation (@karolisg) (c4a7ed0)
  • refactor: Remove debug logging for Bedrock client credentials (@karolisg) (5991121)
  • refactor: Update BedrockModel to utilize ChatCompletionMessageWithFinishReason and adjust region configuration (@karolisg) (b7fb560)
  • refactor: Remove unnecessary error logging in GeminiModel response handling (@karolisg) (343df18)
  • refactor: Enhance OpenAIModel response handling with finish reason and usage tracking (@karolisg) (91aaf32)
  • refactor: Update ChatCompletionMessage to include finish reason and adjust related model implementations (@karolisg) (f884c47)
  • refactor: Simplify match expression for ModelError in GatewayApiError and streamline error logging in stream_chunks (@karolisg) (68deec6)
  • refactor: Remove redundant logging of system messages in AnthropicModel (@karolisg) (f6d3a73)
  • refactor: Simplify ProjectTraceMap type by removing receiver from tuple (@karolisg) (ea1c088)
  • refactor: Add alias for InMemory transport type in McpTransportType enum (@karolisg) (04710a1)
  • refactor: Clean up unused app_data references in ApiServer and adjust telemetry imports (@karolisg) (a40a683)
  • refactor: Remove unused TraceMap references from RoutedExecutor and related modules (@karolisg) (2fe3067)
  • refactor: integrate InMemoryMetricsRepository into routing logic (@karolisg) (329e153)
  • refactor: split chat completion streaming into separate chunks for delta, finish reason and usage (@karolisg) (c67d478)
  • refactor: Fix use of variables (@karolisg) (5e5ad6b)
  • refactor: rename PromptCache to ResponseCache for better clarity and consistency (@karolisg) (bc9677f)
  • refactor: move caching logic to dedicated cache module and update response types (@karolisg) (708797e)