Image Generation with Responses API

This guide demonstrates how to build an AI-powered application that combines web search and image generation capabilities using the Responses API.

Overview

The Responses API enables you to create multi-tool workflows that combine different capabilities. In this example, we'll:

  1. Use web search to find current information
  2. Generate an image based on that information
  3. Process and save the generated image

Prerequisites

Required Dependencies

Add these dependencies to your Cargo.toml:

[dependencies]
vllora_llm = "0.1.17"
tokio = { version = "1", features = ["full"] }
serde_json = "1.0"
base64 = "0.22"

Environment Setup

Set your API key as an environment variable:

export VLLORA_OPENAI_API_KEY="your-api-key-here"

Note: Make sure to keep your API key secure. Never commit it to version control or expose it in client-side code.
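
If you'd rather fail with a clear message than panic via the expect() used in the examples below, you can wrap the lookup in a small helper. A minimal sketch; require_env is our own name, not part of vllora_llm:

use std::env;

/// Reads a required environment variable, exiting with a readable message if it is unset.
fn require_env(name: &str) -> String {
    env::var(name).unwrap_or_else(|_| {
        eprintln!("Missing required environment variable: {}", name);
        eprintln!("Set it first, e.g.: export {}=\"your-api-key-here\"", name);
        std::process::exit(1);
    })
}

let api_key = require_env("VLLORA_OPENAI_API_KEY");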

Building the Request

Creating the CreateResponse Structure

We'll create a request that uses both web search and image generation tools:

use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;

let responses_req = CreateResponse {
    model: Some("gpt-4.1".to_string()),
    input: InputParam::Text(
        "Search for the latest news from today and generate an image about it".to_string(),
    ),
    tools: Some(vec![
        Tool::WebSearch(WebSearchTool::default()),
        Tool::ImageGeneration(ImageGenTool::default()),
    ]),
    ..Default::default()
};

Understanding the Components

Model Selection - We're using "gpt-4.1". Whichever model you choose must support both the Responses API and tool calling.

Input Parameter - We use InputParam::Text to provide a simple text prompt. The model will:

  1. First use the web search tool to find current news
  2. Then use the image generation tool to create an image related to that news

Tool Configuration - We specify two tools:

  • WebSearchTool::default() - Uses default web search configuration
  • ImageGenTool::default() - Uses default image generation settings
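
Both tools run with their default settings here, so the prompt is the main steering lever in this setup. The simplest variation is to make the prompt more specific. A minimal sketch reusing only the types shown above; the prompt text is our own:

// Same request shape, with a more targeted prompt steering both tools.
let targeted_req = CreateResponse {
    model: Some("gpt-4.1".to_string()),
    input: InputParam::Text(
        "Find today's top science headline and generate a watercolor-style illustration of it"
            .to_string(),
    ),
    tools: Some(vec![
        Tool::WebSearch(WebSearchTool::default()),
        Tool::ImageGeneration(ImageGenTool::default()),
    ]),
    ..Default::default()
};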

Initializing the Client

Set up the Vllora LLM client with your credentials:

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;

let client = VlloraLLMClient::default()
    .with_credentials(Credentials::ApiKey(ApiKeyCredentials {
        api_key: std::env::var("VLLORA_OPENAI_API_KEY")
            .expect("VLLORA_OPENAI_API_KEY must be set"),
    }));
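
With the request and client in place, sending the request is a single awaited call (the same call appears in the complete example below):

// Send the request; the returned response exposes the structured output
// items in `response.output`, which the following sections iterate over.
let response = client.responses().create(responses_req).await?;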

Processing Text Messages

Extract and display text content from the response:

use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;

for output in &response.output {
    match output {
        OutputItem::Message(message) => {
            for content in &message.content {
                match content {
                    OutputMessageContent::OutputText(text_output) => {
                        // Print the text content
                        println!("\n{}", text_output.text);

                        // Print sources/annotations if available
                        if !text_output.annotations.is_empty() {
                            println!("Annotations: {:#?}", text_output.annotations);
                        }
                    }
                    _ => {
                        println!("Other content type: {:?}", content);
                    }
                }
            }
        }
        // Handle other output types here (image generation calls are
        // covered in the next section)
        _ => {}
    }
}

Annotations - Text outputs can include annotations, which provide:

  • Citations and sources (especially useful with web search)
  • References to tool calls
  • Additional metadata
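
If you need the reply as a single string, e.g. for logging or downstream processing, you can fold the text segments together. A minimal sketch using only the types shown above; the variable names are our own:

// Collect every text segment in the response into one String.
let mut full_text = String::new();
for output in &response.output {
    if let OutputItem::Message(message) = output {
        for content in &message.content {
            if let OutputMessageContent::OutputText(text_output) = content {
                full_text.push_str(&text_output.text);
                full_text.push('\n');
            }
        }
    }
}
println!("Collected {} characters of reply text", full_text.len());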

Handling Image Generation Results

When the model uses the image generation tool, the response includes OutputItem::ImageGenerationCall variants. Each call contains:

  • A result field with the base64-encoded image data
  • Metadata about the generation

Decoding and Saving Images

Here's a complete function to decode and save generated images:

use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;

/// Decodes a base64-encoded image from an ImageGenerationCall and saves it to a file.
///
/// # Arguments
/// * `image_generation_call` - The image generation call containing the base64-encoded image
/// * `index` - The index to use in the filename
///
/// # Returns
/// * `Ok(filename)` - The filename where the image was saved
/// * `Err(e)` - An error if the call has no result, decoding fails, or file writing fails
fn decode_and_save_image(
    image_generation_call: &ImageGenToolCall,
    index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
    // Extract base64 image from the call
    let base64_image = image_generation_call
        .result
        .as_ref()
        .ok_or("Image generation call has no result")?;

    // Decode base64 image
    let image_data = STANDARD.decode(base64_image)?;

    // Save to file
    let filename = format!("generated_image_{}.png", index);
    fs::write(&filename, image_data)?;

    Ok(filename)
}

Step-by-Step Breakdown

  1. Extract Base64 Data - We access the result field, which is an Option<String>. We use .ok_or() to convert None into an error if the result is missing.

  2. Decode Base64 - The base64 crate's STANDARD engine decodes the base64 string into raw bytes. This can fail if the string is malformed, so we use ? to propagate errors.

  3. Save to File - We use Rust's standard library fs::write() to save the decoded bytes to a file. We name it generated_image_{index}.png to avoid conflicts when multiple images are generated.
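
The decode step is where malformed data surfaces. Here's a standalone sanity check, separate from the example above, showing the base64 crate's round-trip behavior (assuming base64 0.22 as in Cargo.toml):

use base64::{engine::general_purpose::STANDARD, Engine as _};

fn main() {
    // Round-trip: encode raw bytes, then decode them back.
    let original: &[u8] = b"\x89PNG\r\n\x1a\n"; // the PNG magic bytes
    let encoded = STANDARD.encode(original);
    let decoded = STANDARD.decode(&encoded).expect("valid base64");
    assert_eq!(original, decoded.as_slice());

    // Malformed input fails to decode, which is why the `?` after
    // STANDARD.decode() in decode_and_save_image matters.
    assert!(STANDARD.decode("not base64!!!").is_err());

    println!("round-trip ok: {} base64 chars -> {} bytes", encoded.len(), decoded.len());
}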

Complete Example

Here's the complete working example that puts it all together:

use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;

use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::error::LLMResult;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;

fn decode_and_save_image(
    image_generation_call: &ImageGenToolCall,
    index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
    let base64_image = image_generation_call
        .result
        .as_ref()
        .ok_or("Image generation call has no result")?;

    let image_data = STANDARD.decode(base64_image)?;
    let filename = format!("generated_image_{}.png", index);
    fs::write(&filename, image_data)?;

    Ok(filename)
}

#[tokio::main]
async fn main() -> LLMResult<()> {
    // 1) Build a Responses-style request using async-openai-compat types
    //    with tools for web_search_preview and image_generation
    let responses_req = CreateResponse {
        model: Some("gpt-4.1".to_string()),
        input: InputParam::Text(
            "Search for the latest news from today and generate an image about it".to_string(),
        ),
        tools: Some(vec![
            Tool::WebSearch(WebSearchTool::default()),
            Tool::ImageGeneration(ImageGenTool::default()),
        ]),
        ..Default::default()
    };

    // 2) Construct a VlloraLLMClient
    let client =
        VlloraLLMClient::default().with_credentials(Credentials::ApiKey(ApiKeyCredentials {
            api_key: std::env::var("VLLORA_OPENAI_API_KEY")
                .expect("VLLORA_OPENAI_API_KEY must be set"),
        }));

    // 3) Non-streaming: send the request and print the final reply
    println!("Sending request with tools: web_search_preview and image_generation");
    let response = client.responses().create(responses_req).await?;

    println!("\nNon-streaming reply:");
    println!("{}", "=".repeat(80));

    for (index, output) in response.output.iter().enumerate() {
        match output {
            OutputItem::ImageGenerationCall(image_generation_call) => {
                println!("\n[Image Generation Call {}]", index);
                match decode_and_save_image(image_generation_call, index) {
                    Ok(filename) => {
                        println!("✓ Successfully saved image to: {}", filename);
                    }
                    Err(e) => {
                        eprintln!("✗ Failed to decode/save image: {}", e);
                    }
                }
            }
            OutputItem::Message(message) => {
                println!("\n[Message {}]", index);
                println!("{}", "-".repeat(80));

                for content in &message.content {
                    match content {
                        OutputMessageContent::OutputText(text_output) => {
                            // Print the text content
                            println!("\n{}", text_output.text);

                            // Print sources/annotations if available
                            if !text_output.annotations.is_empty() {
                                println!("Annotations: {:#?}", text_output.annotations);
                            }
                        }
                        _ => {
                            println!("Other content type: {:?}", content);
                        }
                    }
                }
                println!("\n{}", "=".repeat(80));
            }
            _ => {
                println!("\n[Other Output {}]", index);
                println!("{:?}", output);
            }
        }
    }

    Ok(())
}

Expected Output

When you run this example, you'll see output like:

Sending request with tools: web_search_preview and image_generation

Non-streaming reply:
================================================================================

[Message 0]
--------------------------------------------------------------------------------

Here's the latest news from today: [summary of current news]

Annotations: [citations and sources from web search]

================================================================================

[Image Generation Call 1]
✓ Successfully saved image to: generated_image_1.png

The actual news content and image will vary based on what's happening when you run it!

Execution Flow

  1. Request Construction - We build a CreateResponse with our prompt and tools
  2. Client Initialization - We create and configure the Vllora LLM client
  3. API Call - We send the request and await the response
  4. Response Processing - We iterate through output items:
    • Handle image generation calls by decoding and saving
    • Display text messages with annotations
    • Handle any other output types
  5. File Output - Generated images are saved to disk as PNG files

Summary

This example demonstrates how to use the Responses API to create multi-tool workflows that combine web search and image generation. The key steps are:

  1. Build a CreateResponse request with the desired tools (WebSearchTool and ImageGenTool)
  2. Initialize the VlloraLLMClient with your API credentials
  3. Send the request and receive structured outputs
  4. Process different output types: extract text from OutputItem::Message and decode base64 images from OutputItem::ImageGenerationCall
  5. Save decoded images to disk using standard Rust file I/O

The Responses API enables powerful, structured workflows that go beyond simple text completions, making it ideal for building applications that need to orchestrate multiple AI capabilities.