Image Generation with Responses API

This guide demonstrates how to build an AI-powered application that combines web search and image generation capabilities using the Responses API.

Overview

The Responses API enables you to create multi-tool workflows that combine different capabilities. In this example, we'll:

  1. Use web search to find current information
  2. Generate an image based on that information
  3. Process and save the generated image

Prerequisites

Required Dependencies

Add these dependencies to your Cargo.toml:

[dependencies]
vllora_llm = "0.1.17"
tokio = { version = "1", features = ["full"] }
serde_json = "1.0"
base64 = "0.22"

Environment Setup

Set your API key as an environment variable:

export VLLORA_OPENAI_API_KEY="your-api-key-here"

Note: Make sure to keep your API key secure. Never commit it to version control or expose it in client-side code.
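
If you'd rather fail with a clear message than panic via the expect() used in the examples below, you can wrap the lookup in a small helper. A minimal sketch; require_env is our own name, not part of vllora_llm:

use std::env;

/// Reads a required environment variable, exiting with a readable message if it is unset.
fn require_env(name: &str) -> String {
    env::var(name).unwrap_or_else(|_| {
        eprintln!("Missing required environment variable: {}", name);
        eprintln!("Set it first, e.g.: export {}=\"your-api-key-here\"", name);
        std::process::exit(1);
    })
}

let api_key = require_env("VLLORA_OPENAI_API_KEY");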

Building the Request

Creating the CreateResponse Structure

We'll create a request that uses both web search and image generation tools:

use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;

let responses_req = CreateResponse {
    model: Some("gpt-4.1".to_string()),
    input: InputParam::Text(
        "Search for the latest news from today and generate an image about it".to_string(),
    ),
    tools: Some(vec![
        Tool::WebSearch(WebSearchTool::default()),
        Tool::ImageGeneration(ImageGenTool::default()),
    ]),
    ..Default::default()
};

Understanding the Components

Model Selection - We're using "gpt-4.1". Whichever model you choose must support both the Responses API and tool calling.

Input Parameter - We use InputParam::Text to provide a simple text prompt. The model will:

  1. First use the web search tool to find current news
  2. Then use the image generation tool to create an image related to that news

Tool Configuration - We specify two tools:

  • WebSearchTool::default() - Uses default web search configuration
  • ImageGenTool::default() - Uses default image generation settings
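
Both tools run with their default settings here, so the prompt is the main steering lever in this setup. The simplest variation is to make the prompt more specific. A minimal sketch reusing only the types shown above; the prompt text is our own:

// Same request shape, with a more targeted prompt steering both tools.
let targeted_req = CreateResponse {
    model: Some("gpt-4.1".to_string()),
    input: InputParam::Text(
        "Find today's top science headline and generate a watercolor-style illustration of it"
            .to_string(),
    ),
    tools: Some(vec![
        Tool::WebSearch(WebSearchTool::default()),
        Tool::ImageGeneration(ImageGenTool::default()),
    ]),
    ..Default::default()
};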

Initializing the Client

Set up the Vllora LLM client with your credentials:

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;

let client = VlloraLLMClient::default()
    .with_credentials(Credentials::ApiKey(ApiKeyCredentials {
        api_key: std::env::var("VLLORA_OPENAI_API_KEY")
            .expect("VLLORA_OPENAI_API_KEY must be set"),
    }));
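
With the request and client in place, sending the request is a single awaited call (the same call appears in the complete example below):

// Send the request; the returned response exposes the structured output
// items in `response.output`, which the following sections iterate over.
let response = client.responses().create(responses_req).await?;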

Processing Text Messages

Extract and display text content from the response:

use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;

for output in &response.output {
    match output {
        OutputItem::Message(message) => {
            for content in &message.content {
                match content {
                    OutputMessageContent::OutputText(text_output) => {
                        // Print the text content
                        println!("\n{}", text_output.text);

                        // Print sources/annotations if available
                        if !text_output.annotations.is_empty() {
                            println!("Annotations: {:#?}", text_output.annotations);
                        }
                    }
                    _ => {
                        println!("Other content type: {:?}", content);
                    }
                }
            }
        }
        // Handle other output types here (image generation calls are
        // covered in the next section)
        _ => {}
    }
}

Annotations - Text outputs can include annotations, which provide:

  • Citations and sources (especially useful with web search)
  • References to tool calls
  • Additional metadata
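
If you need the reply as a single string, e.g. for logging or downstream processing, you can fold the text segments together. A minimal sketch using only the types shown above; the variable names are our own:

// Collect every text segment in the response into one String.
let mut full_text = String::new();
for output in &response.output {
    if let OutputItem::Message(message) = output {
        for content in &message.content {
            if let OutputMessageContent::OutputText(text_output) = content {
                full_text.push_str(&text_output.text);
                full_text.push('\n');
            }
        }
    }
}
println!("Collected {} characters of reply text", full_text.len());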

Handling Image Generation Results

When the model uses the image generation tool, the response includes OutputItem::ImageGenerationCall variants. Each call contains:

  • A result field with the base64-encoded image data
  • Metadata about the generation

Decoding and Saving Images

Here's a complete function to decode and save generated images:

use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;

/// Decodes a base64-encoded image from an ImageGenerationCall and saves it to a file.
///
/// # Arguments
/// * `image_generation_call` - The image generation call containing the base64-encoded image
/// * `index` - The index to use in the filename
///
/// # Returns
/// * `Ok(filename)` - The filename where the image was saved
/// * `Err(e)` - An error if the call has no result, decoding fails, or file writing fails
fn decode_and_save_image(
    image_generation_call: &ImageGenToolCall,
    index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
    // Extract base64 image from the call
    let base64_image = image_generation_call
        .result
        .as_ref()
        .ok_or("Image generation call has no result")?;

    // Decode base64 image
    let image_data = STANDARD.decode(base64_image)?;

    // Save to file
    let filename = format!("generated_image_{}.png", index);
    fs::write(&filename, image_data)?;

    Ok(filename)
}

Step-by-Step Breakdown

  1. Extract Base64 Data - We access the result field, which is an Option<String>. We use .ok_or() to convert None into an error if the result is missing.

  2. Decode Base64 - The base64 crate's STANDARD engine decodes the base64 string into raw bytes. This can fail if the string is malformed, so we use ? to propagate errors.

  3. Save to File - We use Rust's standard library fs::write() to save the decoded bytes to a file. We name it generated_image_{index}.png to avoid conflicts when multiple images are generated.
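
The decode step is where malformed data surfaces. Here's a standalone sanity check, separate from the example above, showing the base64 crate's round-trip behavior (assuming base64 0.22 as in Cargo.toml):

use base64::{engine::general_purpose::STANDARD, Engine as _};

fn main() {
    // Round-trip: encode raw bytes, then decode them back.
    let original: &[u8] = b"\x89PNG\r\n\x1a\n"; // the PNG magic bytes
    let encoded = STANDARD.encode(original);
    let decoded = STANDARD.decode(&encoded).expect("valid base64");
    assert_eq!(original, decoded.as_slice());

    // Malformed input fails to decode, which is why the `?` after
    // STANDARD.decode() in decode_and_save_image matters.
    assert!(STANDARD.decode("not base64!!!").is_err());

    println!("round-trip ok: {} base64 chars -> {} bytes", encoded.len(), decoded.len());
}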

Complete Example

Here's the complete working example that puts it all together:

use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;

use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::error::LLMResult;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;

fn decode_and_save_image(
    image_generation_call: &ImageGenToolCall,
    index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
    let base64_image = image_generation_call
        .result
        .as_ref()
        .ok_or("Image generation call has no result")?;

    let image_data = STANDARD.decode(base64_image)?;
    let filename = format!("generated_image_{}.png", index);
    fs::write(&filename, image_data)?;

    Ok(filename)
}

#[tokio::main]
async fn main() -> LLMResult<()> {
    // 1) Build a Responses-style request using async-openai-compat types
    //    with tools for web_search_preview and image_generation
    let responses_req = CreateResponse {
        model: Some("gpt-4.1".to_string()),
        input: InputParam::Text(
            "Search for the latest news from today and generate an image about it".to_string(),
        ),
        tools: Some(vec![
            Tool::WebSearch(WebSearchTool::default()),
            Tool::ImageGeneration(ImageGenTool::default()),
        ]),
        ..Default::default()
    };

    // 2) Construct a VlloraLLMClient
    let client =
        VlloraLLMClient::default().with_credentials(Credentials::ApiKey(ApiKeyCredentials {
            api_key: std::env::var("VLLORA_OPENAI_API_KEY")
                .expect("VLLORA_OPENAI_API_KEY must be set"),
        }));

    // 3) Non-streaming: send the request and print the final reply
    println!("Sending request with tools: web_search_preview and image_generation");
    let response = client.responses().create(responses_req).await?;

    println!("\nNon-streaming reply:");
    println!("{}", "=".repeat(80));

    for (index, output) in response.output.iter().enumerate() {
        match output {
            OutputItem::ImageGenerationCall(image_generation_call) => {
                println!("\n[Image Generation Call {}]", index);
                match decode_and_save_image(image_generation_call, index) {
                    Ok(filename) => {
                        println!("✓ Successfully saved image to: {}", filename);
                    }
                    Err(e) => {
                        eprintln!("✗ Failed to decode/save image: {}", e);
                    }
                }
            }
            OutputItem::Message(message) => {
                println!("\n[Message {}]", index);
                println!("{}", "-".repeat(80));

                for content in &message.content {
                    match content {
                        OutputMessageContent::OutputText(text_output) => {
                            // Print the text content
                            println!("\n{}", text_output.text);

                            // Print sources/annotations if available
                            if !text_output.annotations.is_empty() {
                                println!("Annotations: {:#?}", text_output.annotations);
                            }
                        }
                        _ => {
                            println!("Other content type: {:?}", content);
                        }
                    }
                }
                println!("\n{}", "=".repeat(80));
            }
            _ => {
                println!("\n[Other Output {}]", index);
                println!("{:?}", output);
            }
        }
    }

    Ok(())
}

Expected Output

When you run this example, you'll see output like:

Sending request with tools: web_search_preview and image_generation

Non-streaming reply:
================================================================================

[Message 0]
--------------------------------------------------------------------------------

Here's the latest news from today: [summary of current news]

Annotations: [citations and sources from web search]

================================================================================

[Image Generation Call 1]
✓ Successfully saved image to: generated_image_1.png

The actual news content and image will vary based on what's happening when you run it!

Execution Flow

  1. Request Construction - We build a CreateResponse with our prompt and tools
  2. Client Initialization - We create and configure the Vllora LLM client
  3. API Call - We send the request and await the response
  4. Response Processing - We iterate through output items:
    • Handle image generation calls by decoding and saving
    • Display text messages with annotations
    • Handle any other output types
  5. File Output - Generated images are saved to disk as PNG files

Summary

This example demonstrates how to use the Responses API to create multi-tool workflows that combine web search and image generation. The key steps are:

  1. Build a CreateResponse request with the desired tools (WebSearchTool and ImageGenTool)
  2. Initialize the VlloraLLMClient with your API credentials
  3. Send the request and receive structured outputs
  4. Process different output types: extract text from OutputItem::Message and decode base64 images from OutputItem::ImageGenerationCall
  5. Save decoded images to disk using standard Rust file I/O

The Responses API enables powerful, structured workflows that go beyond simple text completions, making it ideal for building applications that need to orchestrate multiple AI capabilities.