OpenAI image generation support #58

amitsnow · 2025-10-29T05:29:41Z

Summary

Adds text-to-image generation and image editing support for DALL-E-2, DALL-E-3, and GPT-Image-1 models through a unified output_type: image interface, with auto-detection of the operation type (generate or edit), multi-image support, and automatic file saving.

Features Implemented

Text-to-Image Generation:

DALL-E-2: Batch generation (up to 10 images), sizes 256x256-1024x1024
DALL-E-3/GPT-Image-1: HD quality, vivid/natural styles, sizes up to 1792x1024
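
The DALL-E-2 limits listed above can be checked before a request is sent. The following is a minimal sketch, not the code from this PR: the function name and constants are hypothetical, and the 512x512 entry is an assumption based on the sizes DALL-E-2 is known to accept between the two bounds given above.

```python
# Hypothetical constants; the PR states "up to 10 images" and
# "sizes 256x256-1024x1024" for DALL-E-2.
DALLE2_SIZES = ("256x256", "512x512", "1024x1024")
DALLE2_MAX_N = 10


def check_dalle2_params(n: int = 1, size: str = "1024x1024") -> None:
    """Raise ValueError if n or size falls outside the DALL-E-2 limits."""
    if not 1 <= n <= DALLE2_MAX_N:
        raise ValueError(f"n must be between 1 and {DALLE2_MAX_N}, got {n}")
    if size not in DALLE2_SIZES:
        raise ValueError(f"size must be one of {DALLE2_SIZES}, got {size!r}")
```

Failing fast on the client side avoids spending an API call on a request that the service would reject anyway.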

Image Editing:

DALL-E-2: Single image editing
GPT-Image-1: Multi-image editing (1-16 images)
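
One way the operation type could be auto-detected is by checking whether the prompt carries any images. This is a sketch under stated assumptions, not the PR's implementation: the function name, the MAX_EDIT_IMAGES table, and the prompt-part shape (dicts with a "type" key equal to "image_url") are all hypothetical. The per-model limits come from the list above.

```python
from typing import Any

# Hypothetical table of input-image limits for editing, taken from the
# feature list: DALL-E-2 edits one image, GPT-Image-1 edits 1-16 images.
MAX_EDIT_IMAGES = {"dall-e-2": 1, "gpt-image-1": 16}


def detect_operation(model: str, prompt_parts: list[dict[str, Any]]) -> str:
    """Return "edit" if the prompt contains images, otherwise "generate"."""
    # Assumed prompt shape: each part is a dict with a "type" key.
    images = [p for p in prompt_parts if p.get("type") == "image_url"]
    if not images:
        return "generate"
    limit = MAX_EDIT_IMAGES.get(model.lower())
    if limit is None:
        raise ValueError(f"{model} is not listed as supporting image editing")
    if len(images) > limit:
        raise ValueError(
            f"{model} accepts at most {limit} input image(s), got {len(images)}"
        )
    return "edit"
```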

Output Processing:

Returns base64 data URLs (single string or JSON array for n>1)
Auto-saves to multimodal_output/image/ directory
Replaces data URLs with file paths in output
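
The save-and-replace step described above might look roughly like the following. This is an illustrative sketch only: save_data_url and DATA_URL_RE are hypothetical names, and the actual logic in multimodal_processor.py may differ. The directory is created only when a data URL is actually found, which matches the lazy directory creation mentioned in this PR.

```python
import base64
import re
from pathlib import Path

# Matches base64 image data URLs such as "data:image/png;base64,....".
DATA_URL_RE = re.compile(
    r"^data:image/(?P<ext>[a-zA-Z0-9.+-]+);base64,(?P<data>.+)$", re.DOTALL
)


def save_data_url(value: str, out_dir: Path, stem: str) -> str:
    """Write a base64 image data URL to out_dir and return the file path.

    Values that are not image data URLs are returned unchanged.
    """
    match = DATA_URL_RE.match(value)
    if match is None:
        return value
    # Lazy creation: the directory appears only once there is an image to save.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}.{match['ext']}"
    path.write_bytes(base64.b64decode(match["data"]))
    return path.as_posix()
```

Calling save_data_url(url, Path("multimodal_output/image"), "1_image_0") on a PNG data URL would return a path of the same form as the example output in this description.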

Performance Impact

Single-pass processing, lazy directory creation
Negligible overhead (run time is dominated by the API calls)

How to Test

Basic Testing:

Text-to-image: Set output_type: image, provide text prompts
Batch generation: Set n: 3 in parameters (DALL-E-2)
Image editing: Provide images in prompt with text instructions
Multi-image editing: Use GPT-Image-1 with 2-16 images

Unit Tests:

pytest tests/core/models/test_custom_openai.py -k "image" -v  # 16 tests
pytest tests/utils/test_multimodal_processor.py -v             # 10 tests

Configuration Example:

gpt_image_1:
  model: gpt-image-1
  output_type: image
  model_type: azure_openai
  api_version: 2025-04-01-preview
  parameters:
    size: "1024x1024"
    quality: "high"

Example Output

Single image (n=1)
{"id": "1", "image": "multimodal_output/image/1_image_0.png"}

Multiple images (n=3)
{"id": "1", "images": ["...image/1_images_0.png", "...1_images_1.png", "...1_images_2.png"]}
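
The file names in the examples above appear to follow a {record id}_{field name}_{index} pattern. That pattern is inferred from the two examples, not confirmed by the PR. Under that assumption, splitting a model response (a single string, or a JSON array when n>1) and naming the resulting files could be sketched as follows; both function names are hypothetical.

```python
import json


def split_image_payload(raw: str) -> list[str]:
    """Return the data URLs in a response: one string, or a JSON array for n > 1."""
    text = raw.strip()
    if text.startswith("["):
        return [str(item) for item in json.loads(text)]
    return [text]


def output_stems(record_id: str, field: str, count: int) -> list[str]:
    """Build file stems following the assumed {id}_{field}_{index} pattern."""
    return [f"{record_id}_{field}_{i}" for i in range(count)]
```

For a record with id "1" and an "images" field holding three images, output_stems("1", "images", 3) yields the stems seen in the n=3 example.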

Checklist

Lint fixes and unit testing done
End to end task testing
Documentation updated

docs/concepts/multimodal/image_generation.md

psriramsnc

LGTM 🚀

amitsnow added 10 commits October 29, 2025 10:58

OpenAI image generation support

cae9c0d

Consolidate and clean image generate and edit implementation

ef2253b

Creating output directory only when encountering multimodal data in output

05761f3

Adding tests for multimodal_processor.py

f22180c

Adding missing multimodal tests and small fixes

7fda185

Fix to handle batch image generation (when n>1)

2733b20

Image generation documentation

96ba004

gpt-image-1 models.yaml

e4db23a

linting and format fixes

e15621d

linting issue fixes

c2af5bf

amitsnow marked this pull request as ready for review October 29, 2025 19:52

amitsnow requested a review from a team as a code owner October 29, 2025 19:52

amitsnow self-assigned this Oct 29, 2025

bidyapati-p reviewed Nov 3, 2025

docs/concepts/multimodal/image_generation.md (comment resolved)

bidyapati-p approved these changes Nov 4, 2025

psriramsnc approved these changes Nov 4, 2025

vipul-mittal approved these changes Nov 4, 2025

amitsnow merged commit d9c0903 into main Nov 4, 2025
12 checks passed

amitsnow deleted the scratch/image_gen branch November 4, 2025 05:46


5 participants
