@amitsnow amitsnow commented Oct 29, 2025

Summary

Adds text-to-image generation and image editing support for the DALL-E-2, DALL-E-3, and GPT-Image-1 models behind a unified output_type: image interface. The operation type (generation or editing) is auto-detected from the prompt, multiple images are supported, and generated images are saved to disk automatically.

Features Implemented

Text-to-Image Generation:

  • DALL-E-2: Batch generation (up to 10 images), sizes 256x256-1024x1024
  • DALL-E-3/GPT-Image-1: HD quality, vivid/natural styles, sizes up to 1792x1024

Image Editing:

  • DALL-E-2: Single image editing
  • GPT-Image-1: Multi-image editing (1-16 images)
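
The auto-detection mentioned in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name detect_image_operation, the MAX_EDIT_IMAGES table, and the OpenAI-style content-part format (dicts with a "type" of "text" or "image_url") are assumptions made for the example. Only the per-model image limits come from the lists above.

```python
from typing import Any

# Input-image limits for editing, taken from the feature list above.
# Whether the real implementation enforces them client-side is an assumption.
MAX_EDIT_IMAGES = {"dall-e-2": 1, "gpt-image-1": 16}


def detect_image_operation(model: str, content: list[dict[str, Any]]) -> str:
    """Return "generation" for text-only prompts and "edit" when images are present.

    `content` is assumed to be a list of OpenAI-style content parts (hypothetical).
    """
    n_images = sum(1 for part in content if part.get("type") == "image_url")
    if n_images == 0:
        return "generation"
    limit = MAX_EDIT_IMAGES.get(model)
    if limit is None:
        raise ValueError(f"{model} does not support image editing")
    if n_images > limit:
        raise ValueError(
            f"{model} accepts at most {limit} input image(s), got {n_images}"
        )
    return "edit"
```

With this shape, a prompt that carries one image part and one text part for gpt-image-1 resolves to "edit", while a text-only prompt resolves to "generation" for any model.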

Output Processing:

  • Returns base64 data URLs (single string or JSON array for n>1)
  • Auto-saves to multimodal_output/image/ directory
  • Replaces data URLs with file paths in output
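
A minimal sketch of this output-processing step, under stated assumptions: the helper name save_data_urls and its signature are hypothetical, and the {id}_{field}_{index}.{ext} naming is inferred from the example output in this description rather than confirmed from the code.

```python
import base64
import json
from pathlib import Path


def save_data_urls(value, record_id, field, out_dir="multimodal_output/image"):
    """Decode base64 data URL(s) to files and return the file path(s).

    `value` is either one data URL or a JSON array string of data URLs (n > 1).
    """
    # A leading "[" is taken to mean a JSON array; anything else is one value.
    urls = json.loads(value) if value.lstrip().startswith("[") else [value]
    paths = []
    for index, url in enumerate(urls):
        header, _, payload = url.partition(",")
        if not header.startswith("data:image/") or ";base64" not in header:
            paths.append(url)  # not a base64 image data URL: leave untouched
            continue
        ext = header[len("data:image/"):].split(";")[0]  # e.g. "png"
        target = Path(out_dir) / f"{record_id}_{field}_{index}.{ext}"
        # Lazy directory creation: only happens once there is something to write.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(base64.b64decode(payload))
        paths.append(str(target))
    return paths[0] if len(paths) == 1 else paths
```

The function returns a single path for one image and a list of paths otherwise, mirroring the single-string versus array behaviour described above.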

Performance Impact

  • Single-pass processing, lazy directory creation
  • Negligible overhead: total runtime is dominated by the image API calls, not by local decoding or file writes

How to Test

Basic Testing:

  1. Text-to-image: Set output_type: image, provide text prompts
  2. Batch generation: Set n: 3 in parameters (DALL-E-2)
  3. Image editing: Provide images in prompt with text instructions
  4. Multi-image editing: Use GPT-Image-1 with 2-16 images

Unit Tests:

pytest tests/core/models/test_custom_openai.py -k "image" -v  # 16 tests
pytest tests/utils/test_multimodal_processor.py -v             # 10 tests

Configuration Example:

gpt_image_1:
  model: gpt-image-1
  output_type: image 
  model_type: azure_openai  
  api_version: 2025-04-01-preview
  parameters:
    size: "1024x1024" 
    quality: "high" 

Example Output

Single image (n=1)
{"id": "1", "image": "multimodal_output/image/1_image_0.png"}

Multiple images (n=3)
{"id": "1", "images": ["...image/1_images_0.png", "...1_images_1.png", "...1_images_2.png"]}
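
Because the value is a string for n=1 and a list for n>1, downstream code may want to normalize both shapes. A small sketch, assuming the output field holds either a path string or a list of path strings as in the two examples above (the helper name image_paths is hypothetical, and the field names are whatever the task defines):

```python
def image_paths(record, fields=("image", "images")):
    """Collect saved image paths from an output record as a flat list."""
    paths = []
    for field in fields:
        value = record.get(field)
        if isinstance(value, str):
            paths.append(value)  # single image (n=1)
        elif isinstance(value, list):
            paths.extend(value)  # multiple images (n>1)
    return paths
```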

Checklist

  • Lint fixes and unit testing done
  • End to end task testing
  • Documentation updated

@amitsnow amitsnow marked this pull request as ready for review October 29, 2025 19:52
@amitsnow amitsnow requested a review from a team as a code owner October 29, 2025 19:52
@amitsnow amitsnow self-assigned this Oct 29, 2025

@psriramsnc psriramsnc left a comment

LGTM 🚀

@amitsnow amitsnow merged commit d9c0903 into main Nov 4, 2025
12 checks passed
@amitsnow amitsnow deleted the scratch/image_gen branch November 4, 2025 05:46
5 participants