Fast3D - Multimodal Text/Image-to-3D Generation System

Fast3D is a Python system that generates 3D models from text descriptions, images, or both. It combines neural networks, computer vision, and procedural geometry algorithms.

Features

Core Capabilities

🔤 Text-to-3D: Generate 3D models from natural language descriptions using neural networks
🖼️ Image-to-3D: Convert 2D images to 3D models using depth estimation and computer vision
🔄 Multimodal Generation: Combine text and images for enhanced 3D model creation
🎨 Multiple Algorithms: Neural, procedural, and hybrid generation approaches

Technical Features

Neural Networks: CLIP text encoding, NeRF-based 3D representation, diffusion models
Computer Vision: Depth estimation, stereo vision, photogrammetry, point cloud processing
Advanced Processing: Attention-based feature fusion, semantic understanding, material properties
Export Formats: OBJ, STL, PLY, GLB with materials and textures
Quality Control: Post-processing, mesh optimization, and validation

Installation

pip install -r requirements.txt

Dependencies

Core Dependencies:

numpy >= 1.21.0
trimesh >= 3.15.0
open3d >= 0.16.0
torch >= 1.12.0
torchvision >= 0.13.0
scipy >= 1.9.0

Machine Learning:

transformers >= 4.20.0
diffusers >= 0.21.0
sentence-transformers >= 2.2.0
scikit-learn >= 1.1.0

Computer Vision:

opencv-python >= 4.5.0
Pillow >= 8.3.0

3D Processing:

mcubes >= 0.1.0
point-cloud-utils >= 0.29.0

Additional:

matplotlib >= 3.5.0
noise >= 1.2.2
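
To confirm that an environment meets the minimums above, a small script can compare installed versions against them. This is a minimal sketch and not part of Fast3D: the `MINIMUMS` table copies only a few entries from the list, and `parse_version` and `check_minimums` are hypothetical helpers that understand plain dotted version numbers only.

```python
import re
from importlib import metadata

# A few minimums copied from the dependency list above (extend as needed).
MINIMUMS = {
    "numpy": "1.21.0",
    "trimesh": "3.15.0",
    "torch": "1.12.0",
    "opencv-python": "4.5.0",
}


def parse_version(text):
    """Turn '1.21.0' or '2.0.1+cu118' into a tuple of ints, padded to 3 parts."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    numbers = tuple(int(p) for p in match.group().split(".")) if match else ()
    return numbers + (0,) * max(0, 3 - len(numbers))


def check_minimums(minimums=MINIMUMS):
    """Return {package: status} where status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_minimums().items():
        print(f"{name}: {status}")
```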

Quick Start

Unified Multimodal Interface

from multimodal_3d_generator import MultiModal3DGenerator

# Create multimodal generator
generator = MultiModal3DGenerator()

# Generate from text
mesh, metadata = generator.generate_from_text(
    "a futuristic robot with metallic armor",
    output_path="robot.obj"
)

# Generate from image
mesh, metadata = generator.generate_from_image(
    "path/to/image.jpg",
    output_path="from_image.obj"
)

# Generate from both text and image
mesh, metadata = generator.generate_from_multimodal(
    "a blue metallic sculpture",
    "path/to/reference.jpg",
    output_path="multimodal.obj"
)

Text-to-3D Generation

from neural_text_to_3d import NeuralTextTo3D

# Neural generation
neural_gen = NeuralTextTo3D()
mesh = neural_gen.generate_from_text("a crystal castle with towers")

Image-to-3D Generation

from image_to_3d import ImageTo3DConverter

# Image-to-3D conversion
img_gen = ImageTo3DConverter()

# From single image
mesh = img_gen.single_image_to_3d("photo.jpg", method="depth_estimation")

# From stereo pair
mesh = img_gen.stereo_images_to_3d("left.jpg", "right.jpg")
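
For intuition about the depth-based path, the sketch below back-projects a depth map into 3D points with the standard pinhole camera model. It is not the code behind `single_image_to_3d`: the function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration, and plain Python lists stand in for the arrays a real pipeline would likely use.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (a list of rows) into (x, y, z) points.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Tiny 2x2 depth map with one invalid pixel (hypothetical values).
depth = [[1.0, 2.0],
         [0.0, 4.0]]
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
for point in points:
    print(point)
```

For a stereo pair, depth is usually recovered first from disparity as z = f * B / d (focal length times baseline, divided by disparity), after which the same back-projection applies.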

Web Application

# Start web server
./start.sh dev

# Or manually
python app.py

# Visit http://localhost:5000 in your browser

Interactive Demo

# Command line interactive mode
python demo.py comprehensive

# Quick generation
python demo.py --prompt "a medieval sword"
python demo.py --image "path/to/image.jpg"

# Test web API
python run_demo.py

Supported Text Features

Shape Keywords

Basic shapes: cube, box, sphere, ball, cylinder, cone, pyramid, torus, ring, plane
Complex objects: building, house, tower, car, tree, mountain, rock, crystal, gem, bottle, vase

Size Modifiers

Scale: tiny, small, medium, large, huge, massive, enormous
Dimensions: tall, thin, thick, wide, narrow, long, flat

Materials

Metals: metal, metallic, steel, iron, gold, silver
Natural: wood, stone, rock, crystal, glass
Synthetic: plastic, ceramic, fabric, leather

Surface Properties

Texture: smooth, rough, bumpy, textured, polished, coarse
Finish: shiny, glossy, matte, dull, reflective
Transparency: transparent, clear, translucent

Style Descriptors

Modern: modern, contemporary, minimalist, clean, simple
Traditional: ancient, old, vintage, ornate, decorative, elaborate
Geometric: angular, sharp, pointed, faceted, curved, twisted
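
As a rough illustration of how keywords like these could be pulled out of a prompt, the sketch below matches words against small vocabularies taken from the lists above. It is a simplified stand-in, not the project's `TextProcessor`; the `VOCAB` table and the `extract_features` function are hypothetical.

```python
import re

# Subsets of the keyword lists above, grouped by category.
VOCAB = {
    "shape": {"cube", "box", "sphere", "ball", "cylinder", "cone",
              "pyramid", "torus", "ring", "plane"},
    "scale": {"tiny", "small", "medium", "large", "huge", "massive", "enormous"},
    "material": {"metal", "metallic", "steel", "iron", "gold", "silver",
                 "wood", "stone", "glass", "plastic", "ceramic"},
    "texture": {"smooth", "rough", "bumpy", "textured", "polished", "coarse"},
}


def extract_features(prompt):
    """Return {category: [matched keywords]} in the order they appear."""
    words = re.findall(r"[a-z]+", prompt.lower())
    features = {category: [] for category in VOCAB}
    for word in words:
        for category, keywords in VOCAB.items():
            if word in keywords and word not in features[category]:
                features[category].append(word)
    return features


print(extract_features("a large rough metallic sphere"))
```

Exact word matching keeps the sketch short, but it misses plurals and multi-word phrases, which a real parser would need to handle.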

Architecture

Frontend Components

Web Interface: Modern responsive UI with Bootstrap and Three.js
Real-time Communication: Socket.IO for live progress updates
3D Visualization: Three.js-based model viewer with interactive controls
File Upload: Drag-and-drop image upload with preview

Backend API

Flask Server: RESTful API with multimodal endpoints
Background Processing: Async task handling with real-time updates
Model Storage: Automatic file management and serving
Error Handling: Graceful fallbacks and user feedback
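
The background-processing pattern described above can be sketched with the standard library alone. This is an assumption-laden illustration, not the code in `app.py`: `run_in_background` and `fake_generation` are hypothetical names, and a `queue.Queue` stands in for whatever channel the server uses to push progress to the browser.

```python
import queue
import threading


def run_in_background(job, progress):
    """Run job(report) on a worker thread; report(percent, message) feeds `progress`."""
    def report(percent, message):
        progress.put((percent, message))

    def worker():
        try:
            job(report)
            progress.put((100, "done"))
        except Exception as exc:  # surface failures instead of dying silently
            progress.put((-1, f"error: {exc}"))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def fake_generation(report):
    """Stand-in for a real generation call (hypothetical)."""
    report(10, "parsing prompt")
    report(60, "building mesh")


updates = queue.Queue()
run_in_background(fake_generation, updates).join()

# Drain the queue once the worker has finished.
events = []
while not updates.empty():
    events.append(updates.get())
print(events)
```

Catching the exception inside the worker is what lets the caller report a failure to the user instead of leaving the task hanging.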

Core 3D Generation

TextProcessor: Extracts 3D modeling parameters from natural language
GeometryGenerator: Creates and modifies 3D geometry based on extracted features
ModelExporter: Handles exporting to various 3D file formats
NeuralTextTo3D: Neural network-based text-to-3D generation
ImageTo3DConverter: Computer vision-based image-to-3D conversion
MultiModal3DGenerator: Unified multimodal generation system

Examples

Basic Examples

# `generator` is a TextTo3DGenerator instance (see API Reference)
# Simple shapes
generator.generate("a blue cube")
generator.generate("a large yellow sphere")
generator.generate("a thin red cylinder")

# With surface properties
generator.generate("a rough metallic sphere")
generator.generate("a smooth glass cube")
generator.generate("a bumpy stone surface")

# Complex descriptions
generator.generate("a twisted green cone with textured surface")
generator.generate("a tall modern building with clean lines")

Advanced Examples

# `generator` is an AdvancedTextTo3D instance (see API Reference)
# Semantic objects
generator.generate_advanced("an ancient stone tower with weathered surfaces")
generator.generate_advanced("a modern glass skyscraper with reflective facade")
generator.generate_advanced("a wooden treasure chest with metal hinges")

# Material properties
generator.generate_advanced("a shiny metallic robot with angular design")
generator.generate_advanced("a translucent crystal formation")
generator.generate_advanced("a matte ceramic vase with smooth curves")

Batch Processing

# Generate multiple models with a TextTo3DGenerator instance
prompts = [
    "a red sports car",
    "a medieval castle",
    "a futuristic spaceship",
    "a natural rock formation"
]

meshes = generator.batch_generate(
    prompts,
    output_dir="generated_models",
    format="obj"
)
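
This README does not say how `batch_generate` names its output files. If predictable names matter, one option is to derive them from the prompts and call `generate` with an explicit `output_path`. The helper `prompt_to_filename` below is hypothetical and not part of Fast3D.

```python
import re
from pathlib import Path


def prompt_to_filename(prompt, index, fmt="obj", max_len=40):
    """Build a filesystem-safe name such as '000_a_red_sports_car.obj'."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    slug = slug[:max_len].rstrip("_")
    return f"{index:03d}_{slug or 'model'}.{fmt}"


prompts = ["a red sports car", "a medieval castle"]
paths = [
    Path("generated_models") / prompt_to_filename(prompt, i)
    for i, prompt in enumerate(prompts)
]
for path in paths:
    print(path)
```

The numeric prefix keeps files in prompt order and avoids collisions when two prompts reduce to the same slug.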

API Reference

TextTo3DGenerator

Methods

generate(text_prompt, output_path=None, format='obj'): Generate 3D model from text
batch_generate(text_prompts, output_dir, format='obj'): Generate multiple models
get_model_info(mesh): Get technical information about generated model
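
The fields returned by `get_model_info` are not listed here. As a guess at the kind of technical information involved, the sketch below computes vertex and face counts and a bounding box for a mesh stored as plain lists. `summarize_mesh` and its field names are assumptions, not the Fast3D API.

```python
def summarize_mesh(vertices, faces):
    """Return basic statistics for a triangle mesh given as plain lists."""
    xs, ys, zs = zip(*vertices)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    return {
        "vertex_count": len(vertices),
        "face_count": len(faces),
        "bounds_min": mins,
        "bounds_max": maxs,
        "extents": tuple(hi - lo for lo, hi in zip(mins, maxs)),
    }


# A single tetrahedron as test data.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
info = summarize_mesh(vertices, faces)
print(info)
```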

AdvancedTextTo3D

Methods

generate_advanced(text_prompt, output_path=None): Generate with advanced features
analyze_prompt(text_prompt): Extract and return features from text

Output Formats

OBJ: Wavefront OBJ format with materials
STL: STereoLithography format for 3D printing
PLY: Polygon File Format with color support
GLB: Binary glTF format for web applications
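
To show what the simplest of these formats looks like on disk, here is a minimal OBJ writer for a triangle mesh. It only illustrates the file layout (`v` lines for vertices, `f` lines for faces with 1-based indices) and omits materials and textures; `write_obj` is a hypothetical function, separate from the project's `ModelExporter`.

```python
import io


def write_obj(vertices, faces, stream):
    """Write a triangle mesh as Wavefront OBJ text.

    `faces` holds 0-based vertex indices; OBJ files count from 1.
    """
    for x, y, z in vertices:
        stream.write(f"v {x} {y} {z}\n")
    for face in faces:
        stream.write("f " + " ".join(str(i + 1) for i in face) + "\n")


# One triangle, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_obj([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)], buffer)
obj_text = buffer.getvalue()
print(obj_text)
```

Passing an open file handle instead of the `StringIO` buffer writes the same text to disk.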

Performance Notes

Complexity: Higher complexity values add detail but increase generation time
Resolution: The default resolution balances quality and speed
Batch Processing: batch_generate is more efficient than repeated generate calls when producing multiple models
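
This README does not define what resolution measures. Assuming it is the edge length of a cubic sampling grid (as used by marching-cubes style surface extraction), the cost grows with the cube of the value. The sketch below works through that arithmetic; `grid_cost` and the 4-byte sample size are assumptions.

```python
def grid_cost(resolution, bytes_per_sample=4):
    """Return (sample count, memory in MiB) for a resolution^3 grid."""
    samples = resolution ** 3
    return samples, samples * bytes_per_sample / 2**20


for n in (32, 64, 128):
    samples, mib = grid_cost(n)
    print(f"{n}^3 grid: {samples} samples, {mib:.3f} MiB")
```

Under this assumption, doubling the resolution multiplies the number of samples, and roughly the work, by eight.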

Limitations

Generated models are based on geometric primitives and procedural techniques
Complex organic shapes may require additional post-processing
Material properties are approximated and may need refinement for specific applications

Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

License

MIT License - see the LICENSE file for details