An intelligent web application that generates descriptive alt text for images using LangChain and Ollama with vision-capable LLMs.
- 🎨 Modern UI - Clean, responsive design
- 🖼️ Drag & Drop - Easy image upload with preview functionality
- 🤖 AI-Powered - Uses Ollama with LLaVA vision model via LangChain
- 🔐 Secure - File validation and type checking
- ⚡ Fast Processing - Optimized image handling and batch processing
- ♿ Accessibility First - Generates W3C compliant alt text
Before running the application, ensure you have:
- Node.js (v16 or higher)
- Ollama installed and running
- LLaVA vision model downloaded
```bash
# Clone the repository
git clone <repository-url>
cd alt-text

# Install Node.js dependencies
npm install

# Verify installation
npm list --depth=0
```

```bash
# Install Ollama (if not already installed)
# macOS:
brew install ollama
# Or download from https://ollama.ai

# Download the LLaVA vision model
ollama pull llava:latest

# Start Ollama server (keep this running)
ollama serve
```

```bash
# Production mode
npm start

# Development mode (with auto-reload)
npm run dev
```

Navigate to: http://localhost:3000
- Health Check: Visit http://localhost:3000/api/health
- Ollama Status: Visit http://localhost:3000/api/ollama/status
- Upload Test: Try uploading an image through the web interface
```bash
# Test health endpoint
curl http://localhost:3000/api/health

# Test Ollama connection
curl http://localhost:3000/api/ollama/status

# Test image upload (replace with actual image file)
curl -X POST -F "images=@test-image.jpg" http://localhost:3000/api/generate-alt-text
```

If the tests fail, check that:
- Ollama is running: `ollama serve`
- The LLaVA model is installed: `ollama list`
- The Node.js server is running on port 3000
- No firewall blocking localhost:3000 or localhost:11434
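The same checks can be scripted; below is a minimal sketch assuming Node 18+ (for the global `fetch`), where `/api/tags` is Ollama's standard model-list endpoint and the function names are hypothetical:

```javascript
// Minimal readiness check (assumes Node 18+, which provides global fetch).
async function isUp(url) {
  try {
    // Short timeout so a down server fails fast rather than hanging.
    const res = await fetch(url, { signal: AbortSignal.timeout(2000) });
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, or timeout
  }
}

async function checkStack() {
  return {
    app: await isUp("http://localhost:3000/api/health"),
    // Any 200 from /api/tags means the Ollama server is reachable.
    ollama: await isUp("http://localhost:11434/api/tags"),
  };
}

checkStack().then((status) => console.log(status));
```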
- Upload Images: Drag and drop images or click to browse
- Generate Alt Text: Click the "Generate Alt Text" button
- AI Processing: Images are sent to Ollama's LLaVA model via LangChain
- Review Results: View AI-generated alt text (≤150 chars)
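The upload flow above can also be sketched from the browser side; a minimal example assuming the `images` field name shown in the curl test (the exact response shape depends on server.js):

```javascript
// Sketch of the browser-side upload; the field name "images" and the
// /api/generate-alt-text endpoint come from the curl example above.
async function generateAltText(files) {
  const form = new FormData();
  for (const file of files) {
    form.append("images", file); // must match the server's Multer field name
  }
  const res = await fetch("/api/generate-alt-text", {
    method: "POST",
    body: form, // fetch sets the multipart boundary automatically
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json(); // exact response shape depends on server.js
}
```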
The LLaVA vision model examines:
- Main subjects and objects in the image
- Environmental context and setting details
- Visual mood and atmosphere
- Spatial relationships between elements
- Colors, lighting, and composition
- `GET /api/health` - Health check
- `GET /api/ollama/status` - Check Ollama connection
- `POST /api/generate-alt-text` - Generate alt text for uploaded images
- HTML5 with semantic structure
- CSS3 with responsive design and animations
- Vanilla JavaScript with modern ES6+ features
- File API for drag-and-drop functionality
- Express.js server with CORS support
- Multer for file upload handling
- LangChain for LLM integration
- Ollama for local AI inference
- Sharp for image optimization
- LLaVA Model - Vision-language model for image analysis
- LangChain - Framework for LLM application development
- Ollama - Local LLM server for privacy and performance
```
alt-text/
├── index.html      # Main HTML file
├── styles.css      # CSS styling
├── script.js       # Frontend JavaScript
├── server.js       # Express backend server
├── package.json    # Node.js dependencies
├── .gitignore      # Git ignore file
└── README.md       # This file
```
- `PORT` - Server port (default: 3000)
- `OLLAMA_BASE_URL` - Ollama server URL (default: http://localhost:11434)
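A sketch of how these variables might be read with their documented defaults (the helper name is hypothetical):

```javascript
// Read the two documented settings, falling back to their defaults.
function getConfig(env = process.env) {
  return {
    port: Number(env.PORT ?? 3000),
    ollamaBaseUrl: env.OLLAMA_BASE_URL ?? "http://localhost:11434",
  };
}
```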
The server uses these default settings for the LLaVA model:
- Temperature: 0.3 (for consistent results)
- Model: llava:latest
- Max Image Size: 512x512 (optimized for performance)
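These defaults can be collected into a single settings object; in LangChain's JS bindings they would typically be passed to a `ChatOllama` instance from `@langchain/ollama`, though the exact wiring here is an assumption rather than the project's actual code:

```javascript
// The documented defaults as a plain settings object. In a real server
// these would be passed to new ChatOllama(...) from @langchain/ollama.
const modelSettings = {
  model: "llava:latest",
  temperature: 0.3, // low temperature for consistent descriptions
  baseUrl: process.env.OLLAMA_BASE_URL ?? "http://localhost:11434",
};
```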
The LLM uses this prompt to generate alt text (located in server.js:77):

> Describe what's in this picture and then reduce the description to the W3C specification text string length for an HTML image alt tags attribute. Description should include the subject, environment, settings, and the overall mood of the image. Respond only with the HTML image alt tag text. Length of text should be 150 characters or less
This prompt ensures:
- W3C Compliance - Follows HTML alt attribute standards
- Concise Output - Limited to 150 characters as per best practices
- Comprehensive Analysis - Includes subject, environment, setting, and mood
- Clean Response - Returns only the alt text without extra formatting
- "Cannot connect to Ollama"
  - Ensure Ollama is running: `ollama serve`
  - Check that the LLaVA model is installed: `ollama list`
- "File too large" errors
  - Images are limited to 10MB
  - Images are automatically resized to 512x512 for optimal processing
- Slow processing
  - First-time model loading can be slow
  - Consider using a smaller model like `llava:7b` for faster inference
Visit these endpoints to verify everything is working:
- http://localhost:3000/api/health - Server status
- http://localhost:3000/api/ollama/status - Ollama connection
- Images are automatically optimized to 512x512 resolution
- JPEG compression at 80% quality for LLM processing
- Batch processing for multiple images
- Efficient memory management with stream processing
- File type validation (images only)
- File size limits (10MB max)
- No persistent file storage on server
- Input sanitization and error handling
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - feel free to use this project for personal or commercial purposes.
- Ollama Team - For the excellent local LLM server
- LangChain - For the powerful LLM framework
- LLaVA - For the vision-language model