A comprehensive sample demonstrating multiple approaches to building production-ready chat applications using Microsoft Foundry Local, featuring modern web interfaces, streaming responses, and cutting-edge browser technologies.
- 🚀 Chainlit Chat App (`app.py`): Production-ready chat application with streaming
- 🌐 WebGPU Demo (`webgpu-demo/`): Browser-based AI inference with hardware acceleration
- 🎨 Open WebUI Integration (`open-webui-guide.md`): Professional ChatGPT-like interface
- 📚 Educational Notebook (`chainlit_app.ipynb`): Interactive learning materials
```shell
# Navigate to Module08 directory
cd Module08

# Start your model
foundry model run phi-4-mini

# Run Chainlit app (using port 8080 to avoid conflicts)
chainlit run samples\04\app.py -w --port 8080
```

Opens at: http://localhost:8080
```shell
# Navigate to WebGPU demo
cd Module08\samples\04\webgpu-demo

# Serve the demo
python -m http.server 5173
```

Opens at: http://localhost:5173
```shell
# Run Open WebUI with Docker
docker run -d --name open-webui -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:51211/v1 \
  -e OPENAI_API_KEY=foundry-local-key \
  ghcr.io/open-webui/open-webui:main
```

Opens at: http://localhost:3000
| Scenario | Recommendation | Reason |
|---|---|---|
| Privacy-Sensitive Data | 🏠 Local (Foundry) | Data never leaves device |
| Complex Reasoning | ☁️ Cloud (Azure OpenAI) | Access to larger models |
| Real-time Chat | 🏠 Local (Foundry) | Lower latency, faster responses |
| Document Analysis | 🔄 Hybrid | Local for extraction, cloud for analysis |
| Code Generation | 🏠 Local (Foundry) | Privacy + specialized models |
| Research Tasks | ☁️ Cloud (Azure OpenAI) | Broad knowledge base needed |
| Technology | Use Case | Pros | Cons |
|---|---|---|---|
| Chainlit | Python developers, rapid prototyping | Easy setup, streaming support | Python-only |
| WebGPU | Maximum privacy, offline scenarios | Browser-native, no server needed | Limited model size |
| Open WebUI | Production deployment, teams | Professional UI, user management | Requires Docker |
- Foundry Local: Installed and running (Download)
- Python: 3.10+ with a virtual environment
- Model: At least one loaded (`foundry model run phi-4-mini`)
- Browser: Chrome/Edge with WebGPU support for demos
- Docker: For Open WebUI (optional)
```shell
# Navigate to Module08 directory
cd Module08

# Create and activate virtual environment
py -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```shell
# Verify Foundry Local installation
foundry --version

# Start the service
foundry service start

# Load a model
foundry model run phi-4-mini

# Verify model is running
foundry service ps
```

Features:
- 🚀 Real-time Streaming: Tokens appear as they're generated
- 🛡️ Robust Error Handling: Graceful degradation and recovery
- 🎨 Modern UI: Professional chat interface out of the box
- 🔧 Flexible Configuration: Environment variables and auto-detection
- 📱 Responsive Design: Works on desktop and mobile devices
Quick Start:

```shell
# Run with default settings (recommended)
chainlit run samples\04\app.py -w --port 8080

# Use specific model
set MODEL=qwen2.5-7b
chainlit run samples\04\app.py -w --port 8080

# Manual endpoint configuration
set BASE_URL=http://localhost:51211
set API_KEY=your-api-key
chainlit run samples\04\app.py -w --port 8080
```

Features:
- 🌐 Browser-native AI: No server required, runs entirely in browser
- ⚡ WebGPU Acceleration: Hardware acceleration when available
- 🔒 Maximum Privacy: No data ever leaves your device
- 🎯 Zero Install: Works in any compatible browser
- 🔄 Graceful Fallback: Falls back to CPU if WebGPU unavailable
Running:

```shell
cd samples\04\webgpu-demo
python -m http.server 5173
# Open http://localhost:5173
```

Features:
- 🎨 ChatGPT-like Interface: Professional, familiar UI
- 👥 Multi-user Support: User accounts and conversation history
- 📁 File Processing: Upload and analyze documents
- 🔄 Model Switching: Easy switching between different models
- 🐳 Docker Deployment: Production-ready containerized setup
Quick Setup:

```shell
docker run -d --name open-webui -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:51211/v1 \
  -e OPENAI_API_KEY=foundry-local-key \
  ghcr.io/open-webui/open-webui:main
```

| Variable | Description | Default | Example |
|---|---|---|---|
| `MODEL` | Model alias to use | `phi-4-mini` | `qwen2.5-7b` |
| `BASE_URL` | Foundry Local endpoint | Auto-detected | `http://localhost:51211` |
| `API_KEY` | API key (optional for local) | `""` | `your-api-key` |
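The variables above can be read into a small config helper. A hedged sketch (the key names and defaults come from the table; the function name and dict shape are illustrative):

```python
import os

def load_config() -> dict:
    """Read app settings from the environment, falling back to the
    documented defaults (MODEL, BASE_URL, API_KEY)."""
    return {
        "model": os.environ.get("MODEL", "phi-4-mini"),
        "base_url": os.environ.get("BASE_URL", "http://localhost:51211"),
        "api_key": os.environ.get("API_KEY", ""),
    }
```

`set MODEL=qwen2.5-7b` before launch (as in Quick Start) then changes the returned model without touching code.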
Chainlit Application:

- Service not available:

  ```shell
  # Check Foundry Local status
  foundry service status
  foundry service ps

  # Validate API endpoint (note: port 51211)
  curl http://localhost:51211/v1/models
  ```

- Port conflicts:

  ```shell
  # Check what's using port 8080
  netstat -ano | findstr :8080

  # Use different port if needed
  chainlit run samples\04\app.py -w --port 3000
  ```

- Python environment issues:

  ```shell
  # Verify correct interpreter in VS Code
  # Ctrl+Shift+P → Python: Select Interpreter
  # Choose: Module08/.venv/Scripts/python.exe

  # Reinstall dependencies
  pip install -r requirements.txt
  ```
WebGPU Demo:

- WebGPU not supported:
  - Update to Chrome/Edge 113+
  - Enable WebGPU: `chrome://flags/#enable-unsafe-webgpu`
  - Check GPU status: `chrome://gpu`
  - The demo falls back to CPU automatically
- Model loading errors:
  - Ensure an internet connection for the model download
  - Check the browser console for CORS errors
  - Verify you're serving via HTTP (not `file://`)
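If CORS or cross-origin isolation errors persist with the plain `http.server` command, one option is a tiny custom server that adds the relevant headers. This is a sketch, not part of the sample: the COOP/COEP headers are only needed by some in-browser runtimes (e.g. those using multithreaded WebAssembly), and the class name is illustrative.

```python
# Serve the WebGPU demo with permissive CORS plus cross-origin isolation
# headers (required by some wasm-threaded in-browser inference runtimes).
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

if __name__ == "__main__":
    # Same port as the plain `python -m http.server 5173` command
    HTTPServer(("localhost", 5173), CORSHandler).serve_forever()
```

Run it from `webgpu-demo/` in place of `python -m http.server 5173`.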
Open WebUI:

- Connection refused:

  ```shell
  # Check Docker is running
  docker --version

  # Check container status
  docker ps | findstr open-webui

  # View container logs
  docker logs open-webui
  ```

- Models not appearing:

  ```shell
  # Verify Foundry Local endpoint
  curl http://localhost:51211/v1/models

  # Restart Open WebUI
  docker restart open-webui
  ```
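The `curl` checks above can also be scripted. A hedged sketch using only the standard library (the `/v1/models` path and `data` wrapper follow the OpenAI-compatible API; `parse_model_ids` is an illustrative helper name):

```python
import json
import urllib.request

def parse_model_ids(payload: dict) -> list:
    # OpenAI-compatible /v1/models responses wrap models in a "data" list,
    # each entry carrying an "id" field.
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = "http://localhost:51211") -> list:
    """Return the ids of models currently served by Foundry Local."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return parse_model_ids(json.load(resp))
```

If `list_models()` raises a connection error, the service is not running; if it returns an empty list, no model has been loaded yet.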
```shell
# ✅ 1. Foundry Local Setup
foundry --version                      # Should show version
foundry service status                 # Should show "running"
foundry model list                     # Should show loaded models
curl http://localhost:51211/v1/models  # Should return JSON

# ✅ 2. Python Environment
python --version                       # Should be 3.10+
pip list | findstr chainlit            # Should show chainlit package
pip list | findstr openai              # Should show openai package

# ✅ 3. Application Testing
chainlit run samples\04\app.py -w --port 8080  # Should open browser
# Test WebGPU demo at localhost:5173
# Test Open WebUI at localhost:3000
```

Chainlit:
- Use streaming for better perceived performance
- Implement connection pooling for high concurrency
- Cache model responses for repeated queries
- Monitor memory usage with large conversation histories
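For the response-caching tip, one simple approach is an in-memory cache keyed on model and prompt. A sketch under stated assumptions: the class and method names are illustrative, and a production version would likely add a TTL or LRU bound.

```python
import hashlib

class ResponseCache:
    """In-memory cache for repeated (model, prompt) completion queries."""

    def __init__(self):
        self._store = {}

    def _key(self, model: str, prompt: str) -> str:
        # Hash with a separator so ("ab", "c") and ("a", "bc") differ
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, model: str, prompt: str):
        """Return a cached response, or None on a miss."""
        return self._store.get(self._key(model, prompt))

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[self._key(model, prompt)] = response
```

Check the cache before calling the model, and only store deterministic (temperature 0) responses, since sampled outputs vary between runs.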
WebGPU:
- Use WebGPU for maximum privacy and speed
- Implement model quantization for smaller models
- Use Web Workers for background processing
- Cache compiled models in browser storage
Open WebUI:
- Use persistent volumes for conversation history
- Configure resource limits for Docker container
- Implement backup strategies for user data
- Set up reverse proxy for SSL termination
Hybrid Local/Cloud:

```python
# Route based on complexity and privacy requirements
async def intelligent_routing(prompt: str, metadata: dict):
    if metadata.get("contains_pii"):
        return await foundry_local_completion(prompt)  # Privacy-sensitive
    elif len(prompt.split()) > 200:
        return await azure_openai_completion(prompt)   # Complex reasoning
    else:
        return await foundry_local_completion(prompt)  # Default local
```

Multi-Modal Pipeline:
```python
# Combine different AI capabilities
async def analyze_document(file_path: str):
    # 1. OCR with WebGPU (browser-based)
    text = await webgpu_ocr(file_path)

    # 2. Analysis with Foundry Local (private)
    summary = await foundry_local_analyze(text)

    # 3. Enhancement with cloud (if needed)
    if summary.confidence < 0.8:
        summary = await azure_openai_enhance(summary)

    return summary
```

- API Keys: Use environment variables, never hardcode
- Network: Use HTTPS in production, consider VPN for team access
- Access Control: Implement authentication for Open WebUI
- Data Privacy: Audit what data stays local vs. goes to cloud
- Updates: Keep Foundry Local and containers updated
- Health Checks: Implement endpoint monitoring
- Logging: Centralize logs from all components
- Metrics: Track response times, error rates, resource usage
- Backup: Regular backup of conversation data and configurations
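As a sketch of the metrics point, a minimal latency and error tracker (the class and method names are illustrative; a production setup would export these values to a real metrics backend rather than keep them in memory):

```python
import time
from contextlib import contextmanager

class Metrics:
    """Track response times and error counts for completion calls."""

    def __init__(self):
        self.latencies = []  # seconds per tracked call
        self.errors = 0

    @contextmanager
    def track(self):
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors += 1
            raise  # record the failure but let the caller handle it
        finally:
            # Latency is recorded for successes and failures alike
            self.latencies.append(time.perf_counter() - start)
```

Wrap each model call in `with metrics.track():` and periodically report the collected latencies and error count.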
- Chainlit Documentation - Complete framework guide
- Foundry Local Documentation - Official Microsoft docs
- ONNX Runtime Web - WebGPU integration
- Open WebUI Documentation - Advanced configuration
- `app.py` - Production Chainlit application
- `chainlit_app.ipynb` - Educational notebook
- `webgpu-demo/` - Browser-based AI inference
- `open-webui-guide.md` - Complete Open WebUI setup
- Session 4 Documentation - Complete session guide
- Foundry Local Samples - Official samples