
Sample 04: Production Chat Applications with Chainlit

A comprehensive sample demonstrating multiple approaches to building production-ready chat applications with Microsoft Foundry Local: a modern web interface with streaming responses, in-browser inference via WebGPU, and a containerized ChatGPT-style UI.

What's Included

  • 🚀 Chainlit Chat App (app.py): Production-ready chat application with streaming
  • 🌐 WebGPU Demo (webgpu-demo/): Browser-based AI inference with hardware acceleration
  • 🎨 Open WebUI Integration (open-webui-guide.md): Professional ChatGPT-like interface
  • 📚 Educational Notebook (chainlit_app.ipynb): Interactive learning materials

Quick Start

1. Chainlit Chat Application

# Navigate to Module08 directory
cd Module08

# Start your model
foundry model run phi-4-mini

# Run Chainlit app (using port 8080 to avoid conflicts)
chainlit run samples\04\app.py -w --port 8080

Opens at: http://localhost:8080

2. WebGPU Browser Demo

# Navigate to WebGPU demo
cd Module08\samples\04\webgpu-demo

# Serve the demo
python -m http.server 5173

Opens at: http://localhost:5173

3. Open WebUI Setup

# Run Open WebUI with Docker
docker run -d --name open-webui -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:51211/v1 \
  -e OPENAI_API_KEY=foundry-local-key \
  ghcr.io/open-webui/open-webui:main

Opens at: http://localhost:3000

Architecture Patterns

Local vs Cloud Decision Matrix

| Scenario | Recommendation | Reason |
|---|---|---|
| Privacy-sensitive data | 🏠 Local (Foundry) | Data never leaves the device |
| Complex reasoning | ☁️ Cloud (Azure OpenAI) | Access to larger models |
| Real-time chat | 🏠 Local (Foundry) | Lower latency, faster responses |
| Document analysis | 🔄 Hybrid | Local for extraction, cloud for analysis |
| Code generation | 🏠 Local (Foundry) | Privacy + specialized models |
| Research tasks | ☁️ Cloud (Azure OpenAI) | Broad knowledge base needed |

Technology Comparison

| Technology | Use Case | Pros | Cons |
|---|---|---|---|
| Chainlit | Python developers, rapid prototyping | Easy setup, streaming support | Python-only |
| WebGPU | Maximum privacy, offline scenarios | Browser-native, no server needed | Limited model size |
| Open WebUI | Production deployment, teams | Professional UI, user management | Requires Docker |

Prerequisites

  • Foundry Local: Installed and running (Download)
  • Python: 3.10+ with virtual environment
  • Model: At least one loaded (foundry model run phi-4-mini)
  • Browser: Chrome/Edge with WebGPU support for demos
  • Docker: For Open WebUI (optional)

Installation & Setup

1. Python Environment Setup

# Navigate to Module08 directory
cd Module08

# Create and activate virtual environment
py -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Foundry Local Setup

# Verify Foundry Local installation
foundry --version

# Start the service
foundry service start

# Load a model
foundry model run phi-4-mini

# Verify model is running
foundry service ps
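
To confirm the endpoint is answering before starting any front-end, a stdlib-only check works. Port 51211 matches the examples in this guide, but Foundry Local may assign a different port — check `foundry service status`:

```python
# Query Foundry Local's OpenAI-compatible /v1/models endpoint and list
# model ids. Port 51211 is an assumption; verify with `foundry service status`.
import json
import urllib.request


def list_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style model-list response."""
    return [m["id"] for m in payload.get("data", [])]


def check_endpoint(base_url: str = "http://localhost:51211") -> list:
    """Return the ids of models currently served, or raise if unreachable."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return list_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print("Models:", check_endpoint())
    except OSError as exc:
        print("Foundry Local not reachable:", exc)
```

If the call raises a connection error, start the service and load a model as shown above before continuing.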

Sample Applications

Chainlit Chat Application

Features:

  • 🚀 Real-time Streaming: Tokens appear as they're generated
  • 🛡️ Robust Error Handling: Graceful degradation and recovery
  • 🎨 Modern UI: Professional chat interface out of the box
  • 🔧 Flexible Configuration: Environment variables and auto-detection
  • 📱 Responsive Design: Works on desktop and mobile devices

Quick Start:

# Run with default settings (recommended)
chainlit run samples\04\app.py -w --port 8080

# Use specific model
set MODEL=qwen2.5-7b
chainlit run samples\04\app.py -w --port 8080

# Manual endpoint configuration
set BASE_URL=http://localhost:51211
set API_KEY=your-api-key
chainlit run samples\04\app.py -w --port 8080

WebGPU Browser Demo

Features:

  • 🌐 Browser-native AI: No server required, runs entirely in browser
  • ⚡ WebGPU Acceleration: Hardware acceleration when available
  • 🔒 Maximum Privacy: No data ever leaves your device
  • 🎯 Zero Install: Works in any compatible browser
  • 🔄 Graceful Fallback: Falls back to CPU if WebGPU unavailable

Running:

cd samples\04\webgpu-demo
python -m http.server 5173
# Open http://localhost:5173

Open WebUI Integration

Features:

  • 🎨 ChatGPT-like Interface: Professional, familiar UI
  • 👥 Multi-user Support: User accounts and conversation history
  • 📁 File Processing: Upload and analyze documents
  • 🔄 Model Switching: Easy switching between different models
  • 🐳 Docker Deployment: Production-ready containerized setup

Quick Setup:

docker run -d --name open-webui -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:51211/v1 \
  -e OPENAI_API_KEY=foundry-local-key \
  ghcr.io/open-webui/open-webui:main

Configuration Reference

Environment Variables

| Variable | Description | Default | Example |
|---|---|---|---|
| MODEL | Model alias to use | phi-4-mini | qwen2.5-7b |
| BASE_URL | Foundry Local endpoint | Auto-detected | http://localhost:51211 |
| API_KEY | API key (optional for local) | "" | your-api-key |
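
A sketch of how these variables can be resolved against the documented defaults — `load_settings` is an illustrative helper, not necessarily the exact function in app.py:

```python
# Resolve chat-app settings from environment variables, falling back to the
# documented defaults. `load_settings` is illustrative, not app.py's API.
import os


def load_settings(env=None) -> dict:
    """Return model/endpoint/key settings, preferring explicit env vars."""
    env = os.environ if env is None else env
    return {
        "model": env.get("MODEL", "phi-4-mini"),
        "base_url": env.get("BASE_URL", "http://localhost:51211"),
        "api_key": env.get("API_KEY", ""),
    }
```

Passing a plain dict instead of reading `os.environ` directly keeps the resolution logic easy to unit-test.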

Troubleshooting

Common Issues

Chainlit Application:

  1. Service not available:

    # Check Foundry Local status
    foundry service status
    foundry service ps
    
    # Validate API endpoint (note: port 51211)
    curl http://localhost:51211/v1/models
  2. Port conflicts:

    # Check what's using port 8080
    netstat -ano | findstr :8080
    
    # Use different port if needed
    chainlit run samples\04\app.py -w --port 3000
  3. Python environment issues:

    # Verify correct interpreter in VS Code
    # Ctrl+Shift+P → Python: Select Interpreter
    # Choose: Module08/.venv/Scripts/python.exe
    
    # Reinstall dependencies
    pip install -r requirements.txt

WebGPU Demo:

  1. WebGPU not supported:

    • Update to Chrome/Edge 113+
    • Enable WebGPU: chrome://flags/#enable-unsafe-webgpu
    • Check GPU status: chrome://gpu
    • The demo will fall back to CPU automatically
  2. Model loading errors:

    • Ensure internet connection for model download
    • Check browser console for CORS errors
    • Verify you're serving via HTTP (not file://)

Open WebUI:

  1. Connection refused:

    # Check Docker is running
    docker --version
    
    # Check container status
    docker ps | findstr open-webui
    
    # View container logs
    docker logs open-webui
  2. Models not appearing:

    # Verify Foundry Local endpoint
    curl http://localhost:51211/v1/models
    
    # Restart Open WebUI
    docker restart open-webui

Validation Checklist

# ✅ 1. Foundry Local Setup
foundry --version                    # Should show version
foundry service status               # Should show "running"
foundry model list                   # Should show loaded models
curl http://localhost:51211/v1/models  # Should return JSON

# ✅ 2. Python Environment  
python --version                     # Should be 3.10+
pip list | findstr chainlit         # Should show chainlit package
pip list | findstr openai           # Should show openai package

# ✅ 3. Application Testing
chainlit run samples\04\app.py -w --port 8080  # Should open browser
# Test WebGPU demo at localhost:5173
# Test Open WebUI at localhost:3000

Advanced Usage

Performance Optimization

Chainlit:

  • Use streaming for better perceived performance
  • Implement connection pooling for high concurrency
  • Cache model responses for repeated queries
  • Monitor memory usage with large conversation histories

WebGPU:

  • Use WebGPU for maximum privacy and speed
  • Implement model quantization for smaller models
  • Use Web Workers for background processing
  • Cache compiled models in browser storage

Open WebUI:

  • Use persistent volumes for conversation history
  • Configure resource limits for Docker container
  • Implement backup strategies for user data
  • Set up reverse proxy for SSL termination
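
The persistent-volume and resource-limit advice above can be captured in a Compose file. A hedged sketch — service name, volume name, and limits are illustrative, and the endpoint port should match your Foundry Local instance:

```yaml
# docker-compose.yml sketch for Open WebUI backed by Foundry Local.
# Values are illustrative; adjust limits and the endpoint port to your setup.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OPENAI_API_BASE_URL=http://host.docker.internal:51211/v1
      - OPENAI_API_KEY=foundry-local-key
    volumes:
      - open-webui-data:/app/backend/data   # persistent conversation history
    deploy:
      resources:
        limits:
          memory: 2G
    restart: unless-stopped

volumes:
  open-webui-data:
```

With a named volume in place, `docker compose down && docker compose up -d` preserves user accounts and conversation history across upgrades.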

Integration Patterns

Hybrid Local/Cloud:

# Route based on complexity and privacy requirements.
# foundry_local_completion / azure_openai_completion are app-defined
# async wrappers around the respective chat-completion clients.
async def intelligent_routing(prompt: str, metadata: dict) -> str:
    if metadata.get("contains_pii"):
        return await foundry_local_completion(prompt)   # Privacy-sensitive stays local
    if len(prompt.split()) > 200:
        return await azure_openai_completion(prompt)    # Complex reasoning goes to cloud
    return await foundry_local_completion(prompt)       # Default: local for latency

Multi-Modal Pipeline:

# Combine different AI capabilities; webgpu_ocr / foundry_local_analyze /
# azure_openai_enhance are app-defined placeholders, and the analysis result
# is assumed to carry a confidence score.
async def analyze_document(file_path: str):
    # 1. OCR in the browser (WebGPU)
    text = await webgpu_ocr(file_path)
    
    # 2. Private analysis with Foundry Local
    summary = await foundry_local_analyze(text)
    
    # 3. Cloud enhancement only when local confidence is low
    if summary.confidence < 0.8:
        summary = await azure_openai_enhance(summary)
    
    return summary

Production Deployment

Security Considerations

  • API Keys: Use environment variables, never hardcode
  • Network: Use HTTPS in production, consider VPN for team access
  • Access Control: Implement authentication for Open WebUI
  • Data Privacy: Audit what data stays local vs. goes to cloud
  • Updates: Keep Foundry Local and containers updated

Monitoring and Maintenance

  • Health Checks: Implement endpoint monitoring
  • Logging: Centralize logs from all components
  • Metrics: Track response times, error rates, resource usage
  • Backup: Regular backup of conversation data and configurations

References and Resources

Documentation

Sample Files

Related Samples