This sample demonstrates how to use Microsoft Foundry Local as a REST API service without relying on the OpenAI SDK. It shows direct HTTP integration patterns for maximum control and customization.
Based on Microsoft's official Foundry Local patterns, this sample provides:
- Direct REST API integration with FoundryLocalManager
- Custom HTTP client implementation
- Model management and health monitoring
- Streaming and non-streaming response handling
- Production-ready error handling and retry logic
### Foundry Local Installation

```bash
# Install from GitHub releases
winget install Microsoft.FoundryLocal
```

### Python Dependencies

```bash
pip install foundry-local-sdk requests aiohttp
```

Note that `asyncio` is part of the Python standard library and does not need to be installed separately.
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│    Your App     │───▶│  REST API Client  │───▶│  Foundry Local  │
│                 │     │                  │     │    Service      │
│ - Custom Logic  │     │ - HTTP Requests  │     │ - Model Loading │
│ - Business Rules│     │ - Authentication │     │ - Inference     │
│ - Data Pipeline │     │ - Error Handling │     │ - Health Check  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
```
- Pure REST API calls without SDK dependencies
- Custom authentication and headers
- Full control over request/response handling
- Dynamic model loading and unloading
- Health monitoring and status checks
- Performance metrics collection
- Retry mechanisms with exponential backoff
- Circuit breaker for fault tolerance
- Comprehensive logging and monitoring
- Streaming responses for real-time applications
- Batch processing for high-throughput scenarios
- Custom response parsing and validation
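The retry-with-exponential-backoff behavior listed above can be sketched as a small async helper. The function name, default delays, and jitter are illustrative choices, not part of the sample's API:

```python
import asyncio
import random

async def retry_with_backoff(fn, *, retries=3, base_delay=0.5, max_delay=8.0):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return await fn()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Exponential backoff: 0.5s, 1s, 2s, ... capped at max_delay,
            # plus a little random jitter to avoid thundering herds
            delay = min(base_delay * (2 ** attempt), max_delay)
            await asyncio.sleep(delay + random.uniform(0, 0.1))
```

You would wrap individual calls with it, e.g. `await retry_with_backoff(lambda: client.complete(prompt="...", model="phi-4-mini"))`.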
```python
from api_client import FoundryAPIClient

# Initialize the API client
client = FoundryAPIClient()

# Simple completion
response = await client.complete(
    prompt="Explain quantum computing",
    model="phi-4-mini",
    max_tokens=500
)
print(response.content)

# Stream responses for real-time applications
async for chunk in client.stream_complete(
    prompt="Write a story about AI",
    model="phi-4-mini"
):
    print(chunk.content, end="", flush=True)

# Check service health
health = await client.health_check()
print(f"Service Status: {health.status}")
print(f"Active Models: {health.loaded_models}")
print(f"Memory Usage: {health.memory_usage}")
```

```
07/
├── README.md               # This documentation
├── requirements.txt        # Python dependencies
├── api_client.py           # Core API client implementation
├── health_monitor.py       # Health checking and monitoring
├── examples/
│   ├── basic_usage.py      # Simple API integration example
│   ├── streaming.py        # Streaming response example
│   ├── batch_processing.py # Batch processing example
│   └── production.py       # Production-ready implementation
└── tests/
    ├── test_api_client.py  # Unit tests for API client
    └── test_integration.py # Integration tests
```
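Under the hood, streaming completions arrive as server-sent events: one `data:` line per token chunk, terminated by a `data: [DONE]` marker. A minimal sketch of parsing one such line, assuming the OpenAI-compatible chunk format (`choices[0].delta.content`):

```python
import json

def parse_sse_line(line):
    """Extract the delta text from one server-sent-events line.

    Returns the content fragment for a token chunk, or None for blank
    keep-alive lines and the terminating `data: [DONE]` marker.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```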
This sample follows Microsoft's official patterns:
- **SDK Integration**: Uses `FoundryLocalManager` for service management
- **REST Endpoints**: Direct calls to `/v1/chat/completions` and other endpoints
- **Authentication**: Proper API key handling for local services
- **Model Management**: Catalog listing, downloading, and loading patterns
- **Error Handling**: Microsoft-recommended error codes and responses
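To make the "pure REST" approach concrete, here is a minimal sketch of calling the chat-completions endpoint with only the standard library. The base URL and port are assumptions: Foundry Local assigns its endpoint at startup, so in practice discover it via `FoundryLocalManager` before issuing requests:

```python
import json
import urllib.request

# Assumed endpoint; in practice, read the real base URL from FoundryLocalManager.
BASE_URL = "http://localhost:5273/v1"

def build_chat_request(model, prompt, max_tokens=500):
    """Build an OpenAI-compatible chat-completions request payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat_completion(model, prompt, max_tokens=500):
    """POST directly to /v1/chat/completions with no SDK involved."""
    body = json.dumps(build_chat_request(model, prompt, max_tokens)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_completion("phi-4-mini", "Explain quantum computing"))
```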
1. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

2. **Run Basic Example**

   ```bash
   python examples/basic_usage.py
   ```

3. **Try Streaming**

   ```bash
   python examples/streaming.py
   ```

4. **Production Setup**

   ```bash
   python examples/production.py
   ```
Environment variables for customization:
- `FOUNDRY_MODEL`: Default model to use (default: `phi-4-mini`)
- `FOUNDRY_TIMEOUT`: Request timeout in seconds (default: `30`)
- `FOUNDRY_RETRIES`: Number of retry attempts (default: `3`)
- `FOUNDRY_LOG_LEVEL`: Logging level (default: `INFO`)
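Reading these variables with their documented defaults might look like the following; the `load_config` helper is illustrative, not part of the sample's code:

```python
import os

def load_config():
    """Read the environment variables above, falling back to documented defaults."""
    return {
        "model": os.getenv("FOUNDRY_MODEL", "phi-4-mini"),
        "timeout": int(os.getenv("FOUNDRY_TIMEOUT", "30")),
        "retries": int(os.getenv("FOUNDRY_RETRIES", "3")),
        "log_level": os.getenv("FOUNDRY_LOG_LEVEL", "INFO"),
    }
```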
- Connection Management: Reuse HTTP connections for better performance
- Error Handling: Implement proper retry logic with exponential backoff
- Resource Monitoring: Track model memory usage and performance
- Security: Use proper authentication even for local services
- Testing: Include both unit and integration tests
**Service Not Running**

```bash
# Check Foundry Local status
foundry status

# Start if needed
foundry start
```

**Model Loading Issues**

```bash
# List available models
foundry model list

# Download specific model
foundry model download phi-4-mini
```

**Connection Errors**
- Verify Foundry Local is running on the correct port
- Check firewall settings
- Ensure proper authentication headers
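A quick programmatic reachability check can complement the steps above. The port and the `/v1/models` path here are assumptions based on the service's OpenAI-compatible surface; substitute the endpoint reported by `FoundryLocalManager`:

```python
import urllib.error
import urllib.request

def service_reachable(base_url="http://localhost:5273"):
    """Return True if the Foundry Local endpoint answers an HTTP request.

    The default port is an assumption -- Foundry Local chooses its port at
    startup, so take the real base URL from FoundryLocalManager.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```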
- Connection Pooling: Use session objects for multiple requests
- Async Operations: Leverage asyncio for concurrent requests
- Caching: Cache model responses where appropriate
- Monitoring: Track response times and adjust timeouts
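The async-operations tip can be sketched as a bounded fan-out: `asyncio.gather` runs requests concurrently while a semaphore caps how many are in flight at once. Here `worker` is a stand-in for whatever async request function you use (e.g. `client.complete`), not part of the sample's API:

```python
import asyncio

async def run_batch(prompts, worker, max_concurrency=4):
    """Fan out requests concurrently while capping in-flight calls.

    The semaphore keeps a large batch from overloading the local
    inference service; results come back in prompt order.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt):
        async with sem:
            return await worker(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))
```

For connection pooling, pair this with a single shared `aiohttp.ClientSession` (or a `requests.Session` in sync code) so TCP connections are reused across calls.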
After completing this sample, you will understand:
- Direct REST API integration with Foundry Local
- Custom HTTP client implementation patterns
- Production-ready error handling and monitoring
- Microsoft Foundry Local service architecture
- Performance optimization techniques for local AI services
- Explore Sample 08: Windows 11 Chat Application
- Try Sample 09: Multi-Agent Orchestration
- Learn Sample 10: Foundry Local as Tools Integration