This comprehensive guide will help you prepare for the EdgeAI course, which focuses on building practical AI solutions that run efficiently on edge devices. The course emphasizes hands-on development using modern frameworks and state-of-the-art models optimized for edge deployment.
Python Environment
- Version: Python 3.10 or higher (recommended: Python 3.11)
- Package Manager: pip or conda
- Virtual Environment: Use venv or conda environments for isolation
- Key Libraries: We'll install specific EdgeAI libraries during the course
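Once your virtual environment is active, a quick sanity check confirms the interpreter meets the course requirement. This is a minimal sketch using only the standard library:

```python
import sys

# Confirm the interpreter meets the course requirement (3.10+, 3.11 recommended)
meets_minimum = sys.version_info >= (3, 10)
recommended = sys.version_info[:2] == (3, 11)

print(f"Python {sys.version.split()[0]}: "
      f"{'OK' if meets_minimum else 'too old - please upgrade'}"
      f"{' (recommended version)' if recommended else ''}")
```

Run this inside your venv or conda environment so it checks the interpreter that will actually run your course code.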
Microsoft .NET Environment
- Version: .NET 8 or higher
- IDE: Visual Studio 2022, Visual Studio Code, or JetBrains Rider
- SDK: Ensure .NET SDK is installed for cross-platform development
Code Editors & IDEs
- Visual Studio Code (recommended for cross-platform development)
- PyCharm or Visual Studio (for language-specific development)
- Jupyter Notebooks for interactive development and prototyping
Version Control
- Git (latest version)
- GitHub account for accessing repositories and collaboration
Hardware Requirements
- CPU: Multi-core processor (Intel i5/AMD Ryzen 5 or equivalent)
- RAM: 8GB minimum, 16GB recommended
- Storage: 50GB available space for models and development tools
- OS: Windows 10/11, macOS 10.15+, or Linux (Ubuntu 20.04+)
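You can roughly self-check your machine against these minimums with the standard library. This sketch checks logical core count and free disk space only; RAM is omitted because the standard library has no portable way to read it:

```python
import os
import platform
import shutil

# Rough self-check against the minimums listed above
cores = os.cpu_count() or 1                    # logical cores, a proxy for "multi-core"
free_gb = shutil.disk_usage(".").free / 1e9    # free space on the current drive

print(f"OS:        {platform.system()} {platform.release()}")
print(f"CPU cores: {cores} ({'OK' if cores >= 4 else 'below a typical quad-core minimum'})")
print(f"Free disk: {free_gb:.0f} GB ({'OK' if free_gb >= 50 else 'below the 50 GB guideline'})")
```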
The course is designed to be accessible across different hardware configurations:
Local Development (CPU/NPU Focus)
- Primary development will utilize CPU and NPU acceleration
- Suitable for most modern laptops and desktops
- Focus on efficiency and practical deployment scenarios
Cloud GPU Resources (Optional)
- Azure Machine Learning: For intensive training and experimentation
- Google Colab: Free tier available for educational purposes
- Kaggle Notebooks: Alternative cloud computing platform
Edge Hardware Knowledge
- Understanding of ARM-based processors
- Knowledge of mobile and IoT hardware constraints
- Familiarity with power consumption optimization
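Latency and power budgets on edge hardware are easiest to reason about when you measure them. The sketch below shows a generic timing harness; the toy workload is a stand-in for a real model forward pass:

```python
import time

def average_latency(fn, *args, warmup=3, runs=20):
    """Average wall-clock latency of a callable, with warmup runs to
    exclude one-time costs (cache warming, lazy initialization, model load)."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs

# Toy stand-in for an inference call
workload = lambda n: sum(i * i for i in range(n))
latency_ms = average_latency(workload, 50_000) * 1e3
print(f"average latency: {latency_ms:.2f} ms")
```

The same pattern applies unchanged when `fn` is a real inference call, and it gives you a consistent number to compare across quantization levels and devices.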
Microsoft Phi-4 Family
- Description: Compact, efficient models designed for edge deployment
- Strengths: Excellent performance-to-size ratio, optimized for reasoning tasks
- Resource: Phi-4 Collection on Hugging Face
- Use Cases: Code generation, mathematical reasoning, general conversation
Qwen-3 Family
- Description: Alibaba's latest generation of multilingual models
- Strengths: Strong multilingual capabilities, efficient architecture
- Resource: Qwen-3 Collection on Hugging Face
- Use Cases: Multilingual applications, cross-cultural AI solutions
Google Gemma-3n Family
- Description: Google's lightweight models optimized for edge deployment
- Strengths: Fast inference, mobile-friendly architecture
- Resource: Gemma-3n Collection on Hugging Face
- Use Cases: Mobile applications, real-time processing
Model Selection Considerations
- Performance vs. Size Trade-offs: Understanding when to choose smaller vs. larger models
- Task-Specific Optimization: Matching models to specific use cases
- Deployment Constraints: Memory, latency, and power consumption considerations
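A back-of-the-envelope memory estimate makes these trade-offs concrete. The sketch below assumes a flat 1.2x overhead for KV cache, activations, and runtime buffers; that factor is a rough assumption, not a measured constant:

```python
def estimated_memory_gb(params_billions, bits_per_weight, overhead=1.2):
    """Back-of-the-envelope RAM needed to run a model:
    weight bytes = params * bits / 8, scaled by an assumed
    1.2x overhead for KV cache, activations, and buffers."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 * overhead

for label, params, bits in [
    ("~4B model, 4-bit", 4, 4),
    ("~4B model, 16-bit", 4, 16),
    ("7B model, 4-bit", 7, 4),
]:
    print(f"{label}: ~{estimated_memory_gb(params, bits):.1f} GB")
```

This is why a 7B model quantized to 4 bits can fit on an 8GB laptop while the same model at FP16 cannot, and why the model families above emphasize small parameter counts for edge deployment.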
Llama.cpp
- Repository: Llama.cpp on GitHub
- Purpose: High-performance inference engine for LLMs
- Key Features:
- CPU-optimized inference
- Multiple quantization formats (Q4, Q5, Q8)
- Cross-platform compatibility
- Memory-efficient execution
- Installation and Basic Usage:
```bash
# Clone the repository
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

# Build the project with optimizations
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release

# Quantize a model (from GGUF format to 4-bit quantization)
./quantize ../models/original-model.gguf ../models/quantized-model-q4_0.gguf q4_0

# Run inference with the quantized model
./main -m ../models/quantized-model-q4_0.gguf -n 512 -p "Write a function to calculate fibonacci numbers in Python:"
```
Microsoft Olive
- Repository: Microsoft Olive on GitHub
- Purpose: Model optimization toolkit for edge deployment
- Key Features:
- Automated model optimization workflows
- Hardware-aware optimization
- Integration with ONNX Runtime
- Performance benchmarking tools
- Installation and Basic Usage:
```bash
# Install Olive
pip install olive-ai
```

```python
from olive.model import ONNXModel
from olive.workflows import run_workflow

# Define model and optimization config
model = ONNXModel("original_model.onnx")
config = {
    "input_model": model,
    "systems": {"local_system": {"type": "LocalSystem"}},
    "engine": {"log_severity_level": 0, "cache_dir": "cache"},
    "passes": {
        "quantization": {
            "type": "OrtQuantization",
            "config": {
                "quant_mode": "static",
                "activation_type": "int8",
                "weight_type": "int8",
            },
        }
    },
}

# Run optimization workflow
result = run_workflow(config)
optimized_model = result.optimized_model

# Save optimized model
optimized_model.save("optimized_model.onnx")
```
Apple MLX
- Repository: Apple MLX on GitHub
- Purpose: Machine learning framework for Apple Silicon
- Key Features:
- Native Apple Silicon optimization
- Memory-efficient operations
- PyTorch-like API
- Unified memory architecture support
- Installation and Basic Usage:
```bash
# Install MLX
pip install mlx
```

```python
# Example Python script for loading a model and reducing its weights to FP16
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_flatten, tree_unflatten

# Define a simple MLP (stand-in for a real model)
class MLP(nn.Module):
    def __init__(self, dim=768, hidden_dim=3072):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)

    def __call__(self, x):
        return self.fc2(mx.maximum(0, self.fc1(x)))

# Create model and load pre-trained weights
model = MLP()
weights = mx.load("original_weights.npz")  # flat dict of arrays
model.update(tree_unflatten(list(weights.items())))

# Cast the model weights to FP16 (a precision reduction, the simplest
# form of model compression)
def quantize_weights(model):
    params = tree_unflatten(
        [(k, v.astype(mx.float16)) for k, v in tree_flatten(model.parameters())]
    )
    model.update(params)
    return model

quantized_model = quantize_weights(model)

# Save the FP16 weights
mx.savez("quantized_model.npz", **dict(tree_flatten(quantized_model.parameters())))

# Run inference
input_data = mx.random.normal((1, 768))
output = quantized_model(input_data)
```
ONNX Runtime
- Repository: ONNX Runtime on GitHub
- Purpose: Cross-platform inference acceleration for ONNX models
- Key Features:
- Hardware-specific optimizations (CPU, GPU, NPU)
- Graph optimizations for inference
- Quantization support
- Cross-language support (Python, C++, C#, JavaScript)
- Installation and Basic Usage:
```bash
# Install ONNX Runtime
pip install onnxruntime

# For GPU support
pip install onnxruntime-gpu
```
```python
import onnxruntime as ort
import numpy as np

# Create inference session with optimizations
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.enable_profiling = True  # Enable performance profiling

# Select providers for hardware acceleration (falls back to CPU)
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
session = ort.InferenceSession("model.onnx", sess_options, providers=providers)

# Prepare input data (replace any dynamic/symbolic dimensions with 1)
input_name = session.get_inputs()[0].name
input_shape = [d if isinstance(d, int) else 1 for d in session.get_inputs()[0].shape]
input_data = np.random.rand(*input_shape).astype(np.float32)

# Run inference
outputs = session.run(None, {input_name: input_data})

# Get profiling data
prof_file = session.end_profiling()
print(f"Profiling data saved to: {prof_file}")
```
- ONNX Runtime Documentation: Understanding cross-platform inference
- Hugging Face Transformers Guide: Model loading and inference
- Edge AI Design Patterns: Best practices for edge deployment
- "Efficient Edge AI: A Survey of Quantization Techniques"
- "Model Compression for Mobile and Edge Devices"
- "Optimizing Transformer Models for Edge Computing"
- EdgeAI Slack/Discord Communities: Peer support and discussion
- GitHub Repositories: Example implementations and tutorials
- YouTube Channels: Technical deep-dives and tutorials
Pre-Course Checklist
- Python 3.10+ installed and verified
- .NET 8+ installed and verified
- Development environment configured
- Hugging Face account created
- Basic familiarity with target model families
- Quantization tools installed and tested
- Hardware requirements met
- Cloud computing accounts set up (if needed)
By the end of this guide, you will be able to:
- Set up a complete development environment for EdgeAI application development
- Install and configure the necessary tools and frameworks for model optimization
- Select appropriate hardware and software configurations for your EdgeAI projects
- Understand the key considerations for deploying AI models on edge devices
- Prepare your system for the hands-on exercises in the course
- Python Documentation: Official Python language documentation
- Microsoft .NET Documentation: Official .NET development resources
- ONNX Runtime Documentation: Comprehensive guide to ONNX Runtime
- TensorFlow Lite Documentation: Official TensorFlow Lite documentation
- Visual Studio Code: Lightweight code editor with AI development extensions
- Jupyter Notebooks: Interactive computing environment for ML experimentation
- Docker: Containerization platform for consistent development environments
- Git: Version control system for code management
- EdgeAI Research Papers: Latest academic research on efficient models
- Online Courses: Supplementary learning materials on AI optimization
- Community Forums: Q&A platforms for EdgeAI development challenges
- Benchmark Datasets: Standard datasets for evaluating model performance
After completing this preparation guide, you will:
- Have a fully configured development environment ready for EdgeAI development
- Understand the hardware and software requirements for different deployment scenarios
- Be familiar with the key frameworks and tools used throughout the course
- Be able to select appropriate models based on device constraints and requirements
- Have essential knowledge of optimization techniques for edge deployment