Skip to content

Latest commit

 

History

History
567 lines (459 loc) · 17.3 KB

File metadata and controls

567 lines (459 loc) · 17.3 KB

ArcFace PyTorch Implementation

PyTorch implementation of ArcFace (Additive Angular Margin Loss) for face recognition with complete training and evaluation pipeline.

English | 简体中文

🎯 Results

Our trained model achieves 86% accuracy on LFW face verification using:

  • 100 person classes, 1,624 training images (carefully balanced subset)
  • ResNet-18 backbone (lightweight, 97MB model)
  • ArcFace + Focal Loss training
  • 2 hours training time on CPU
┌─────────────────────────────────────┐
│      LFW Verification Results       │
├─────────────────────────────────────┤
│ Accuracy:       86.00%              │
│ Test Pairs:     6,000 pairs         │
│ Threshold:      0.1254              │
│ Margin:         0.3086 (Excellent)  │
│                                     │
│ Same Person:    0.31 ± 0.20         │
│ Different:      0.01 ± 0.10         │
└─────────────────────────────────────┘

Features

  • Complete Pipeline: Data selection → Training → Evaluation
  • Balanced Dataset: Intelligent sampling from LFW (prevents class imbalance)
  • Multiple Architectures: ResNet-18/34/50 with optional SE blocks
  • Advanced Losses: ArcFace, CosFace, Focal Loss, Softmax
  • LFW Evaluation: Standard face verification protocol with 6,000 pairs
  • Detailed Reports: Comprehensive analysis of model performance

Installation

# Clone repository
git clone https://github.com/your-repo/arcface-pytorch.git
cd arcface-pytorch

# Install dependencies
pip install -r requirements.txt

Quick Start

1. Prepare Data

# Download and extract LFW dataset to data/datasets/lfw/lfw-align-128/

# Create training data list (automatically selects balanced subset)
python create_train_list.py

2. Training

# Train with default configuration (ResNet-18 + ArcFace + Focal Loss)
python train.py

3. Evaluation

# Quick validation on LFW pairs
python validate_model.py

# Detailed evaluation with feature analysis
python evaluate_model.py

Configuration

Edit config.yaml to customize training parameters:

model:
  backbone: 'resnet18'      # resnet18/resnet34/resnet50
  metric: 'arc_margin'      # ArcFace loss (s=30, m=0.5)
  use_se: false             # Squeeze-Excitation blocks
  num_classes: 100

loss:
  type: 'focal_loss'        # focal_loss or cross_entropy
  gamma: 2                  # Focal loss gamma parameter

training:
  batch_size: 32            # Larger = faster but more memory
  max_epoch: 100
  lr: 0.001                 # Initial learning rate
  lr_step: 20               # Decay every N epochs
  optimizer: 'sgd'          # SGD with momentum=0.9
  weight_decay: 0.0005

paths:
  checkpoints_path: 'checkpoints'
  test_model_path: 'checkpoints/resnet18_best.pth'

Project Structure

arcface-pytorch/
├── config/
│   └── yaml_config.py         # YAML configuration loader
├── data/
│   └── dataset.py             # Data loading and preprocessing
├── models/
│   ├── resnet.py             # ResNet-18/34/50 architectures
│   ├── metrics.py            # ArcFace/CosFace metrics
│   └── focal_loss.py         # Focal loss implementation
│
├── config.yaml               # Training configuration
├── train_list.txt            # Training data list (1,624 images)
├── lfw_test_pair.txt         # LFW verification pairs (6,000 pairs)
│
├── create_train_list.py      # Generate balanced training data
├── train.py                  # Training script (ArcFace + Focal Loss)
├── validate_model.py         # Quick LFW validation
├── evaluate_model.py         # Detailed evaluation and feature analysis
│
├── checkpoints/              # Model checkpoints
│   ├── resnet18_best.pth    # Best model (86% LFW accuracy)
│   └── resnet18_epoch*.pth  # Periodic checkpoints
│
└── validation_report.md      # Detailed performance report

Dataset Preparation

Training Data Selection

We carefully select a training subset from the LFW (Labeled Faces in the Wild) dataset:

Data Selection Criteria:

  1. Class Selection: Choose persons with ≥10 images (158 persons qualify)
  2. Balanced Sampling: Limit each class to maximum 20 images (prevents class imbalance)
  3. Final Dataset: Select top 100 classes → 1,624 total training images
  4. Data Distribution:
    • 43 classes with 20 images
    • 57 classes with 10-19 images
    • Average: 16.2 images per class

Why This Approach?

  • Balanced: Avoids overfitting to majority classes (e.g., George_W_Bush has 530 images in raw LFW)
  • Sufficient: 100 classes × 16 images provides adequate training signal
  • Realistic: Simulates real-world scenarios with limited samples per person
  • Fast Training: Smaller dataset enables rapid experimentation (2 hours on CPU)

Generate Training List:

python create_train_list.py

This script:

  • Scans LFW directory and counts images per person
  • Selects top 100 persons with most images (≥10 images each)
  • Samples up to 20 images per person
  • Shuffles data and writes to train_list.txt in format: person_name/image.jpg label

LFW Test Dataset

  1. Download LFW dataset
  2. Align faces to 128×128
  3. The lfw_test_pair.txt contains 6,000 pairs:
    • 3,000 positive pairs (same person)
    • 3,000 negative pairs (different persons)

Training Pipeline

How Training Works

The training process combines ArcFace metric learning with Focal Loss for robust face recognition:

1. Model Architecture

Input Image (128×128 grayscale)
    ↓
ResNet-18 Backbone (feature extractor)
    ↓
512-D Feature Vector (L2 normalized)
    ↓
ArcMargin Product (angular margin classifier, s=30, m=0.5)
    ↓
100-D Class Logits
    ↓
Focal Loss (γ=2, down-weights easy examples)

2. Training Configuration (config.yaml):

model:
  backbone: resnet18          # Feature extractor
  metric: arc_margin          # ArcFace (s=30, m=0.5)
  num_classes: 100

loss:
  type: focal_loss            # γ=2 (down-weights easy examples)

training:
  batch_size: 32
  max_epoch: 100
  lr: 0.001                   # Initial learning rate
  lr_step: 20                 # Decay every 20 epochs
  optimizer: sgd              # SGD with momentum=0.9
  weight_decay: 0.0005

3. Training Process (100 epochs, ~2 hours on CPU):

  • Epoch 1-20: Learn basic features (LR=0.001)
  • Epoch 21-40: Refine features (LR=0.0001)
  • Epoch 41-60: Fine-tune margins (LR=0.00001)
  • Epoch 61-80: Polish features (LR=0.000001)
  • Epoch 81-100: Final convergence (LR=0.0000001)

4. Key Training Features:

  • ArcMargin Loss: Adds angular margin in feature space for better class separation

    # Standard Softmax: cos(θ)
    # ArcFace:         cos(θ + m) where m=0.5
    # Effect: Forces intra-class compactness and inter-class separation
  • Focal Loss: Focuses on hard examples

    FL(p) = -(1-p)^γ * log(p)  where γ=2
    # Down-weights easy examples, emphasizes hard negatives
  • Checkpointing: Saves best model and periodic snapshots every 10 epochs

5. Run Training:

# Uses config.yaml by default
python train.py

# Monitor progress (prints every 10 batches):
# Epoch [1/100] Batch [0/50] Loss: 4.2156 Acc: 0.0312
# Epoch [1/100] Batch [10/50] Loss: 3.8934 Acc: 0.1250
# ...
# Epoch 1/100 complete - Time: 81.2s
#   Average Loss: 3.5421
#   Average Acc: 0.2341 (23.41%)

Supported Models

  • ResNet-18: Lightweight, fast training (used in this implementation)
  • ResNet-34: Balanced performance
  • ResNet-50: Best accuracy

Loss Functions

  • ArcFace: Additive angular margin loss (default)
  • CosFace: Cosine margin loss
  • Focal Loss: Handles class imbalance (default)
  • Cross Entropy: Standard classification loss

Evaluation Pipeline

How Model Evaluation Works

The model is evaluated using the LFW (Labeled Faces in the Wild) verification protocol:

1. LFW Verification Protocol:

  • Input: 6,000 face pairs
    • 3,000 positive pairs (same person)
    • 3,000 negative pairs (different persons)
  • Task: Determine if each pair shows the same person
  • Metric: Verification accuracy at optimal threshold

2. Evaluation Process:

For each image pair (img1, img2):
    1. Load and preprocess images (grayscale, normalize)
    2. Apply horizontal flip augmentation (creates img1_flip, img2_flip)
    3. Extract features:
       - feature1 = model(img1)        # 512-D vector
       - feature1_flip = model(img1_flip)
       - feature2 = model(img2)
       - feature2_flip = model(img2_flip)
    4. Combine features:
       - f1 = concat(feature1, feature1_flip)  # 1024-D
       - f2 = concat(feature2, feature2_flip)
    5. Compute cosine similarity:
       - similarity = (f1 · f2) / (||f1|| × ||f2||)
    6. Compare with threshold:
       - same_person = (similarity >= threshold)

Find optimal threshold that maximizes accuracy

3. Evaluation Metrics:

Metric Value Description
LFW Accuracy 86.00% Overall verification accuracy
Optimal Threshold 0.1254 Best cosine similarity cutoff
Discriminative Margin 0.3086 Separation between same/different persons

Similarity Distribution Analysis:

Same Person (Positive Pairs):
  - Mean similarity: 0.3148 ± 0.1988
  - Range: [-0.18, 0.86]
  - Interpretation: Same person's photos have high similarity

Different Persons (Negative Pairs):
  - Mean similarity: 0.0062 ± 0.1017
  - Range: [-0.31, 0.43]
  - Interpretation: Different persons have near-zero similarity

Discriminative Margin: 0.3148 - 0.0062 = 0.3086
  ✓ Excellent feature quality (>0.20 is excellent)

4. Run Evaluation:

# Quick evaluation (6000 pairs, ~2 minutes)
python validate_model.py

# Output:
# [1/4] Loading config and model...
#   ✓ Model loaded successfully
# [2/4] LFW pair verification...
#   Progress: 6000/6000
#   ✓ Processing complete: 6000 valid pairs
# [3/4] Computing accuracy...
#   Best accuracy: 86.00%
#   Best threshold: 0.1254
# [4/4] Similarity distribution analysis...
#   Discriminative margin: 0.3086
#   Feature quality: Excellent
# Detailed evaluation with feature analysis
python evaluate_model.py

5. Understanding the Results:

  • 86% Accuracy: Our lightweight model correctly verifies 5,160 out of 6,000 pairs
  • vs. SOTA: Modern methods achieve 99%+, but use:
    • Larger datasets (millions of images vs. our 1,624)
    • Deeper networks (ResNet-100 vs. our ResNet-18)
    • More training time (days on GPUs vs. 2 hours on CPU)
  • Our Advantage: Fast training, small model (97MB), good for learning/prototyping

6. Error Analysis:

The 14% error rate (840 pairs) typically includes:

  • False Positives (~7%): Different people classified as same
    • Cause: Similar appearance, pose, lighting
  • False Negatives (~7%): Same person classified as different
    • Cause: Large appearance changes (age, expression, lighting)

Performance

Our Implementation

Configuration LFW Accuracy Training Time Model Size
ResNet-18 + ArcFace 86.00% 2 hours (CPU) 97.9 MB
100 classes, 1,624 images Threshold: 0.1254 100 epochs 512-D features

Discriminative Quality:

  • Margin: 0.3086 (Excellent ✓)
  • No overfitting (validated on independent LFW test set)
  • Best model appears at epoch 90

Comparison with SOTA

Model Dataset LFW Accuracy Notes
Traditional (PCA+SVM) LFW ~70% Classical ML baseline
Our ResNet-18 LFW subset 86% Limited data, fast training
DeepFace (2014) 4M images 97.35% Facebook, large dataset
FaceNet (2015) 200M images 99.63% Google, massive data
ArcFace (2019) MS1MV2 99.83% State-of-the-art

Why the gap?

  1. Data: We use 1,624 images vs. millions
  2. Model: ResNet-18 vs. ResNet-100
  3. Resources: CPU training vs. multi-GPU clusters
  4. Purpose: Educational/prototype vs. production system

Usage Examples

Complete Workflow

Step 1: Prepare Training Data

# Generate balanced training list from LFW
python create_train_list.py

# Output:
# [1/5] Scanning LFW dataset
#   Total persons: 5749
#   Qualified persons (>=10 images): 158
# [2/5] Selecting classes
#   Selected classes: 100
# [3/5] Sampling images (max 20 per class)
# [4/5] Shuffling data
#   Total training samples: 1624
# [5/5] Writing to file: train_list.txt
#   ✓ Successfully wrote 1624 records

Step 2: Train Model

# Train with default config (ResNet-18 + ArcFace + Focal Loss)
python train.py

# Training progress:
# ============================================================
# Starting training
# ============================================================
# Loading config from config.yaml...
# Using device: cpu
#
# Loading training data...
# ✓ Training data: 1624 images
# ✓ Batches: 50
#
# Epoch [1/100] Batch [0/50] Loss: 4.2156 Acc: 0.0312
# Epoch [1/100] Batch [10/50] Loss: 3.8934 Acc: 0.1250
# ...
# ============================================================
# Epoch 1/100 complete - Time: 81.2s
#   Average Loss: 3.5421
#   Average Acc: 0.2341 (23.41%)
#   Learning rate: 0.001000
# ============================================================
# ✓ Saved best model: checkpoints/resnet18_best.pth (Acc: 23.41%)

Step 3: Validate Model

# Quick validation on LFW pairs (6000 pairs, ~2 minutes)
python validate_model.py

# Results:
# ================================================================================
# [1/4] Loading config and model...
#   ✓ Model loaded successfully
# [2/4] LFW pair verification...
#   Progress: 6000/6000
#   ✓ Processing complete: 6000 valid pairs
# [3/4] Computing accuracy...
#   Best accuracy: 86.00%
#   Best threshold: 0.1254
# [4/4] Similarity distribution analysis...
#   Same person (positive): 0.3148 ± 0.1988
#   Different persons (negative): 0.0062 ± 0.1017
#   Discriminative margin: 0.3086
#   Feature quality: Excellent
# ================================================================================

Advanced Usage

Monitor Training Progress:

# Watch training log in real-time
tail -f training.log

# Check latest checkpoint
ls -lht checkpoints/ | head -5

Evaluate Specific Checkpoint:

# Modify config.yaml
paths:
  test_model_path: 'checkpoints/resnet18_epoch50.pth'

# Run validation
python validate_model.py

Extract Face Features:

import torch
from models import resnet_face18

# Load model
model = resnet_face18(use_se=False)
model.load_state_dict(torch.load('checkpoints/resnet18_best.pth'))
model.eval()

# Extract features
with torch.no_grad():
    features = model(image_tensor)  # Returns 512-D vector

# Compare faces using cosine similarity
similarity = torch.nn.functional.cosine_similarity(feat1, feat2)
same_person = similarity > 0.1254  # Use validated threshold

Pretrained Models

Download LFW dataset:

Dataset Directory Structure:

After downloading and extracting the LFW dataset, organize files according to the following structure:

arcface-pytorch/
├── data/
│   └── datasets/
│       └── lfw/
│           ├── lfw-align-128/          # LFW aligned dataset (128×128)
│           │   ├── Aaron_Eckhart/      # Person folders
│           │   │   ├── Aaron_Eckhart_0001.jpg
│           │   │   └── ...
│           │   ├── George_W_Bush/
│           │   │   ├── George_W_Bush_0001.jpg
│           │   │   └── ...
│           │   └── ...                 # 5749 person folders
│           └── test/                   # (Optional) test dataset
│               ├── Aaron_Eckhart/
│               └── ...
├── train_list.txt                      # Training list (generated by create_train_list.py)
└── lfw_test_pair.txt                   # LFW test pairs (6000 pairs)

Notes:

  • lfw-align-128/: Contains all LFW dataset, one folder per person with all face images for that person
  • Training scripts read data from lfw-align-128/
  • Ensure the path in config.yaml is set to: train_root: './data/datasets/lfw/lfw-align-128'

Pretrained model reference from CosFace_pytorch:

Troubleshooting

Common Issues

  1. CUDA out of memory: Reduce batch size
  2. Dataset not found: Check paths in config.yaml
  3. Low accuracy: Ensure proper data preprocessing

Requirements

  • Python 3.7+
  • PyTorch 1.8+
  • CUDA 10.2+ (for GPU training)
  • 8GB+ GPU memory recommended

References

License

MIT License