A comprehensive deep learning project for automated classification of diabetic retinopathy stages from retinal fundus images. This research implements and compares multiple CNN architectures, transfer learning approaches, regularization techniques, and ensemble methods to achieve robust medical image classification.
- Overview
- Dataset
- Project Structure
- Installation
- Usage
- Models & Results
- Key Findings
- Documentation
- Citation
Diabetic retinopathy (DR) is a leading cause of vision loss among adults with diabetes. This project develops an AI-powered diagnostic system capable of classifying DR severity into 5 stages:
- Class 0: No DR (Healthy)
- Class 1: Mild Non-Proliferative DR
- Class 2: Moderate Non-Proliferative DR
- Class 3: Severe Non-Proliferative DR
- Class 4: Proliferative DR
✅ Custom CNN Architecture: Baseline model built from scratch
✅ Transfer Learning: EfficientNet-B0 & ResNet-50 with fine-tuning strategies
✅ Advanced Regularization: Dropout, Weight Decay, Label Smoothing, Data Augmentation
✅ Ensemble Learning: Soft voting across 5 best models
✅ Unsupervised Analysis: Convolutional Autoencoder for anomaly detection & latent space visualization
✅ Comprehensive Evaluation: ROC-AUC, Confusion Matrix, Learning Curves, t-SNE visualization
Source: Kaggle - Diabetic Retinopathy Detection
- Total Images: 2,750
- Image Size: 256×256 pixels (RGB)
- Classes: 5 (staged severity levels)
- Split: 70% train / 15% validation / 15% test
| Class | Label | Count |
|---|---|---|
| 0 | Healthy | ~550 |
| 1 | Mild DR | ~450 |
| 2 | Moderate DR | ~600 |
| 3 | Severe DR | ~400 |
| 4 | Proliferative DR | ~500 |
Note: The dataset is not included in the repository. Download it from Kaggle and place it in the `data/` directory.
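The 70/15/15 split can be sketched with `torch.utils.data.random_split`. The `TensorDataset` below is a lightweight stand-in for the real `ImageFolder` dataset under `data/`; integer arithmetic avoids floating-point rounding in the split sizes:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the real ImageFolder("data/") dataset: 2,750 samples
dataset = TensorDataset(torch.zeros(2750, 1), torch.zeros(2750, dtype=torch.long))

n = len(dataset)
n_train = (70 * n) // 100        # 70% train
n_val = (15 * n) // 100          # 15% validation
n_test = n - n_train - n_val     # remainder goes to test

# Fixed seed keeps the split reproducible across runs
train_ds, val_ds, test_ds = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),
)
```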
```
DiabeticRetinopathy(CVproj)/
├── README.md
├── requirements.txt
├── checkpoints/              # Saved model weights
│
├── data/                     # Dataset (not included)
│   ├── Healthy/
│   ├── Mild DR/
│   ├── Moderate DR/
│   ├── Severe DR/
│   └── Proliferate DR/
├── docs/                     # Research documentation (PDF report)
├── exploration/              # Jupyter notebooks for experiments and visualization
│   ├── baseline.ipynb
│   ├── transfer.ipynb
│   ├── regularization.ipynb
│   └── autoencoder.ipynb
├── models/                   # Model architectures
│   ├── __init__.py
│   ├── baseline_cnn.py
│   ├── transfer_models.py
│   ├── ensemble.py
│   └── autoencoder.py
├── scripts/                  # Training scripts
│   ├── train_baseline.py
│   ├── train_transfer.py
│   ├── train_ensemble.py
│   └── train_autoencoder.py
├── training/                 # Training utilities
│   ├── config.py
│   ├── data_processor.py
│   ├── train_pipeline.py
│   ├── early_stopping.py
│   └── check_gpu.py
└── results/                  # Output metrics
```
- Python 3.8+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
```bash
# Clone repository
git clone https://github.com/deyme17/Diabetic-Retinopathy-CV.git
cd Diabetic-Retinopathy-CV

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Verify GPU availability
python training/check_gpu.py
```

- Download the dataset from Kaggle
- Extract it to the `data/` directory, maintaining the folder structure
```bash
# Baseline CNN
python scripts/train_baseline.py

# Transfer learning
python scripts/train_transfer.py --model "efficientnet_b0" --freeze_until 2
python scripts/train_transfer.py --model "resnet50" --augmentation 1

# Ensemble
python scripts/train_ensemble.py --models "checkpoint1.pth" "checkpoint2.pth"

# Autoencoder
python scripts/train_autoencoder.py --epochs 50
```

Explore experiments interactively:

```bash
jupyter notebook exploration/baseline.ipynb
```

- `baseline.ipynb`: Custom CNN development
- `transfer.ipynb`: Transfer learning experiments
- `regularization.ipynb`: Dropout, weight decay, augmentation analysis
- `autoencoder.ipynb`: Unsupervised learning & t-SNE visualization
| Model | Val Accuracy | Test Accuracy | Parameters | Epochs to Converge |
|---|---|---|---|---|
| Baseline CNN | 64.00% | 63.55% | ~3.67M | 21 |
| EfficientNet-B0 (freeze_until=2) | 72.00% | 69.57% | ~4.0M | 9 |
| ResNet-50 (regularized) | 62.00% | 65.66% | ~23.5M | 17 |
| Ensemble (Soft Voting) | 70.00% | 70.57% | Combined | N/A |
- Accuracy: 70.57%
- Precision: 0.6537
- Recall: 0.6557
- F1-Score: 0.6559
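Metrics of this kind can be reproduced with scikit-learn; the labels below are hypothetical stand-ins for the real test-set predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical ground truth and ensemble predictions (classes 0-4)
y_true = [0, 1, 2, 3, 4, 0, 2, 2]
y_pred = [0, 1, 2, 3, 4, 0, 1, 2]

acc = accuracy_score(y_true, y_pred)

# Macro averaging weights all five DR stages equally,
# which matters under class imbalance
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
```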
| Class | AUC |
|---|---|
| Healthy | 0.99 |
| Mild DR | 0.92 |
| Moderate DR | 0.82 |
| Severe DR | 0.92 |
| Proliferative DR | 0.90 |
- Anomalies Detected: 17/299 test images
- Detection Threshold: μ + 2.3σ (MSE-based)
- Latent Space Dimensionality: 16×16×256
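The μ + 2.3σ rule amounts to flagging images whose per-image reconstruction MSE exceeds the mean error by 2.3 standard deviations; a minimal sketch with hypothetical error values:

```python
import torch

def flag_anomalies(errors: torch.Tensor, k: float = 2.3) -> torch.Tensor:
    """Return a boolean mask of images whose reconstruction MSE
    exceeds mu + k * sigma over the test set."""
    threshold = errors.mean() + k * errors.std()
    return errors > threshold

# Hypothetical per-image MSE errors: 20 typical images, one outlier
errors = torch.cat([torch.full((20,), 0.01), torch.tensor([0.5])])
mask = flag_anomalies(errors)
```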
- EfficientNet-B0 achieved +6% accuracy over baseline CNN
- Compact architecture (4M params) outperformed ResNet-50 (23.5M params)
- Partial freezing (freeze_until=2) optimal for medical imaging domain adaptation
- Dropout (0.5) + data augmentation delayed overfitting, letting training run for more epochs before early stopping
- Weight Decay (1e-4) reduced validation loss
- Label Smoothing proved ineffective (decreased accuracy by 2-3%)
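These regularizers map directly onto standard PyTorch knobs. A minimal configuration sketch, where the small `Sequential` model is a stand-in for the real CNN:

```python
import torch
import torch.nn as nn

# Stand-in model with Dropout(0.5), as in the experiments
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(256, 5))

# Weight decay of 1e-4, as reported above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Label smoothing (e.g. 0.1) -- shown for completeness even though
# it reduced accuracy by 2-3% in this project
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```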
- Soft voting increased accuracy by ~1% while improving prediction stability
- Particularly valuable for Proliferative DR detection
- Reduced false negatives in high-risk classes
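Soft voting averages the per-class softmax probabilities of the member models before taking the argmax; a sketch using random stand-in logits for two of the five ensembled models:

```python
import torch
import torch.nn.functional as F

def soft_vote(logits_list):
    """Average softmax probabilities across models (soft voting)."""
    probs = torch.stack([F.softmax(logits, dim=1) for logits in logits_list])
    return probs.mean(dim=0)

# Hypothetical logits: a batch of 4 images, 5 DR classes
model_a = torch.randn(4, 5)
model_b = torch.randn(4, 5)

avg_probs = soft_vote([model_a, model_b])
predictions = avg_probs.argmax(dim=1)  # final class per image
```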
- t-SNE visualization revealed continuum between DR stages (no clear cluster boundaries)
- Healthy class forms distinct cluster, but Mild/Moderate DR overlap significantly
- Autoencoder successfully identified low-quality images and severe anomalies
- Class imbalance required weighted loss functions
- Moderate DR (Class 2) most difficult to classify (often confused with Mild/Severe)
- Medical domain gap: ImageNet pretraining requires careful fine-tuning
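A weighted loss here means inverse-frequency class weights passed to the criterion. A sketch using the approximate counts from the dataset table, scaled so a class of average size (500 images) gets weight 1.0:

```python
import torch
import torch.nn as nn

# Approximate per-class counts from the dataset section
counts = torch.tensor([550.0, 450.0, 600.0, 400.0, 500.0])

# Inverse-frequency weights: rarer classes (e.g. Severe DR)
# contribute more to the loss
weights = counts.sum() / (len(counts) * counts)

criterion = nn.CrossEntropyLoss(weight=weights)
```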
The full research report (in Ukrainian) is available at `docs/DiabeticRetinopathyReport.pdf`. It covers:
- Problem Statement & Dataset Analysis
- Theoretical Background (CNNs, Transfer Learning, Autoencoders)
- Baseline Model Development
- Transfer Learning Experiments
- Regularization Techniques
- Ensemble Methods
- Autoencoder Analysis
- Comparative Results & Conclusions