SakuraScan – Predictive Analytics Portfolio Project

Project Overview

SakuraScan is a predictive analytics application developed for Farmy & Foods to support early detection of powdery mildew in cherry leaves. The project combines image-based data analysis, machine learning, and an interactive Streamlit dashboard to reduce the time and cost associated with manual crop inspection.

Application Preview

Responsive app/home page

This page provides an overview of the application and adapts to different screen sizes for accessibility.

Leaf Inference

Users can upload an image of a cherry leaf and receive a real-time classification using the trained deep learning model.

Model Validation Performance

This section presents the model's performance on the validation set, that is, on images held out from training.

Prediction Page

The prediction page displays the final classification result along with a confidence score to support decision-making.

Explore Dataset

This page enables visual inspection of the dataset, including class distribution and representative samples.

Business Understanding (CRISP-DM)

Cherry plantations operated by Farmy & Foods are affected by powdery mildew, a fungal disease that reduces crop quality. The current manual inspection process is time-consuming and not scalable across thousands of trees.

The goal of this project is to:

Support visual understanding of differences between healthy and infected cherry leaves.
Provide instant predictions using a trained ML model based on leaf images.
Enable scalability and future reuse of the solution for other crops.

Stakeholders & Benefits

Primary stakeholder:

Marianne McGuiney, Head of IT & Innovation, Farmy & Foods

Benefits:

Significant reduction in inspection time (from ~30 minutes to seconds).
Faster application of treatment when mildew is detected.
Foundation for a reusable ML-based inspection system across multiple crops.

Business Requirements

The client wants to visually differentiate healthy cherry leaves from leaves affected by powdery mildew.
The client wants to predict whether a cherry leaf is healthy or infected using an ML system.
The client requires an interactive dashboard that supports both analysis and prediction.

Mapping Business Requirements to ML and Data Visualisation Tasks

Business Requirement	Dashboard Page	ML / Data Task	Rationale
Visual differentiation of healthy vs infected leaves	Explore Dataset	Image exploration, class distribution, sampling	Helps stakeholders understand visual patterns and data quality
Predict leaf health status	Leaf Health Prediction	Image classification using CNN (ResNet18)	Automates disease detection and supports rapid decisions
Scalable decision support	Full Dashboard	Embedded ML pipeline in Streamlit application	Demonstrates real business use of ML predictions

Machine Learning Business Case

Task: Binary image classification (Healthy vs Powdery Mildew)
Learning method: Supervised learning using transfer learning (ResNet18)
Input: RGB images of cherry leaves
Output: Predicted class with confidence score
Success criteria: High validation accuracy and reliable confidence scores suitable for decision support
Relevance to user: Enables instant assessment of leaf health directly in the dashboard
Training data: Image dataset provided by Farmy & Foods
Heuristics & techniques: Image augmentation, frozen backbone, validation split, accuracy monitoring

Data Analysis & Insights

Exploratory Data Analysis (EDA) was conducted to:

Inspect dataset structure and class balance
Visually examine sample images per class
Validate image quality and variability
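
The class-balance check above can be sketched with the standard library alone. The sketch assumes one subfolder per class; the folder names `healthy` and `powdery_mildew`, the helper names, and the accepted file suffixes are illustrative assumptions rather than details taken from the project. The demo builds a throwaway folder tree so it runs anywhere.

```python
import tempfile
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def count_images_per_class(root: Path) -> Counter:
    """Count image files in each immediate subfolder of root (one folder per class)."""
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    if not counts:
        raise ValueError("no classes found")
    smallest = min(counts.values())
    return float("inf") if smallest == 0 else max(counts.values()) / smallest


# Demo on a temporary folder tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for class_name, n_images in {"healthy": 3, "powdery_mildew": 2}.items():
        class_dir = root / class_name
        class_dir.mkdir()
        for i in range(n_images):
            (class_dir / f"leaf_{i}.jpg").touch()
    (root / "healthy" / "notes.txt").touch()  # ignored: not an image
    counts = count_images_per_class(root)
    ratio = imbalance_ratio(counts)
```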

Dashboard Design

Pages Overview

Explore Dataset
Image counts per class
Interactive selection of sample images
Supports the visual analysis business requirement
Leaf Health Prediction
Image upload widget
ML-based prediction with confidence score
Visual comparison between confidence and user-defined threshold
Supports the predictive business requirement
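
The comparison between model confidence and a user-defined threshold can be illustrated as follows. This is a hedged sketch: it assumes the model emits one raw score (logit) per class and that confidence is the softmax probability of the top class, which may differ from how the application computes it. The function names, example logits, and the 0.8 threshold are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, class_names, threshold=0.8):
    """Return (label, confidence, is_confident) for one image."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return class_names[best], confidence, confidence >= threshold


# Made-up logits for a single image, used only to exercise the function.
label, confidence, is_confident = classify(
    [0.2, 2.9], ["Healthy", "Powdery Mildew"], threshold=0.8
)
```

A prediction that falls below the threshold can then be flagged for manual inspection instead of being acted on directly.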

Machine Learning Pipeline

Data loading from image folders
Image preprocessing and augmentation
Model training using PyTorch in Jupyter Notebook
Model evaluation on validation data
Model persistence and reuse in Streamlit inference pipeline
Deployment optimization supported by AI-assisted code refinement
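
The evaluation step in the pipeline above can be sketched in plain Python. The sketch computes accuracy plus a count of each (true label, predicted label) pair; the function name `evaluate` is hypothetical, and the labels in the demo are made up purely to exercise the code. They are not results from this project.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return (accuracy, confusion), where confusion counts (true, predicted) pairs."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    confusion = Counter(zip(y_true, y_pred))
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(y_true), confusion


# Illustrative labels only.
y_true = ["Powdery Mildew", "Powdery Mildew", "Healthy", "Healthy", "Healthy"]
y_pred = ["Powdery Mildew", "Healthy", "Healthy", "Healthy", "Powdery Mildew"]

accuracy, confusion = evaluate(y_true, y_pred)
missed_infections = confusion[("Powdery Mildew", "Healthy")]
```

Looking at the pair counts alongside accuracy shows which kind of error occurs, for example infected leaves predicted as healthy.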

Future Work

Expand dataset with more environmental variation
Support multi-class disease classification
Integrate mobile camera capture
Retrain model periodically with new data

Deployment

The application is deployed on Heroku and configured for CPU-only execution.

Deployment setup includes:

Custom Procfile for Streamlit
Python version specification via .python-version
setup.sh for Streamlit server configuration
requirements.txt with PyTorch CPU wheels

The deployed application can be accessed at https://sakurascan-0268a4143e07.herokuapp.com/

Note: Due to the use of PyTorch, the deployment package size is relatively large, which may result in slightly longer startup times.

Testing (Manual Bug Check)

The application was manually tested locally and on the deployed Heroku version.

Smoke Test

Verified that all required packages import correctly in the project virtual environment
Confirmed Streamlit starts without runtime errors

Functional Testing

Explore Dataset

Verified class counts display correctly
Tested random and indexed image selection
Confirmed graceful handling when dataset is unavailable in deployment
Note: The deployed Heroku version includes a small representative sample of images for dataset browsing. The full dataset is used locally in the EDA and modelling notebooks. This approach keeps the deployment lightweight and improves startup performance.
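
One way to handle a missing dataset gracefully is to try a list of candidate folders in order and fall back to a bundled sample. The sketch below assumes that approach; the helper name `resolve_dataset_dir` and the paths in the demo are illustrative, and the application's real logic may differ.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def resolve_dataset_dir(candidates: Sequence[Path]) -> Optional[Path]:
    """Return the first candidate that exists and is non-empty, else None."""
    for path in candidates:
        if path.is_dir() and any(path.iterdir()):
            return path
    return None


# Demo: the full dataset folder is absent, the sample folder is present.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "data" / "source_images"  # deliberately not created
    sample = Path(tmp) / "data_sample" / "source_images"
    (sample / "healthy").mkdir(parents=True)

    chosen = resolve_dataset_dir([full, sample])
    used_sample = chosen == sample
    nothing = resolve_dataset_dir([full])  # no usable folder at all
```

When the function returns `None`, the page can show an informational message instead of raising an error.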

Leaf Health Prediction

Tested image upload and rendering
Verified inference runs successfully and displays predicted class and confidence
Confirmed threshold behaviour and confidence visualisation
Verified session state resets when a new image is uploaded
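
The session-state reset checked above can be modelled without Streamlit. In this sketch a plain dict stands in for Streamlit's `st.session_state`, which offers similar dictionary-style access; the key names and the helper `sync_upload` are hypothetical, and hashing the uploaded bytes is an assumed way of detecting a new image.

```python
import hashlib


def sync_upload(state: dict, file_bytes: bytes) -> bool:
    """Clear cached results when the uploaded bytes differ from the last upload.

    Returns True when a reset happened.
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    if state.get("upload_digest") == digest:
        return False  # same image as before: keep the cached prediction
    state.pop("prediction", None)
    state.pop("confidence", None)
    state["upload_digest"] = digest
    return True


state = {}
first = sync_upload(state, b"leaf-a")    # first upload: reset
state["prediction"] = "Healthy"          # pretend inference ran
repeat = sync_upload(state, b"leaf-a")   # same bytes: nothing cleared
kept = "prediction" in state
changed = sync_upload(state, b"leaf-b")  # new image: stale result cleared
cleared = "prediction" not in state
```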

Deployment Testing (Heroku)

Confirmed the deployed app loads successfully
Verified inference page works in production environment

Known Issues and Fixes

During development, several issues were identified and resolved, including:

Environment setup conflicts on Windows PowerShell
Dependency installation issues on Heroku
Large slug size caused by GPU-related packages
Model loading and path resolution errors

All identified issues were resolved before deployment.

References and Acknowledgements

This project was developed using a combination of course materials, official documentation, and external technical resources. The following sources supported different stages of the project:

Dataset images

The image dataset used in this project was obtained from Kaggle and is used strictly for educational purposes. We thank the original dataset creators for contributing to open-source data for machine learning research.

Acknowledgements - AI

ChatGPT for:

Project structure and folder layout
Bug fixes and solutions
Swedish translation and explanations
Docstrings
Assistance with README file
Aid in finding the right information on the web
Assistance in developing and debugging image optimization and resizing scripts for deployment efficiency
