Mildew detection program that lets the user scan a cherry leaf and quickly tells whether the leaf is infected.

LadyNeowen/SakuraScan

SakuraScan – Predictive Analytics Portfolio Project

Project Overview

SakuraScan is a predictive analytics application developed for Farmy & Foods to support early detection of powdery mildew in cherry leaves. The project combines image-based data analysis, machine learning, and an interactive Streamlit dashboard to reduce the time and cost associated with manual crop inspection.


Application Preview

Responsive app/home page

This page provides an overview of the application and adapts to different screen sizes for accessibility.


Leaf Inference

Users can upload an image of a cherry leaf and receive a real-time classification using the trained deep learning model.


Model Validation Performance

This section presents the model’s validation performance, demonstrating accuracy, stability, and reliability on unseen data.

Prediction Page

The prediction page displays the final classification result along with a confidence score to support decision-making.

Explore Dataset

This page enables visual inspection of the dataset, including class distribution and representative samples.


Business Understanding (CRISP-DM)

Cherry plantations operated by Farmy & Foods are affected by powdery mildew, a fungal disease that reduces crop quality. The current manual inspection process is time-consuming and not scalable across thousands of trees.

The goal of this project is to:

  • Support visual understanding of differences between healthy and infected cherry leaves.

  • Provide instant predictions using a trained ML model based on leaf images.

  • Enable scalability and future reuse of the solution for other crops.


Stakeholders & Benefits

Primary stakeholder:

  • Marianne McGuiney, Head of IT & Innovation, Farmy & Foods

Benefits:

  • Significant reduction in inspection time (from ~30 minutes to seconds).

  • Faster application of treatment when mildew is detected.

  • Foundation for a reusable ML-based inspection system across multiple crops.


Business Requirements

  • The client wants to visually differentiate healthy cherry leaves from leaves affected by powdery mildew.

  • The client wants to predict whether a cherry leaf is healthy or infected using an ML system.

  • The client requires an interactive dashboard that supports both analysis and prediction.


Mapping Business Requirements to ML and Data Visualisation Tasks

| Business Requirement | Dashboard Page | ML / Data Task | Rationale |
|---|---|---|---|
| Visual differentiation of healthy vs infected leaves | Explore Dataset | Image exploration, class distribution, sampling | Helps stakeholders understand visual patterns and data quality |
| Predict leaf health status | Leaf Health Prediction | Image classification using CNN (ResNet18) | Automates disease detection and supports rapid decisions |
| Scalable decision support | Full Dashboard | Embedded ML pipeline in Streamlit application | Demonstrates real business use of ML predictions |

Machine Learning Business Case

  • Task: Binary image classification (Healthy vs Powdery Mildew)

  • Learning method: Supervised learning using transfer learning (ResNet18)

  • Input: RGB images of cherry leaves

  • Output: Predicted class with confidence score

  • Success criteria: High validation accuracy and reliable confidence scores suitable for decision support

  • Relevance to user: Enables instant assessment of leaf health directly in the dashboard

  • Training data: Image dataset provided by Farmy & Foods

  • Heuristics & techniques: Image augmentation, frozen backbone, validation split, accuracy monitoring


Data Analysis & Insights

Exploratory Data Analysis (EDA) was conducted to:

  • Inspect dataset structure and class balance

  • Visually examine sample images per class

  • Validate image quality and variability
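
As an illustration of the class-balance check, the sketch below counts images per class. It assumes a layout of one sub-folder per class under a dataset root, which the README does not state; the folder names, file names, and the `count_images_per_class` helper are hypothetical.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root: Path) -> dict[str, int]:
    """Count image files in each class sub-folder of `root`."""
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Demo on a throwaway folder tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for class_name, n_images in {"healthy": 3, "powdery_mildew": 2}.items():
        (root / class_name).mkdir()
        for i in range(n_images):
            (root / class_name / f"leaf_{i}.jpg").touch()
    (root / "healthy" / "notes.txt").touch()  # non-image files are ignored
    demo_counts = count_images_per_class(root)

print(demo_counts)
```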


Dashboard Design

Pages Overview

  • Explore Dataset

      • Image counts per class

      • Interactive selection of sample images

      • Supports the visual analysis business requirement

  • Leaf Health Prediction

      • Image upload widget

      • ML-based prediction with confidence score

      • Visual comparison between confidence and user-defined threshold

      • Supports the predictive business requirement
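
One way to relate a confidence score to a user-defined threshold is sketched below. It assumes the confidence is the softmax probability of the top class and that the labels are ordered as in `CLASS_NAMES`; neither detail is confirmed by this README, and the `classify` helper and its default threshold are hypothetical.

```python
import math

CLASS_NAMES = ("healthy", "powdery_mildew")  # assumed label order


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.8):
    """Return the top class, its confidence, and whether it meets the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return {
        "label": CLASS_NAMES[best],
        "confidence": confidence,
        "meets_threshold": confidence >= threshold,
    }


confident = classify([0.2, 3.1])   # clear margin between the two logits
uncertain = classify([0.6, 0.9])   # nearly tied logits

print(confident)
print(uncertain)
```

A prediction that falls below the threshold could be flagged for manual inspection rather than acted on directly.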


Machine Learning Pipeline

  • Data loading from image folders

  • Image preprocessing and augmentation

  • Model training using PyTorch in Jupyter Notebook

  • Model evaluation on validation data

  • Model persistence and reuse in Streamlit inference pipeline

  • Deployment optimization supported by AI-assisted code refinement
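
The persistence step could be sketched as follows: save the trained weights once, then reload them on a CPU-only host for inference. A small `nn.Linear` stands in for the trained network, and the file name `model_weights.pt` and both helper functions are hypothetical rather than taken from the project.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


def save_weights(model: nn.Module, path: Path) -> None:
    """Persist only the learned parameters, not the whole Python object."""
    torch.save(model.state_dict(), path)


def load_for_inference(model: nn.Module, path: Path) -> nn.Module:
    """Load saved parameters onto the CPU and switch the model to eval mode."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()
    return model


trained = nn.Linear(4, 2)  # stand-in for the trained classifier
with tempfile.TemporaryDirectory() as tmp:
    weights_path = Path(tmp) / "model_weights.pt"
    save_weights(trained, weights_path)
    restored = load_for_inference(nn.Linear(4, 2), weights_path)

weights_match = torch.equal(trained.weight, restored.weight)
print(weights_match)
```

In a Streamlit app, a loader like this is commonly wrapped in `st.cache_resource` so the model is read from disk once per process instead of on every rerun.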


Future Work

  • Expand dataset with more environmental variation
  • Support multi-class disease classification
  • Integrate mobile camera capture
  • Retrain model periodically with new data

Deployment

The application is deployed using Heroku and configured for CPU-only execution.

Deployment setup includes:

  • Custom Procfile for Streamlit
  • Python version specification via .python-version
  • setup.sh for Streamlit server configuration
  • requirements.txt with PyTorch CPU wheels

The deployed application is available at https://sakurascan-0268a4143e07.herokuapp.com/

Note: Due to the use of PyTorch, the deployment package size is relatively large, which may result in slightly longer startup times.


Testing (Manual Bug Check)

The application was manually tested locally and on the deployed Heroku version.

Smoke Test

  • Verified that all required packages import correctly in the project virtual environment
  • Confirmed Streamlit starts without runtime errors

Functional Testing

Explore Dataset

  • Verified class counts display correctly
  • Tested random and indexed image selection
  • Confirmed graceful handling when dataset is unavailable in deployment
  • Note: The deployed Heroku version includes a small, representative sample of images for dataset browsing, while the full dataset is used locally in the EDA and modelling notebooks. This keeps the deployment lightweight and improves startup performance.

Leaf Health Prediction

  • Tested image upload and rendering
  • Verified inference runs successfully and displays predicted class and confidence
  • Confirmed threshold behaviour and confidence visualisation
  • Verified session state resets when a new image is uploaded

Deployment Testing (Heroku)

  • Confirmed the deployed app loads successfully
  • Verified inference page works in production environment

Known Issues and Fixes

During development, several issues were identified and resolved, including:

  • Environment setup conflicts on Windows PowerShell
  • Dependency installation issues on Heroku
  • Large slug size caused by GPU-related packages
  • Model loading and path resolution errors

All identified issues were resolved before deployment.


References and Acknowledgements

This project was developed using a combination of course materials, official documentation, and external technical resources. The following sources supported different stages of the project:

Dataset images

The image dataset used in this project was obtained from Kaggle and is used strictly for educational purposes. We thank the original dataset creators for contributing to open-source data for machine learning research.

Dataset Preparation

Jupyter Notebook Workflow

Model Training with PyTorch

Streamlit Development


Acknowledgements - AI

ChatGPT for:

  • Project structure and folder layout
  • Bug fixes and solutions
  • Swedish translation and explanations
  • Docstrings
  • Assistance with README file
  • Aid in finding the right information on the web
  • Assistance in developing and debugging image optimization and resizing scripts for deployment efficiency
