SakuraScan is a predictive analytics application developed for Farmy & Foods to support early detection of powdery mildew in cherry leaves. The project combines image-based data analysis, machine learning, and an interactive Streamlit dashboard to reduce the time and cost associated with manual crop inspection.
This page provides an overview of the application and adapts to different screen sizes for accessibility.
Users can upload an image of a cherry leaf and receive a real-time classification using the trained deep learning model.
This section presents the model’s validation performance, demonstrating accuracy, stability, and reliability on unseen data.
The prediction page displays the final classification result along with a confidence score to support decision-making.
This page enables visual inspection of the dataset, including class distribution and representative samples.
Cherry plantations operated by Farmy & Foods are affected by powdery mildew, a fungal disease that reduces crop quality. The current manual inspection process is time-consuming and not scalable across thousands of trees.
The goal of this project is to:
- Support visual understanding of differences between healthy and infected cherry leaves.
- Provide instant predictions using a trained ML model based on leaf images.
- Enable scalability and future reuse of the solution for other crops.
- Marianne McGuiney, Head of IT & Innovation, Farmy & Foods
- Significant reduction in inspection time (from ~30 minutes to seconds).
- Faster application of treatment when mildew is detected.
- Foundation for a reusable ML-based inspection system across multiple crops.
- The client wants to visually differentiate healthy cherry leaves from leaves affected by powdery mildew.
- The client wants to predict whether a cherry leaf is healthy or infected using an ML system.
- The client requires an interactive dashboard that supports both analysis and prediction.
| Business Requirement | Dashboard Page | ML / Data Task | Rationale |
|---|---|---|---|
| Visual differentiation of healthy vs infected leaves | Explore Dataset | Image exploration, class distribution, sampling | Helps stakeholders understand visual patterns and data quality |
| Predict leaf health status | Leaf Health Prediction | Image classification using CNN (ResNet18) | Automates disease detection and supports rapid decisions |
| Scalable decision support | Full Dashboard | Embedded ML pipeline in Streamlit application | Demonstrates real business use of ML predictions |
- Task: Binary image classification (Healthy vs Powdery Mildew)
- Learning method: Supervised learning using transfer learning (ResNet18)
- Input: RGB images of cherry leaves
- Output: Predicted class with confidence score
- Success criteria: High validation accuracy and reliable confidence scores suitable for decision support
- Relevance to user: Enables instant assessment of leaf health directly in the dashboard
- Training data: Image dataset provided by Farmy & Foods
- Heuristics & techniques: Image augmentation, frozen backbone, validation split, accuracy monitoring
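The validation split listed among the techniques above could look like the sketch below. It is a minimal standard-library illustration, not the project's actual code: the function name `stratified_split`, the 20% validation fraction, the fixed seed, and the choice to stratify by class are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs into train and validation lists.

    Each class contributes roughly `val_fraction` of its items to the
    validation list, so both splits keep a similar class balance.
    NOTE: hypothetical helper; fraction and seed are assumed values.
    """
    # Group the samples by their label.
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Hold out at least one item per class for validation.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

Splitting per class rather than over the whole list keeps a rare class from disappearing from the validation set by chance.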
Exploratory Data Analysis (EDA) was conducted to:
- Inspect dataset structure and class balance
- Visually examine sample images per class
- Validate image quality and variability
- Image counts per class
- Interactive selection of sample images
- Supports the visual analysis business requirement
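As an illustration of how the per-class image counts might be computed, here is a minimal sketch using only the standard library. It assumes the dataset is stored as one sub-folder per class; the function name `count_images_per_class` and the set of accepted file extensions are hypothetical and may differ from what the app actually uses.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the real dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_dir):
    """Return {class_name: image_count} for a folder-per-class dataset.

    Returns an empty dict when the folder does not exist, so a caller
    can show a friendly message instead of raising an error.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        return {}

    counts = {}
    # Each sub-folder is treated as one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Returning an empty dict for a missing folder is one simple way to let a dashboard page degrade gracefully when the dataset is not bundled with the deployment.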
- Image upload widget
- ML-based prediction with confidence score
- Visual comparison between confidence and user-defined threshold
- Supports the predictive business requirement
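The comparison between the confidence score and a user-defined threshold can be sketched as follows. This is a simplified, pure-Python illustration that assumes the model outputs two raw scores (logits) and that confidence is the softmax probability of the predicted class; the function names, class labels, and returned keys are hypothetical and not taken from the app's code.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, class_names, threshold):
    """Pick the most likely class and check it against a threshold.

    NOTE: hypothetical helper; the real app may derive confidence
    differently.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return {
        "label": class_names[best],
        "confidence": confidence,
        "meets_threshold": confidence >= threshold,
    }
```

A prediction that falls below the threshold could be flagged for manual inspection rather than acted on directly, which is one way such a threshold supports decision-making.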
- Data loading from image folders
- Image preprocessing and augmentation
- Model training using PyTorch in Jupyter Notebook
- Model evaluation on validation data
- Model persistence and reuse in Streamlit inference pipeline
- Deployment optimization supported by AI-assisted code refinement
- Expand dataset with more environmental variation
- Support multi-class disease classification
- Integrate mobile camera capture
- Retrain model periodically with new data
The application is deployed using Heroku and configured for CPU-only execution.
Deployment setup includes:
- Custom Procfile for Streamlit
- Python version specification via .python-version
- setup.sh for Streamlit server configuration
- requirements.txt with PyTorch CPU wheels
The deployed application can be accessed at https://sakurascan-0268a4143e07.herokuapp.com/
Note: Due to the use of PyTorch, the deployment package size is relatively large, which may result in slightly longer startup times.
The application was manually tested locally and on the deployed Heroku version.
- Verified that all required packages import correctly in the project virtual environment
- Confirmed Streamlit starts without runtime errors
Explore Dataset
- Verified class counts display correctly
- Tested random and indexed image selection
- Confirmed graceful handling when dataset is unavailable in deployment
- Note: The deployed Heroku version includes a small representative sample of images for dataset browsing. The full dataset is used locally in the EDA and modelling notebooks. This approach keeps the deployment lightweight and improves startup performance.
Leaf Health Prediction
- Tested image upload and rendering
- Verified inference runs successfully and displays predicted class and confidence
- Confirmed threshold behaviour and confidence visualisation
- Verified session state resets when a new image is uploaded
- Confirmed the deployed app loads successfully
- Verified inference page works in production environment
During development, several issues were identified and resolved, including:
- Environment setup conflicts on Windows PowerShell
- Dependency installation issues on Heroku
- Large slug size caused by GPU-related packages
- Model loading and path resolution errors
All identified issues were resolved before deployment.
This project was developed using a combination of course materials, official documentation, and external technical resources. The following sources supported different stages of the project:
The image dataset used in this project was obtained from Kaggle and is used strictly for educational purposes. We thank the original dataset creators for contributing to open-source data for machine learning research.
- GeeksforGeeks: How to Create a Dataset
  https://www.geeksforgeeks.org/data-science/how-to-create-a-dataset/
- Unidata: How to Prepare an ML Dataset
  https://unidata.pro/blog/how-to-prepare-ml-dataset/
- Dataset Builder (GitHub)
  https://github.com/karan3691/dataset-builder
- Stack Overflow: Loading Images from Folders
  https://stackoverflow.com/questions/56848253/
- AWS Sample Image Classification Notebook
  https://github.com/aws-samples/aws-sar-sagemaker-image-classification
- PyTorch CIFAR-10 Tutorial
  https://docs.pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
- PyTorch Transfer Learning Tutorial
  https://docs.pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
- PyTorch Documentation
  https://docs.pytorch.org/docs/stable/index.html
- Torchvision Models
  https://docs.pytorch.org/vision/stable/models.html
- PyTorch ImageNet Examples
  https://github.com/pytorch/examples/tree/main/imagenet
- File Uploader API
  https://docs.streamlit.io/develop/api-reference/widgets/st.file_uploader
- Caching and State
  https://docs.streamlit.io/develop/api-reference/caching-and-state/st.cache_resource
- Multipage Applications
  https://docs.streamlit.io/develop/concepts/multipage-apps
- Slider Widget
  https://docs.streamlit.io/develop/api-reference/widgets/st.slider
- Structuring and folder build
- Bug fixes and solutions
- Swedish translation and explanations
- Docstrings
- Assistance with README file
- Aid in finding the right information on the web
- Assistance in developing and debugging image optimization and resizing scripts for deployment efficiency




