This project demonstrates how an object detection-focused architecture like YOLOv8s can be leveraged to solve the reCAPTCHA v2 reverse Turing test. The model is trained and evaluated on a ~62k-image augmented reCAPTCHA v2 dataset exported from RoboFlow. The best-performing model is exported to ONNX format for browser inference with ONNX Runtime Web.
To achieve hardware-accelerated inference for YOLOv8 in the browser, WebGPU was utilized as the primary execution provider. Unlike its predecessor (WebGL), WebGPU provides direct access to modern GPU features, significantly reducing CPU overhead and enabling performance close to native speeds. However, as of early 2026, WebGPU support on Linux remains behind experimental flags in several browsers due to driver stability requirements.
- Mozilla Firefox (Linux): Access to the `navigator.gpu` API is disabled by default. It must be manually enabled via `about:config` by setting `dom.webgpu.enabled` and `gfx.webgpu.ignore-blocklist` to `true`. This bypasses the hardware vendor blocklist and exposes the Vulkan-backed WebGPU interface to the browser's JavaScript engine.
- Google Chrome / Chromium (Linux): While support is more mature, WebGPU often requires explicit activation via `chrome://flags` (enabling "Unsafe WebGPU Support") or by launching the browser with the `--enable-unsafe-webgpu` and `--enable-features=Vulkan` command-line arguments.
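Once the flags are set, you can confirm from the browser console that the API is actually exposed. The helper below is an illustrative sketch (the `hasWebGPU` name is ours, not part of any API): it takes a `navigator`-like object so the check is easy to exercise outside a browser.

```javascript
// Returns true when the given navigator-like object exposes the WebGPU API.
// In a real page you would call hasWebGPU(navigator).
function hasWebGPU(nav) {
  return typeof nav === 'object' && nav !== null && 'gpu' in nav;
}

// In the browser console, after enabling the flags:
//   hasWebGPU(navigator)                  // true when WebGPU is exposed
//   await navigator.gpu.requestAdapter()  // a GPUAdapter on working drivers
```

Note that `navigator.gpu` being present does not guarantee a usable adapter; `requestAdapter()` can still resolve to `null` on blocklisted drivers.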
In production environments, a Tiered Execution Strategy is recommended: the application should first attempt to initialize a WebGPU session for maximum performance (FP16), with an automatic fallback to WebAssembly (WASM) with SIMD for compatibility on older browsers or systems with unsupported drivers.
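The tiered strategy can be sketched as a small helper for ONNX Runtime Web. `ort.InferenceSession.create` and the `'webgpu'`/`'wasm'` execution provider names are the real ONNX Runtime Web API; the `createSessionWithFallback` helper itself is a hypothetical name, written against a factory callback so the fallback logic is self-contained.

```javascript
// Try each execution provider in order; return the first session that initializes.
// `createFn` abstracts ort.InferenceSession.create so the logic stands alone.
async function createSessionWithFallback(createFn, providers) {
  let lastError;
  for (const ep of providers) {
    try {
      return await createFn(ep);
    } catch (err) {
      lastError = err; // e.g. unsupported driver, flag not enabled
    }
  }
  throw lastError;
}

// Usage with ONNX Runtime Web (assumes `ort` is loaded on the page):
// const session = await createSessionWithFallback(
//   (ep) => ort.InferenceSession.create('model/best.onnx', { executionProviders: [ep] }),
//   ['webgpu', 'wasm'] // WebGPU first, WASM (SIMD) as the compatibility fallback
// );
```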
This notebook requires Python ≥ 3.8, PyTorch, and the Ultralytics library. A CUDA-capable NVIDIA GPU is strongly recommended for training (50 epochs on ~62 k images). Below are several ways to set up your environment.
If you want to run the fine-tuning yourself, perhaps with different parameters, you can download my dataset from https://drive.google.com/file/d/1FI9kqsWEkjiaj5A136WhY6oAG9P1r44v/view?usp=drive_link and move it to the root of the project.
You can also use your own dataset.
Ultralytics publishes ready-to-use Docker images with PyTorch, CUDA drivers, and all dependencies pre-installed.
# Pull the GPU image (includes CUDA + cuDNN)
docker pull ultralytics/ultralytics:latest-cuda
# Run the container, mounting your project folder
docker run --ipc=host --gpus all -it \
-v $(pwd):/workspace \
-w /workspace \
ultralytics/ultralytics:latest-cuda
# Inside the container, launch Jupyter
jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root

CPU-only variant: replace `latest-cuda` with `latest` if you don't have an NVIDIA GPU.
The NVIDIA NGC PyTorch image ships with optimised CUDA libraries, NCCL, cuDNN, and TensorRT. It is bigger (~15 GB) but offers best-in-class GPU performance.
# Pull an NGC PyTorch image (NGC uses versioned yy.mm-py3 tags; pick a recent
# one from the NGC catalog — 24.08-py3 is used here as an example)
docker pull nvcr.io/nvidia/pytorch:24.08-py3
# Launch the container
docker run --ipc=host --gpus all -it \
-v $(pwd):/workspace \
-w /workspace \
nvcr.io/nvidia/pytorch:24.08-py3
# Inside the container install Ultralytics on top
pip install ultralytics onnx onnxruntime
# Then start Jupyter
jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root

If you prefer a native (non-Docker) setup, Conda is a great way to manage a reproducible Python environment with GPU-enabled PyTorch.
# 1. Install Miniconda (skip if you already have it)
# https://docs.anaconda.com/miniconda/install/
# 2. Create and activate a new environment
conda create -n yolov8-recaptcha python=3.10 -y
conda activate yolov8-recaptcha
# 3. Install PyTorch with CUDA support (adjust cuda version as needed)
# See https://pytorch.org/get-started/locally/ for the right command
conda install pytorch torchvision pytorch-cuda=12.4 -c pytorch -c nvidia -y
# 4. Install Ultralytics and notebook dependencies
pip install ultralytics onnx onnxruntime jupyter
# 5. Register the environment as a Jupyter kernel
python -m ipykernel install --user --name yolov8-recaptcha
# 6. Launch Jupyter
jupyter notebook

A lightweight alternative using only the Python standard library. You must have the correct NVIDIA drivers and CUDA toolkit already installed on your system for GPU support.
# 1. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate # Linux / macOS
# .venv\Scripts\activate # Windows
# 2. Upgrade pip
pip install --upgrade pip
# 3. Install PyTorch with CUDA (adjust the index-url for your CUDA version)
# See https://pytorch.org/get-started/locally/
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
# 4. Install Ultralytics and notebook dependencies
pip install ultralytics onnx onnxruntime jupyter
# 5. Register the environment as a Jupyter kernel
python -m ipykernel install --user --name yolov8-recaptcha
# 6. Launch Jupyter
jupyter notebook

💡 Tip: Whichever method you choose, verify your setup by running `python -c "import torch; print(torch.cuda.is_available())"`; it should print `True` if GPU acceleration is available. Instead of using Jupyter to create a web server, you can also connect to the container using Dev Containers in VS Code and run the notebook from there (recommended).
To set up and run the browser-based solver demonstration locally, use Python's built-in HTTP server.
First, download my already fine-tuned model from https://drive.google.com/file/d/1h8zYhtXaU1kOJrTiTBfi4uwpT5nAIyQt/view?usp=sharing. Move the resulting `best.onnx` file to `web-demo/model/best.onnx` so the web application can find it and load it in ONNX format in the browser.
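Once the model is loaded, the raw YOLOv8 detections typically need non-maximum suppression before they can be mapped to challenge tiles, and NMS is built on intersection-over-union. A minimal IoU helper (assuming boxes in `[x1, y1, x2, y2]` form; this is a generic sketch, not code taken from the demo) might look like:

```javascript
// Intersection-over-Union of two axis-aligned boxes in [x1, y1, x2, y2] form.
// Standard NMS keeps the highest-scoring box and drops overlaps above a threshold.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}
```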
- Open your terminal and navigate to the `web-demo` folder containing the static web files: `cd web-demo`
- Start the local development server: `python3 -m http.server 8000`
- Open a supported web browser and navigate to `http://localhost:8000`
Depending on your browser, you might need to enable specific WebGPU or WebAssembly flags as outlined in the WebGPU Implementation & Cross-Browser Configuration section.
