Releases: unclecode/crawl4ai

Release v0.8.5

18 Mar 03:34

🎉 Crawl4AI v0.8.5 Released!

📦 Installation

PyPI:

pip install crawl4ai==0.8.5

Docker:

docker pull unclecode/crawl4ai:0.8.5
docker pull unclecode/crawl4ai:latest

Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.

📝 What's Changed

See CHANGELOG.md for details.

Release v0.8.0

16 Jan 10:40

🎉 Crawl4AI v0.8.0 Released!

📦 Installation

PyPI:

pip install crawl4ai==0.8.0

Docker:

docker pull unclecode/crawl4ai:0.8.0
docker pull unclecode/crawl4ai:latest

Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.

📝 What's Changed

See CHANGELOG.md for details.

Release v0.7.8

09 Dec 08:49

🎉 Crawl4AI v0.7.8 Released!

📦 Installation

PyPI:

pip install crawl4ai==0.7.8

Docker:

docker pull unclecode/crawl4ai:0.7.8
docker pull unclecode/crawl4ai:latest

Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.

📝 What's Changed

See CHANGELOG.md for details.

Release v0.7.7

14 Nov 09:28

🎉 Crawl4AI v0.7.7 Released!

This release introduces a complete self-hosting platform with enterprise-grade real-time monitoring, transforming Crawl4AI Docker from a simple containerized crawler into a production-ready platform with full operational transparency and control.

🚀 What's New

Major Feature: Real-time Monitoring & Self-Hosting Platform

Docker deployment now includes:

  • 📊 Interactive Monitoring Dashboard (/dashboard)
  • 🔌 Comprehensive Monitor API
  • ⚡ WebSocket Streaming
  • 🔥 Smart Browser Pool (3-tier architecture)
  • 🧹 Janitor System
  • 📈 Production-Ready
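
As a sketch of what the Monitor API enables, the snippet below condenses a browser-pool snapshot into a status line. The payload shape (`browsers`, `state`, `uptime_s`) is an illustrative assumption, not the documented Monitor API schema:

```python
# Illustrative only: the snapshot shape is an assumed example payload,
# not the documented Monitor API schema.
def summarize_pool(snapshot):
    """Condense a browser-pool snapshot into a one-line status string."""
    busy = sum(1 for b in snapshot["browsers"] if b["state"] == "busy")
    total = len(snapshot["browsers"])
    return f"{busy}/{total} browsers busy, uptime {snapshot['uptime_s']}s"

sample = {
    "uptime_s": 120,
    "browsers": [{"state": "busy"}, {"state": "idle"}, {"state": "busy"}],
}
print(summarize_pool(sample))  # 2/3 browsers busy, uptime 120s
```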

🐛 Critical Bug Fixes

  • Fixed async LLM extraction blocking issue (#1055) - now supports true parallel processing
  • Fixed CDP endpoint verification with exponential backoff (#1445)
  • Fixed arun_many to always return a list, even on exception
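
The CDP fix relies on exponential backoff; the generic retry pattern looks roughly like this (attempt count and delays are illustrative, not the values used internally):

```python
import time

def retry_with_backoff(check, attempts=5, base_delay=0.1):
    """Call check() until it returns truthy, doubling the wait after each
    failure; returns False once all attempts are exhausted."""
    delay = base_delay
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
        delay *= 2
    return False
```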

Configuration & Features

  • Updated browser and crawler config documentation to match implementation
  • Enhanced DFS deep crawl strategy with seen URL tracking
  • Fixed sitemap parsing and URL normalization in AsyncUrlSeeder (#1559)
  • Fixed viewport configuration in managed browsers (#1490)
  • Fixed remove_overlay_elements functionality (#1396)
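
The DFS enhancement amounts to deduplicating URLs before expansion. A toy sketch of the pattern (the link graph is made up for illustration, not Crawl4AI's internal implementation):

```python
def dfs_crawl(start, links, max_depth=2):
    """Depth-first traversal that records seen URLs to avoid revisits."""
    seen, order = set(), []
    stack = [(start, 0)]
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue  # seen-URL tracking: never expand the same URL twice
        seen.add(url)
        order.append(url)
        for nxt in links.get(url, []):
            stack.append((nxt, depth + 1))
    return order

graph = {"a": ["b", "c"], "b": ["a", "c"], "c": []}
print(dfs_crawl("a", graph))  # ['a', 'c', 'b'] (each URL visited once)
```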

Docker & Infrastructure

  • Fixed LLM API key handling for multi-provider support
  • Standardized Docker port to 11235 across all configs
  • Improved error handling with comprehensive status codes
  • Fixed fit_html serialization in /crawl and /crawl/stream endpoints

Security

  • Updated pyOpenSSL from >=24.3.0 to >=25.3.0 (security vulnerability fix)
  • Added verification tests for security updates

📦 Installation

PyPI:

pip install crawl4ai==0.7.7

Docker:

docker pull unclecode/crawl4ai:0.7.7
docker pull unclecode/crawl4ai:latest

Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.

📝 What's Changed

See CHANGELOG.md for details.

Release v0.7.6

22 Oct 12:06

🎉 Crawl4AI v0.7.6 Released!

Crawl4AI v0.7.6 - Webhook Support for Docker Job Queue API

Users can now:

  • Use webhooks with both /crawl/job and /llm/job endpoints
  • Get real-time notifications instead of polling
  • Configure webhook delivery with custom headers
  • Include full data in webhook payloads
  • Set global webhook URLs in config.yml
  • Benefit from automatic retry with exponential backoff
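
A minimal sketch of configuring a job with webhook delivery. The JSON field names (`webhook`, `url`, `headers`) are illustrative assumptions; check the Docker API documentation for the exact request schema:

```python
import json

def build_job_payload(url, webhook_url, token=None):
    """Build a /crawl/job request body with webhook delivery configured.
    Field names here are illustrative, not the documented schema."""
    payload = {"url": url, "webhook": {"url": webhook_url}}
    if token:
        # Custom headers let the receiving endpoint authenticate deliveries.
        payload["webhook"]["headers"] = {"Authorization": f"Bearer {token}"}
    return json.dumps(payload)

print(build_job_payload("https://example.com", "https://hooks.example.com/cb"))
```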

📦 Installation

PyPI:

pip install crawl4ai==0.7.6

Docker:

docker pull unclecode/crawl4ai:0.7.6
docker pull unclecode/crawl4ai:latest

Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.

📝 What's Changed

See CHANGELOG.md for details.

Release v0.7.5

21 Oct 08:15

🚀 Crawl4AI v0.7.5: Docker Hooks & Security Update

🎯 What's New

🔧 Docker Hooks System

Inject custom Python functions at 8 key pipeline points for authentication, performance optimization, and content processing.

Function-Based API with IDE support:

from crawl4ai import hooks_to_string

async def on_page_context_created(page, context, **kwargs):
    """Block images to speed up crawling"""
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    return page

hooks_code = hooks_to_string({"on_page_context_created": on_page_context_created})

8 Available Hook Points:
on_browser_created, on_page_context_created, before_goto, after_goto, on_user_agent_updated, on_execution_started, before_retrieve_html, before_return_html
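
Hooks at the other points follow the same shape. For instance, a `before_goto` hook could attach credentials before each navigation (the signature mirrors the example above; the token value is a placeholder):

```python
async def before_goto(page, context, url, **kwargs):
    """Attach an Authorization header before navigating to each URL.
    The bearer token below is a placeholder, not a real credential."""
    await page.set_extra_http_headers({"Authorization": "Bearer <your-token>"})
    return page
```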

🤖 Enhanced LLM Integration

  • Custom temperature parameter for creativity control
  • Multi-provider support (OpenAI, Gemini, custom endpoints)
  • base_url configuration for self-hosted models
  • Improved Docker API integration
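
A sketch of how the new knobs fit together, written as a plain dict so it runs anywhere; the exact class and parameter names in crawl4ai's LLM config may differ from this sketch:

```python
# Illustrative settings; keys mirror the features listed above.
llm_settings = {
    "provider": "openai/gpt-4o-mini",        # or Gemini / custom providers
    "base_url": "http://localhost:8000/v1",  # self-hosted OpenAI-compatible server
    "temperature": 0.2,                      # lower = more deterministic output
}
print(llm_settings["temperature"])  # 0.2
```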

🔒 HTTPS Preservation

New preserve_https_for_internal_links option maintains secure protocols throughout crawling — critical for authenticated sessions and security-conscious applications.
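
A toy resolver illustrating the rule the option enforces: links discovered on an https page stay on https instead of silently downgrading (this is a standalone sketch of the behavior, not crawl4ai's internal code):

```python
from urllib.parse import urljoin, urlparse

def resolve_internal(base, href, preserve_https=True):
    """Resolve href against base, keeping https:// if the base used it."""
    resolved = urljoin(base, href)
    if preserve_https and urlparse(base).scheme == "https":
        resolved = resolved.replace("http://", "https://", 1)
    return resolved

print(resolve_internal("https://example.com/a", "http://example.com/b"))
# https://example.com/b
```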

🛠️ Major Bug Fixes

  • URL Processing: Fixed '+' sign preservation in query parameters (#1332)
  • JWT Authentication: Resolved Docker JWT validation issues (#1442)
  • Playwright Stealth: Fixed stealth features integration (#1481)
  • Proxy Configuration: Enhanced parsing with new proxy_config structure
  • Memory Management: Fixed leaks in long-running sessions
  • Docker Serialization: Resolved JSON encoding errors (#1419)
  • LLM Providers: Fixed custom provider integration for adaptive crawler (#1291)
  • Performance: Resolved backoff strategy failures (#989)

📦 Installation

PyPI:

pip install crawl4ai==0.7.5

Docker:

docker pull unclecode/crawl4ai:0.7.5
docker pull unclecode/crawl4ai:latest

Platforms Supported: Linux/AMD64, Linux/ARM64 (Apple Silicon, AWS Graviton)


⚠️ Breaking Changes

  1. Python 3.10+ Required (upgraded from 3.9)
  2. Proxy Parameter Deprecated - Use new proxy_config structure
  3. New Dependency - cssselect added for better CSS handling
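
For the proxy migration, the new structure replaces the old single proxy string with explicit fields. The key names below follow the common server/username/password shape; treat them as an assumption and check the BrowserConfig documentation for the exact schema:

```python
# Old (deprecated): proxy="http://user:pass@proxy.example.com:8080"
# New structured form (field names illustrative):
proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass",
}
```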

🙏 Contributors

Thank you to everyone who reported issues, provided feedback, and contributed to this release!

Full Changelog: v0.7.4...v0.7.5

Release v0.7.4

17 Aug 12:12

🎉 Crawl4AI v0.7.4 Released!

📦 Installation

PyPI:

pip install crawl4ai==0.7.4

Docker:

docker pull unclecode/crawl4ai:0.7.4
docker pull unclecode/crawl4ai:latest

📝 What's Changed

See CHANGELOG.md for details.

Release v0.7.3

09 Aug 12:38

🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update

Welcome to Crawl4AI v0.7.3! This release brings powerful new capabilities for stealth crawling, intelligent URL configuration, memory optimization, and enhanced data extraction. Whether you're dealing with bot-protected sites, mixed content types, or large-scale crawling operations, this update has you covered.

💖 GitHub Sponsors Now Live!

After powering 51,000+ developers and becoming the #1 trending web crawler, we're launching GitHub Sponsors to ensure Crawl4AI stays independent and innovative forever.

🏆 Be a Founding Sponsor (First 50 Only!)

  • 🌱 Believer ($5/mo): Join the movement + sponsors-only Discord
  • 🚀 Builder ($50/mo): Priority support + early feature access
  • 💼 Growing Team ($500/mo): Bi-weekly syncs + optimization help
  • 🏢 Data Infrastructure Partner ($2000/mo): Full partnership + dedicated support

Why sponsor? Own your data pipeline. No API limits. Direct access to the creator.

Become a Sponsor → | See Benefits


🎯 Major Features

🕵️ Undetected Browser Support

Break through sophisticated bot detection systems with our new stealth capabilities:

from crawl4ai import AsyncWebCrawler, BrowserConfig

# Enable stealth mode for undetectable crawling
browser_config = BrowserConfig(
    browser_type="undetected",  # Use undetected Chrome
    headless=True,              # Can run headless with stealth
    extra_args=[
        "--disable-blink-features=AutomationControlled",
        "--disable-web-security"
    ]
)

async with AsyncWebCrawler(config=browser_config) as crawler:
    # Successfully bypass Cloudflare, Akamai, and custom bot detection
    result = await crawler.arun("https://protected-site.com")
    print(f"✅ Bypassed protection! Content: {len(result.markdown)} chars")

What it enables:

  • Access previously blocked corporate sites and databases
  • Gather competitor data from protected sources
  • Monitor pricing on e-commerce sites with anti-bot measures
  • Collect news and social media content despite protection systems

🎨 Multi-URL Configuration System

Apply different crawling strategies to different URL patterns automatically:

from crawl4ai import CrawlerRunConfig

# Define specialized configs for different content types
configs = [
    # Documentation sites - aggressive caching, include links
    CrawlerRunConfig(
        url_matcher=["*docs*", "*documentation*"],
        cache_mode="write",
        markdown_generator_options={"include_links": True}
    ),
    
    # News/blog sites - fresh content, scroll for lazy loading
    CrawlerRunConfig(
        url_matcher=lambda url: 'blog' in url or 'news' in url,
        cache_mode="bypass",
        js_code="window.scrollTo(0, document.body.scrollHeight/2);"
    ),
    
    # API endpoints - structured extraction
    CrawlerRunConfig(
        url_matcher=["*.json", "*api*"],
        extraction_strategy=LLMExtractionStrategy(
            provider="openai/gpt-4o-mini",
            extraction_type="structured"
        )
    ),
    
    # Default fallback for everything else
    CrawlerRunConfig()
]

# Crawl multiple URLs with perfect configurations
results = await crawler.arun_many([
    "https://docs.python.org/3/",      # → Uses documentation config
    "https://blog.python.org/",        # → Uses blog config  
    "https://api.github.com/users",    # → Uses API config
    "https://example.com/"             # → Uses default config
], config=configs)

Perfect for:

  • Mixed content sites (blogs, docs, downloads)
  • Multi-domain crawling with different needs per domain
  • Eliminating complex conditional logic in extraction code
  • Optimizing performance by giving each URL exactly what it needs

🧠 Memory Monitoring & Optimization

Track and optimize memory usage during large-scale operations:

from crawl4ai.memory_utils import MemoryMonitor

# Monitor memory during crawling
monitor = MemoryMonitor()
monitor.start_monitoring()

# Perform memory-intensive operations
results = await crawler.arun_many([
    "https://heavy-js-site.com",
    "https://large-images-site.com", 
    "https://dynamic-content-site.com"
] * 100)  # Large batch

# Get detailed memory report
report = monitor.get_report()
print(f"Peak memory usage: {report['peak_mb']:.1f} MB")
print(f"Memory efficiency: {report['efficiency']:.1f}%")

# Automatic optimization suggestions
if report['peak_mb'] > 1000:  # > 1GB
    print("💡 Consider batch size optimization")
    print("💡 Enable aggressive garbage collection")

Benefits:

  • Prevent memory-related crashes in production services
  • Right-size server resources based on actual usage patterns
  • Identify bottlenecks for performance optimization
  • Plan horizontal scaling based on memory requirements

📊 Enhanced Table Extraction

Direct pandas DataFrame conversion from web tables:

result = await crawler.arun("https://site-with-tables.com")

# New streamlined approach
if result.tables:
    print(f"Found {len(result.tables)} tables")
    
    import pandas as pd
    for i, table in enumerate(result.tables):
        # Instant DataFrame conversion
        df = pd.DataFrame(table['data'])
        print(f"Table {i}: {df.shape[0]} rows × {df.shape[1]} columns")
        print(df.head())
        
        # Rich metadata available
        print(f"Source: {table.get('source_xpath', 'Unknown')}")
        print(f"Headers: {table.get('headers', [])}")

# Old way (now deprecated)
# tables_data = result.media.get('tables', [])  # ❌ Don't use this

Improvements:

  • Faster transition from web data to analysis-ready DataFrames
  • Cleaner integration with data processing pipelines
  • Simplified table extraction for automated reporting
  • Better table structure preservation

🐳 Docker LLM Provider Flexibility

Switch between LLM providers without rebuilding images:

# Option 1: Direct environment variables
docker run -d \
  -e LLM_PROVIDER="groq/llama-3.2-3b-preview" \
  -e GROQ_API_KEY="your-key" \
  -p 11235:11235 \
  unclecode/crawl4ai:0.7.3

# Option 2: Using .llm.env file (recommended for production)
docker run -d \
  --env-file .llm.env \
  -p 11235:11235 \
  unclecode/crawl4ai:0.7.3

Create .llm.env file:

LLM_PROVIDER=openai/gpt-4o-mini
OPENAI_API_KEY=your-openai-key
GROQ_API_KEY=your-groq-key

Override per request when needed:

# Use cheaper models for simple tasks, premium for complex ones
response = requests.post("http://localhost:11235/crawl", json={
    "url": "https://complex-page.com",
    "extraction_strategy": {
        "type": "llm",
        "provider": "openai/gpt-4"  # Override default
    }
})

🔧 Bug Fixes & Improvements

  • URL Matcher Fallback: Resolved edge cases in pattern matching logic
  • Memory Management: Fixed memory leaks in long-running sessions
  • Sitemap Processing: Improved redirect handling in sitemap fetching
  • Table Extraction: Enhanced detection and extraction accuracy
  • Error Handling: Better messages and recovery from network failures

📚 Documentation & Architecture

  • Architecture Refactoring: Moved 2,450+ lines to backup for a cleaner codebase
  • Real-World Examples: Added practical use cases with actual URLs
  • Migration Guides: Complete transition from result.media to result.tables
  • Comprehensive Guides: Full documentation for undetected browsers and multi-config

📦 Installation & Upgrade

PyPI Installation

# Fresh install
pip install crawl4ai==0.7.3

# Upgrade from previous version
pip install --upgrade crawl4ai==0.7.3

Docker Images

# Specific version
docker pull unclecode/crawl4ai:0.7.3

# Latest (points to 0.7.3)
docker pull unclecode/crawl4ai:latest

# Version aliases
docker pull unclecode/crawl4ai:0.7    # Minor version
docker pull unclecode/crawl4ai:0      # Major version

Migration Notes

  • result.tables replaces result.media.get('tables')
  • Undetected browser requires browser_type="undetected"
  • Multi-config uses url_matcher parameter in CrawlerRunConfig

🎉 What's Next?

This release sets the foundation for even more advanced features coming in v0.8:

  • AI-powered content understanding
  • Advanced crawling strategies
  • Enhanced data pipeline integrations
  • More stealth and anti-detection capabilities

Live Long and import crawl4ai

Crawl4AI continues to evolve with your needs. This release makes it stealthier, smarter, and more scalable. Try the new undetected browser and multi-config features—they're game changers!

- The Crawl4AI Team


📝 This release draft was composed and edited by a human but rewritten and finalized by AI. If you notice any mistakes, please raise an issue.

v0.7.2: CI/CD & Dependency Optimization Update

25 Jul 10:19

🚀 Crawl4AI v0.7.2: CI/CD & Dependency Optimization Update

July 25, 2025 • 3 min read


This release introduces automated CI/CD pipelines for seamless releases and optimizes dependencies for a lighter, more efficient package.

🎯 What's New

🔄 Automated Release Pipeline

  • GitHub Actions CI/CD: Automated PyPI and Docker Hub releases on tag push
  • Multi-platform Docker images: Support for both AMD64 and ARM64 architectures
  • Version consistency checks: Ensures tag, package, and Docker versions align
  • Automated release notes: GitHub releases created automatically

📦 Dependency Optimization

  • Moved sentence-transformers to optional dependencies: Significantly reduces default installation size
  • Lighter Docker images: Optimized Dockerfile for faster builds and smaller images
  • Better dependency management: Core vs. optional dependencies clearly separated

🏗️ CI/CD Pipeline

The new automated release process ensures consistent, reliable releases:

# Trigger releases with a simple tag
git tag v0.7.2
git push origin v0.7.2

# Automatically:
# ✅ Validates version consistency
# ✅ Builds and publishes to PyPI
# ✅ Builds multi-platform Docker images
# ✅ Pushes to Docker Hub with proper tags
# ✅ Creates GitHub release

💾 Lighter Installation

Default installation is now significantly smaller:

# Core installation (smaller, faster)
pip install crawl4ai==0.7.2

# With ML features (includes sentence-transformers)
pip install crawl4ai[transformer]==0.7.2

# Full installation
pip install crawl4ai[all]==0.7.2

🐳 Docker Improvements

Enhanced Docker support with multi-platform images:

# Pull the latest version
docker pull unclecode/crawl4ai:0.7.2
docker pull unclecode/crawl4ai:latest

# Available tags:
# - unclecode/crawl4ai:0.7.2 (specific version)
# - unclecode/crawl4ai:0.7 (minor version)
# - unclecode/crawl4ai:0 (major version)
# - unclecode/crawl4ai:latest

🔧 Technical Details

Dependency Changes

  • sentence-transformers moved from required to optional dependencies
  • Reduces default installation by ~500MB
  • No impact on functionality when transformer features aren't needed

CI/CD Configuration

  • GitHub Actions workflows for automated releases
  • Version validation before publishing
  • Parallel PyPI and Docker Hub deployments
  • Automatic tagging strategy for Docker images

🚀 Installation

pip install crawl4ai==0.7.2

No breaking changes - direct upgrade from v0.7.0 or v0.7.1.


Questions? Issues?

P.S. The new CI/CD pipeline will make future releases faster and more reliable. Thanks for your patience as we improve our release process!

v0.7.1: Update

17 Jul 09:48

🛠️ Crawl4AI v0.7.1: Minor Cleanup Update

July 17, 2025 • 2 min read


A small maintenance release that removes unused code and improves documentation.

🎯 What's Changed

  • Removed unused StealthConfig from crawl4ai/browser_manager.py
  • Updated documentation with better examples and parameter explanations
  • Fixed virtual scroll configuration examples in docs

🧹 Code Cleanup

Removed the StealthConfig import and configuration, which were not used anywhere in the codebase. The project uses its own custom stealth implementation through JavaScript injection instead.

# Removed unused code:
from playwright_stealth import StealthConfig
stealth_config = StealthConfig(...)  # This was never used

📖 Documentation Updates

  • Fixed adaptive crawling parameter examples
  • Updated session management documentation
  • Corrected virtual scroll configuration examples

🚀 Installation

pip install crawl4ai==0.7.1

No breaking changes - upgrade directly from v0.7.0.


Questions? Issues?