Releases: unclecode/crawl4ai
Release v0.8.5
🎉 Crawl4AI v0.8.5 Released!
📦 Installation
PyPI:
pip install crawl4ai==0.8.5
Docker:
docker pull unclecode/crawl4ai:0.8.5
docker pull unclecode/crawl4ai:latest
Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.
📝 What's Changed
See CHANGELOG.md for details.
Release v0.8.0
🎉 Crawl4AI v0.8.0 Released!
📦 Installation
PyPI:
pip install crawl4ai==0.8.0
Docker:
docker pull unclecode/crawl4ai:0.8.0
docker pull unclecode/crawl4ai:latest
Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.
📝 What's Changed
See CHANGELOG.md for details.
Release v0.7.8
🎉 Crawl4AI v0.7.8 Released!
📦 Installation
PyPI:
pip install crawl4ai==0.7.8
Docker:
docker pull unclecode/crawl4ai:0.7.8
docker pull unclecode/crawl4ai:latest
Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.
📝 What's Changed
See CHANGELOG.md for details.
Release v0.7.7
🎉 Crawl4AI v0.7.7 Released!
This release introduces a complete self-hosting platform with enterprise-grade real-time monitoring, transforming Crawl4AI Docker from a simple containerized crawler into a production-ready platform with full operational transparency and control.
🚀 What's New
Major Feature: Real-time Monitoring & Self-Hosting Platform
Docker deployment now includes:
- 📊 Interactive Monitoring Dashboard (/dashboard)
- 🔌 Comprehensive Monitor API
- ⚡ WebSocket Streaming
- 🔥 Smart Browser Pool (3-tier architecture)
- 🧹 Janitor System
- 📈 Production-Ready
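To try the new dashboard locally, a minimal deployment sketch (port 11235 is the standardized port from this release; the container name is arbitrary):

```shell
# Run the 0.7.7 image and expose the standard port
docker run -d -p 11235:11235 --name crawl4ai unclecode/crawl4ai:0.7.7

# Then open the interactive monitoring dashboard in a browser:
#   http://localhost:11235/dashboard
```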
🐛 Critical Bug Fixes
- Fixed async LLM extraction blocking issue (#1055) - now supports true parallel processing
- Fixed CDP endpoint verification with exponential backoff (#1445)
- Fixed arun_many to always return a list, even on exception
Configuration & Features
- Updated browser and crawler config documentation to match implementation
- Enhanced DFS deep crawl strategy with seen URL tracking
- Fixed sitemap parsing and URL normalization in AsyncUrlSeeder (#1559)
- Fixed viewport configuration in managed browsers (#1490)
- Fixed remove_overlay_elements functionality (#1396)
Docker & Infrastructure
- Fixed LLM API key handling for multi-provider support
- Standardized Docker port to 11235 across all configs
- Improved error handling with comprehensive status codes
- Fixed fit_html serialization in /crawl and /crawl/stream endpoints
Security
- Updated pyOpenSSL from >=24.3.0 to >=25.3.0 (security vulnerability fix)
- Added verification tests for security updates
📦 Installation
PyPI:
pip install crawl4ai==0.7.7
Docker:
docker pull unclecode/crawl4ai:0.7.7
docker pull unclecode/crawl4ai:latest
Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.
📝 What's Changed
See CHANGELOG.md for details.
Release v0.7.6
🎉 Crawl4AI v0.7.6 Released!
Crawl4AI v0.7.6 - Webhook Support for Docker Job Queue API
Users can now:
- Use webhooks with both /crawl/job and /llm/job endpoints
- Get real-time notifications instead of polling
- Configure webhook delivery with custom headers
- Include full data in webhook payloads
- Set global webhook URLs in config.yml
- Benefit from automatic retry with exponential backoff
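A rough sketch of a job submission with webhook delivery configured. The exact payload field names here ("webhook", "headers", "include_data") are assumptions, not the official API schema — check the Docker API docs for the precise shape:

```python
import json

# Hypothetical job request for the /crawl/job endpoint
job_request = {
    "urls": ["https://example.com"],
    "webhook": {
        "url": "https://my-app.example/hooks/crawl-done",  # where the result is POSTed
        "headers": {"Authorization": "Bearer my-token"},   # custom delivery headers
        "include_data": True,                              # full result data in the payload
    },
}

# Submit with, e.g.:
#   requests.post("http://localhost:11235/crawl/job", json=job_request)
print(json.dumps(job_request, indent=2))
```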
📦 Installation
PyPI:
pip install crawl4ai==0.7.6
Docker:
docker pull unclecode/crawl4ai:0.7.6
docker pull unclecode/crawl4ai:latest
Note: Docker images are being built and will be available shortly.
Check the Docker Release workflow for build status.
📝 What's Changed
See CHANGELOG.md for details.
Release v0.7.5
🚀 Crawl4AI v0.7.5: Docker Hooks & Security Update
🎯 What's New
🔧 Docker Hooks System
Inject custom Python functions at 8 key pipeline points for authentication, performance optimization, and content processing.
Function-Based API with IDE support:
from crawl4ai import hooks_to_string
async def on_page_context_created(page, context, **kwargs):
    """Block images to speed up crawling"""
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    return page

hooks_code = hooks_to_string({"on_page_context_created": on_page_context_created})
8 Available Hook Points:
on_browser_created, on_page_context_created, before_goto, after_goto, on_user_agent_updated, on_execution_started, before_retrieve_html, before_return_html
🤖 Enhanced LLM Integration
- Custom temperature parameter for creativity control
- Multi-provider support (OpenAI, Gemini, custom endpoints)
- base_url configuration for self-hosted models
- Improved Docker API integration
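A sketch of what a request using these knobs might look like. The "type", "provider", and override pattern follow the examples elsewhere in these notes; "temperature" and "base_url" as request-level fields are assumptions based on the feature list above, so verify against the Docker API docs:

```python
# Hypothetical request body for a self-hosted, OpenAI-compatible model
payload = {
    "url": "https://example.com",
    "extraction_strategy": {
        "type": "llm",
        "provider": "openai/gpt-4o-mini",
        "base_url": "http://localhost:8000/v1",  # self-hosted endpoint
        "temperature": 0.2,                      # lower = more deterministic output
    },
}

# Send with: requests.post("http://localhost:11235/crawl", json=payload)
```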
🔒 HTTPS Preservation
New preserve_https_for_internal_links option maintains secure protocols throughout crawling — critical for authenticated sessions and security-conscious applications.
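A minimal sketch of enabling the option (the option name comes from this release; its placement on CrawlerRunConfig is an assumption):

```python
from crawl4ai import CrawlerRunConfig

# Keep https:// on internal links discovered during the crawl instead of
# letting them fall back to http:// (important for authenticated sessions)
run_config = CrawlerRunConfig(preserve_https_for_internal_links=True)
```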
🛠️ Major Bug Fixes
- URL Processing: Fixed '+' sign preservation in query parameters (#1332)
- JWT Authentication: Resolved Docker JWT validation issues (#1442)
- Playwright Stealth: Fixed stealth features integration (#1481)
- Proxy Configuration: Enhanced parsing with new proxy_config structure
- Memory Management: Fixed leaks in long-running sessions
- Docker Serialization: Resolved JSON encoding errors (#1419)
- LLM Providers: Fixed custom provider integration for adaptive crawler (#1291)
- Performance: Resolved backoff strategy failures (#989)
📦 Installation
PyPI:
pip install crawl4ai==0.7.5
Docker:
docker pull unclecode/crawl4ai:0.7.5
docker pull unclecode/crawl4ai:latest
Platforms Supported: Linux/AMD64, Linux/ARM64 (Apple Silicon, AWS Graviton)
- Python 3.10+ Required (upgraded from 3.9)
- Proxy Parameter Deprecated - Use new proxy_config structure
- New Dependency - cssselect added for better CSS handling
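For the proxy deprecation, migration might look like the sketch below. The structured field names ("server", "username", "password") follow common Playwright-style proxy settings and are an assumption — verify them against the 0.7.5 docs:

```python
# Before (deprecated):
#   BrowserConfig(proxy="http://user:pass@proxy.example.com:8080")

# After: structured proxy_config (field names are an assumption)
proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",   # optional, for authenticated proxies
    "password": "pass",   # optional
}
# browser_config = BrowserConfig(proxy_config=proxy_config)
```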
📚 Resources
- 📖 Full Release Notes: https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md
- 📘 Documentation: https://docs.crawl4ai.com
- 💬 Discord Community: https://discord.gg/jP8KfhDhyN
- 🐛 Issues: https://github.com/unclecode/crawl4ai/issues
🙏 Contributors
Thank you to everyone who reported issues, provided feedback, and contributed to this release!
Full Changelog: v0.7.4...v0.7.5
Release v0.7.4
🎉 Crawl4AI v0.7.4 Released!
📦 Installation
PyPI:
pip install crawl4ai==0.7.4
Docker:
docker pull unclecode/crawl4ai:0.7.4
docker pull unclecode/crawl4ai:latest
📝 What's Changed
See CHANGELOG.md for details.
Release v0.7.3
🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update
Welcome to Crawl4AI v0.7.3! This release brings powerful new capabilities for stealth crawling, intelligent URL configuration, memory optimization, and enhanced data extraction. Whether you're dealing with bot-protected sites, mixed content types, or large-scale crawling operations, this update has you covered.
💖 GitHub Sponsors Now Live!
After powering 51,000+ developers and becoming the #1 trending web crawler, we're launching GitHub Sponsors to ensure Crawl4AI stays independent and innovative forever.
🏆 Be a Founding Sponsor (First 50 Only!)
- 🌱 Believer ($5/mo): Join the movement + sponsors-only Discord
- 🚀 Builder ($50/mo): Priority support + early feature access
- 💼 Growing Team ($500/mo): Bi-weekly syncs + optimization help
- 🏢 Data Infrastructure Partner ($2000/mo): Full partnership + dedicated support
Why sponsor? Own your data pipeline. No API limits. Direct access to the creator.
Become a Sponsor → | See Benefits
🎯 Major Features
🕵️ Undetected Browser Support
Break through sophisticated bot detection systems with our new stealth capabilities:
from crawl4ai import AsyncWebCrawler, BrowserConfig
# Enable stealth mode for undetectable crawling
browser_config = BrowserConfig(
    browser_type="undetected",  # Use undetected Chrome
    headless=True,              # Can run headless with stealth
    extra_args=[
        "--disable-blink-features=AutomationControlled",
        "--disable-web-security"
    ]
)

async with AsyncWebCrawler(config=browser_config) as crawler:
    # Successfully bypass Cloudflare, Akamai, and custom bot detection
    result = await crawler.arun("https://protected-site.com")
    print(f"✅ Bypassed protection! Content: {len(result.markdown)} chars")
What it enables:
- Access previously blocked corporate sites and databases
- Gather competitor data from protected sources
- Monitor pricing on e-commerce sites with anti-bot measures
- Collect news and social media content despite protection systems
🎨 Multi-URL Configuration System
Apply different crawling strategies to different URL patterns automatically:
from crawl4ai import CrawlerRunConfig
# Define specialized configs for different content types
configs = [
    # Documentation sites - aggressive caching, include links
    CrawlerRunConfig(
        url_matcher=["*docs*", "*documentation*"],
        cache_mode="write",
        markdown_generator_options={"include_links": True}
    ),
    # News/blog sites - fresh content, scroll for lazy loading
    CrawlerRunConfig(
        url_matcher=lambda url: 'blog' in url or 'news' in url,
        cache_mode="bypass",
        js_code="window.scrollTo(0, document.body.scrollHeight/2);"
    ),
    # API endpoints - structured extraction
    CrawlerRunConfig(
        url_matcher=["*.json", "*api*"],
        extraction_strategy=LLMExtractionStrategy(
            provider="openai/gpt-4o-mini",
            extraction_type="structured"
        )
    ),
    # Default fallback for everything else
    CrawlerRunConfig()
]

# Crawl multiple URLs with perfect configurations
results = await crawler.arun_many([
    "https://docs.python.org/3/",    # → Uses documentation config
    "https://blog.python.org/",      # → Uses blog config
    "https://api.github.com/users",   # → Uses API config
    "https://example.com/"           # → Uses default config
], config=configs)
Perfect for:
- Mixed content sites (blogs, docs, downloads)
- Multi-domain crawling with different needs per domain
- Eliminating complex conditional logic in extraction code
- Optimizing performance by giving each URL exactly what it needs
🧠 Memory Monitoring & Optimization
Track and optimize memory usage during large-scale operations:
from crawl4ai.memory_utils import MemoryMonitor
# Monitor memory during crawling
monitor = MemoryMonitor()
monitor.start_monitoring()
# Perform memory-intensive operations
results = await crawler.arun_many([
    "https://heavy-js-site.com",
    "https://large-images-site.com",
    "https://dynamic-content-site.com"
] * 100)  # Large batch

# Get detailed memory report
report = monitor.get_report()
print(f"Peak memory usage: {report['peak_mb']:.1f} MB")
print(f"Memory efficiency: {report['efficiency']:.1f}%")

# Automatic optimization suggestions
if report['peak_mb'] > 1000:  # > 1GB
    print("💡 Consider batch size optimization")
    print("💡 Enable aggressive garbage collection")
Benefits:
- Prevent memory-related crashes in production services
- Right-size server resources based on actual usage patterns
- Identify bottlenecks for performance optimization
- Plan horizontal scaling based on memory requirements
📊 Enhanced Table Extraction
Direct pandas DataFrame conversion from web tables:
result = await crawler.arun("https://site-with-tables.com")
# New streamlined approach
if result.tables:
    print(f"Found {len(result.tables)} tables")
    import pandas as pd

    for i, table in enumerate(result.tables):
        # Instant DataFrame conversion
        df = pd.DataFrame(table['data'])
        print(f"Table {i}: {df.shape[0]} rows × {df.shape[1]} columns")
        print(df.head())

        # Rich metadata available
        print(f"Source: {table.get('source_xpath', 'Unknown')}")
        print(f"Headers: {table.get('headers', [])}")

# Old way (now deprecated)
# tables_data = result.media.get('tables', [])  # ❌ Don't use this
Improvements:
- Faster transition from web data to analysis-ready DataFrames
- Cleaner integration with data processing pipelines
- Simplified table extraction for automated reporting
- Better table structure preservation
🐳 Docker LLM Provider Flexibility
Switch between LLM providers without rebuilding images:
# Option 1: Direct environment variables
docker run -d \
-e LLM_PROVIDER="groq/llama-3.2-3b-preview" \
-e GROQ_API_KEY="your-key" \
-p 11235:11235 \
unclecode/crawl4ai:0.7.3
# Option 2: Using .llm.env file (recommended for production)
docker run -d \
--env-file .llm.env \
-p 11235:11235 \
unclecode/crawl4ai:0.7.3
Create .llm.env file:
LLM_PROVIDER=openai/gpt-4o-mini
OPENAI_API_KEY=your-openai-key
GROQ_API_KEY=your-groq-key
Override per request when needed:
# Use cheaper models for simple tasks, premium for complex ones
response = requests.post("http://localhost:11235/crawl", json={
    "url": "https://complex-page.com",
    "extraction_strategy": {
        "type": "llm",
        "provider": "openai/gpt-4"  # Override default
    }
})
🔧 Bug Fixes & Improvements
- URL Matcher Fallback: Resolved edge cases in pattern matching logic
- Memory Management: Fixed memory leaks in long-running sessions
- Sitemap Processing: Improved redirect handling in sitemap fetching
- Table Extraction: Enhanced detection and extraction accuracy
- Error Handling: Better messages and recovery from network failures
📚 Documentation & Architecture
- Architecture Refactoring: Moved 2,450+ lines to backup for cleaner codebase
- Real-World Examples: Added practical use cases with actual URLs
- Migration Guides: Complete transition from result.media to result.tables
- Comprehensive Guides: Full documentation for undetected browsers and multi-config
📦 Installation & Upgrade
PyPI Installation
# Fresh install
pip install crawl4ai==0.7.3
# Upgrade from previous version
pip install --upgrade crawl4ai==0.7.3
Docker Images
# Specific version
docker pull unclecode/crawl4ai:0.7.3
# Latest (points to 0.7.3)
docker pull unclecode/crawl4ai:latest
# Version aliases
docker pull unclecode/crawl4ai:0.7 # Minor version
docker pull unclecode/crawl4ai:0     # Major version
Migration Notes
- result.tables replaces result.media.get('tables')
- Undetected browser requires browser_type="undetected"
- Multi-config uses url_matcher parameter in CrawlerRunConfig
🎉 What's Next?
This release sets the foundation for even more advanced features coming in v0.8:
- AI-powered content understanding
- Advanced crawling strategies
- Enhanced data pipeline integrations
- More stealth and anti-detection capabilities
📝 Complete Documentation
- Full Release Notes - Detailed technical explanations
- Changelog - Complete list of changes
- Documentation - Full API reference and guides
- Discord Community - Get help and share experiences
Live Long and import crawl4ai
Crawl4AI continues to evolve with your needs. This release makes it stealthier, smarter, and more scalable. Try the new undetected browser and multi-config features—they're game changers!
- The Crawl4AI Team
📝 This release draft was composed and edited by human but rewritten and finalized by AI. If you notice any mistakes, please raise an issue.
v0.7.2: CI/CD & Dependency Optimization Update
🚀 Crawl4AI v0.7.2: CI/CD & Dependency Optimization Update
July 25, 2025 • 3 min read
This release introduces automated CI/CD pipelines for seamless releases and optimizes dependencies for a lighter, more efficient package.
🎯 What's New
🔄 Automated Release Pipeline
- GitHub Actions CI/CD: Automated PyPI and Docker Hub releases on tag push
- Multi-platform Docker images: Support for both AMD64 and ARM64 architectures
- Version consistency checks: Ensures tag, package, and Docker versions align
- Automated release notes: GitHub releases created automatically
📦 Dependency Optimization
- Moved sentence-transformers to optional dependencies: Significantly reduces default installation size
- Lighter Docker images: Optimized Dockerfile for faster builds and smaller images
- Better dependency management: Core vs. optional dependencies clearly separated
🏗️ CI/CD Pipeline
The new automated release process ensures consistent, reliable releases:
# Trigger releases with a simple tag
git tag v0.7.2
git push origin v0.7.2
# Automatically:
# ✅ Validates version consistency
# ✅ Builds and publishes to PyPI
# ✅ Builds multi-platform Docker images
# ✅ Pushes to Docker Hub with proper tags
# ✅ Creates GitHub release
💾 Lighter Installation
Default installation is now significantly smaller:
# Core installation (smaller, faster)
pip install crawl4ai==0.7.2
# With ML features (includes sentence-transformers)
pip install crawl4ai[transformer]==0.7.2
# Full installation
pip install crawl4ai[all]==0.7.2
🐳 Docker Improvements
Enhanced Docker support with multi-platform images:
# Pull the latest version
docker pull unclecode/crawl4ai:0.7.2
docker pull unclecode/crawl4ai:latest
# Available tags:
# - unclecode/crawl4ai:0.7.2 (specific version)
# - unclecode/crawl4ai:0.7 (minor version)
# - unclecode/crawl4ai:0 (major version)
# - unclecode/crawl4ai:latest
🔧 Technical Details
Dependency Changes
- sentence-transformers moved from required to optional dependencies
- Reduces default installation by ~500MB
- No impact on functionality when transformer features aren't needed
CI/CD Configuration
- GitHub Actions workflows for automated releases
- Version validation before publishing
- Parallel PyPI and Docker Hub deployments
- Automatic tagging strategy for Docker images
🚀 Installation
pip install crawl4ai==0.7.2
No breaking changes - direct upgrade from v0.7.0 or v0.7.1.
Questions? Issues?
- GitHub: github.com/unclecode/crawl4ai
- Discord: discord.gg/crawl4ai
- Twitter: @unclecode
P.S. The new CI/CD pipeline will make future releases faster and more reliable. Thanks for your patience as we improve our release process!
v0.7.1: Minor Cleanup Update
🛠️ Crawl4AI v0.7.1: Minor Cleanup Update
July 17, 2025 • 2 min read
A small maintenance release that removes unused code and improves documentation.
🎯 What's Changed
- Removed unused StealthConfig from crawl4ai/browser_manager.py
- Fixed virtual scroll configuration examples in docs
🧹 Code Cleanup
Removed unused StealthConfig import and configuration that wasn't being used anywhere in the codebase. The project uses its own custom stealth implementation through JavaScript injection instead.
# Removed unused code:
from playwright_stealth import StealthConfig
stealth_config = StealthConfig(...)  # This was never used
📖 Documentation Updates
- Fixed adaptive crawling parameter examples
- Updated session management documentation
- Corrected virtual scroll configuration examples
🚀 Installation
pip install crawl4ai==0.7.1
No breaking changes - upgrade directly from v0.7.0.
Questions? Issues?
- GitHub: github.com/unclecode/crawl4ai
- Discord: discord.gg/crawl4ai