lovisdotio/scroll-text-extractor

📜 Text Extractor - Scroll to Capture

A minimalist Chrome extension that captures text automatically as you scroll. No clicking, no complexity - just scroll and it works.

Demo: scroll any webpage and text is captured automatically in visual order

✨ Features

  • 📜 Just Scroll - Text is captured automatically as you scroll
  • 🎯 Smart Filtering - Only captures real content, removes navigation/UI/duplicates
  • 📋 Visual Order - Text appears in the exact order you see it (top to bottom)
  • 💾 One-Click Export - Download as .txt or copy to clipboard
  • 🌐 Universal - Works on most websites (static, dynamic, Notion, Slack, etc.); see Known Limitations for the exceptions
  • ⚡ Lightweight - 200 lines of code, non-intrusive, fast
  • 🎨 Minimalist UI - Black theme, large textarea, clean interface
  • 🔐 Privacy - Everything stays on your device, zero tracking

⏱️ Important Tips

  • Wait 1-2 seconds after scrolling for text to appear
  • Reload page (F5) if nothing captures at first
  • Scroll slowly through the content you want to extract
  • Works in all languages (English, French, etc.)

💻 Compatibility

  • Browser: Google Chrome (tested on Chrome 118+)
  • OS: Tested on macOS (should work on Windows/Linux but not tested)
  • Manifest: V3 (latest Chrome extension standard)

🚀 Installation (Chrome only)

This extension is designed for Google Chrome and has been tested on macOS. It may work on Windows/Linux but hasn't been tested yet.

Step 1: Prepare the Extension

  1. Download or clone this repository
  2. Make sure all files are in the same folder:
    • manifest.json
    • content.js
    • popup.html
    • popup.js
    • popup.css
    • icon16.png, icon48.png, icon128.png

Step 2: Install in Chrome

  1. Open Google Chrome
  2. Navigate to chrome://extensions/
  3. Enable Developer mode (toggle in the top-right corner)
  4. Click the "Load unpacked" button
  5. Select the folder containing all the extension files
  6. The extension is now installed! You'll see the "T" icon in your Chrome toolbar

⚠️ After Installation

  • If text doesn't appear, reload the page (F5) where you want to extract text
  • The extension activates on scroll, not on page load

📖 How to Use

Three Simple Steps

  1. Open any webpage - Works on all sites
  2. Scroll through the page - Text is captured automatically
  3. Click the extension icon → Download or Copy

That's it! Nothing to click while capturing, no settings to configure.

⏱️ Important Notes

  • Wait 1-2 seconds after scrolling for capture to complete
  • If nothing appears: Reload the page (F5) and scroll again
  • Scroll slowly to ensure all content loads (especially on Slack/Notion)
  • Text appears in exact visual order from top to bottom

Interface

  • Large Textarea: See all captured text at once
  • Click textarea: Auto-selects all text
  • Toggle ON/OFF: Green = Active, Gray = Inactive
  • 📥 Download: Save as timestamped .txt file
  • 📋 Copy: Copy all to clipboard
  • 🗑️ Clear: Remove all captured text
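
The Download and Copy actions could be implemented along these lines. This is a sketch, not the code in popup.js: the function names and the `captured-text-` filename prefix are hypothetical, and the timestamp format is only one reasonable choice.

```javascript
// Hypothetical helpers; the real popup.js may differ.

// Builds a timestamped filename such as captured-text-2024-01-05-09-03-07.txt.
// The timestamp is taken in UTC so the result does not depend on the time zone.
function buildFilename(date = new Date()) {
  const stamp = date.toISOString().slice(0, 19).replace(/[:T]/g, '-');
  return `captured-text-${stamp}.txt`;
}

// Browser-only: saves the text as a .txt file via a temporary link.
function downloadText(text, filename = buildFilename()) {
  const blob = new Blob([text], { type: 'text/plain;charset=utf-8' });
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  link.click();
  // Release the object URL once the download has been handed to the browser.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Browser-only: copies the text to the clipboard.
async function copyText(text) {
  await navigator.clipboard.writeText(text);
}
```

In a popup, `downloadText` and `copyText` would typically be called from the click handlers of the Download and Copy buttons, passing the textarea's current value.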

💡 Works On All Sites

  • Static websites - Blogs, articles, documentation
  • Dynamic apps - React, Vue, Angular SPAs
  • Notion - Pages, databases, documents
  • Slack - Messages, channels, DMs
  • GitHub - Code, README, issues
  • Stack Overflow - Questions, answers
  • News sites - Articles, comments
  • Social media - Posts, threads
  • Any website with text content

⚙️ Technical Details

How It Works

  1. Content Script (content.js) monitors scroll events on web pages
  2. Debouncing ensures text is only captured when scrolling pauses (300ms)
  3. Smart Filtering extracts text from relevant HTML elements (p, h1-h6, li, div, etc.)
  4. Duplicate Prevention uses Set data structure to avoid capturing the same text twice
  5. Local Storage saves captured text using Chrome's storage API
  6. Popup Interface provides real-time view and control over captured content
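
Steps 1, 2, and 4 above can be sketched roughly as follows. This illustrates the technique and is not the extension's actual content.js: `debounce`, `createCollector`, and `DEBOUNCE_MS` are hypothetical names, and the selector list is an assumption based on the elements named in step 3.

```javascript
// Hypothetical sketch; names and selector list are assumptions.
const DEBOUNCE_MS = 300; // pause length described in step 2

// Returns a wrapper that runs fn only after calls have stopped for waitMs.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Collects text fragments in first-seen order, skipping blanks and duplicates.
function createCollector() {
  const seen = new Set();
  const ordered = [];
  return {
    // Adds new fragments and returns the total number kept so far.
    add(fragments) {
      for (const raw of fragments) {
        const text = raw.trim();
        if (text && !seen.has(text)) {
          seen.add(text);
          ordered.push(text);
        }
      }
      return ordered.length;
    },
    text() {
      return ordered.join('\n');
    },
  };
}

// Browser-only wiring; skipped when no page is available.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  const collector = createCollector();
  const capture = debounce(() => {
    const nodes = document.querySelectorAll('p, h1, h2, h3, h4, h5, h6, li');
    collector.add(Array.from(nodes, (node) => node.innerText || ''));
  }, DEBOUNCE_MS);
  // A passive listener tells the browser the handler never blocks scrolling.
  window.addEventListener('scroll', capture, { passive: true });
}
```

Because the timer is reset on every scroll event, extraction runs once per pause rather than once per event, which is why text appears shortly after scrolling stops.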

Performance Optimization

  • Passive scroll listeners for better performance
  • Debounced extraction to minimize processing
  • Efficient DOM queries targeting specific elements
  • Smart viewport detection to only process visible content
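
Viewport detection and top-to-bottom ordering might look something like the sketch below. The helper names `isInViewport`, `sortVisually`, and `visibleTextInOrder` are hypothetical, and the rectangles are assumed to be shaped like the result of `element.getBoundingClientRect()`.

```javascript
// Hypothetical helpers; not taken from the extension's source.

// True when a rectangle with a non-zero size overlaps the viewport.
function isInViewport(rect, viewportWidth, viewportHeight) {
  return (
    rect.width > 0 &&
    rect.height > 0 &&
    rect.bottom > 0 &&
    rect.top < viewportHeight &&
    rect.right > 0 &&
    rect.left < viewportWidth
  );
}

// Returns a copy sorted top to bottom, then left to right.
function sortVisually(items) {
  return [...items].sort(
    (a, b) => a.rect.top - b.rect.top || a.rect.left - b.rect.left
  );
}

// Browser-only: text of the currently visible elements in visual order.
function visibleTextInOrder(selector) {
  const items = Array.from(document.querySelectorAll(selector), (el) => ({
    text: (el.innerText || '').trim(),
    rect: el.getBoundingClientRect(),
  }));
  const visible = items.filter(
    (item) =>
      item.text && isInViewport(item.rect, window.innerWidth, window.innerHeight)
  );
  return sortVisually(visible).map((item) => item.text);
}
```

Keeping the geometry checks separate from the DOM access makes them easy to test with plain objects.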

Privacy & Security

  • ✅ All data stored locally in your browser
  • ✅ No external API calls or data transmission
  • ✅ No tracking or analytics
  • ✅ Open source - inspect the code yourself!
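
To illustrate what "stored locally" can mean in practice, here is a minimal sketch that persists captured lines with `chrome.storage.local`. It assumes the extension uses the `local` storage area and a key named `capturedText`; both are assumptions, as are the function names. Using `chrome.storage` requires the `"storage"` permission in manifest.json.

```javascript
// Hypothetical key and helpers; the extension's real storage layout may differ.
const STORAGE_KEY = 'capturedText';

// Pure helper: appends incoming lines that are not already stored.
function mergeLines(existing, incoming) {
  const seen = new Set(existing);
  const merged = [...existing];
  for (const line of incoming) {
    if (!seen.has(line)) {
      seen.add(line);
      merged.push(line);
    }
  }
  return merged;
}

// Reads the stored lines, merges in the new ones, and writes them back.
// `storage` is expected to behave like chrome.storage.local.
async function saveLines(incoming, storage = chrome.storage.local) {
  const stored = await storage.get(STORAGE_KEY);
  const merged = mergeLines(stored[STORAGE_KEY] || [], incoming);
  await storage.set({ [STORAGE_KEY]: merged });
  return merged;
}
```

Nothing in this flow leaves the browser: the data is read from and written to the extension's own storage area only.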

🛠️ Troubleshooting

Text Not Being Captured?

  1. Check that the extension is Active (green status in popup)
  2. Try scrolling more slowly - give the extension time to process
  3. Some websites use shadow DOM or iframes, which may prevent extraction
  4. Refresh the page and try again

Extension Not Working?

  1. Make sure all files are present in the extension folder
  2. Check Chrome's extension page (chrome://extensions/) for any errors
  3. Try disabling and re-enabling the extension
  4. Reload the webpage you're trying to extract from

"Permission Denied" Errors?

  • Some internal Chrome pages (like chrome:// URLs) don't allow extensions to run
  • Try the extension on regular websites (https:// URLs)

🔄 Updates & Modifications

Feel free to modify the extension to suit your needs:

  • Adjust scroll delay: Change the setTimeout value in content.js (line 100)
  • Change text filters: Modify the querySelectorAll call in content.js (line 42) to target different elements
  • Customize styling: Edit popup.css to change colors, fonts, or layout
  • Add features: The code is well-commented and easy to extend
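
As an example of the first two modifications, the delay and the targeted elements could be gathered in one settings object. Everything here is hypothetical: `SETTINGS`, `buildSelector`, and `isWorthKeeping` do not come from content.js, and the filtering rule is just one possible heuristic.

```javascript
// Hypothetical settings; the real values live in content.js.
const SETTINGS = {
  debounceMs: 300, // delay passed to setTimeout after scrolling stops
  tags: ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'blockquote'],
  minLength: 3, // shortest fragment worth keeping
};

// Turns a list of tag names into a selector for querySelectorAll.
function buildSelector(tags) {
  return tags.join(', ');
}

// Keeps fragments that are long enough and contain at least one letter,
// which drops stray bullets, separators, and bare numbers.
function isWorthKeeping(text, minLength = SETTINGS.minLength) {
  const trimmed = text.trim();
  return trimmed.length >= minLength && /\p{L}/u.test(trimmed);
}
```

Adding a tag to `SETTINGS.tags` widens what gets captured, and raising `minLength` makes the filter stricter.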

📝 File Structure

appscrolldownloadtext/
├── manifest.json       # Extension configuration
├── content.js          # Content script that captures text
├── popup.html          # Extension popup interface
├── popup.js            # Popup logic and interactions
├── popup.css           # Styling for the popup
├── icon16.png          # Extension icon (16x16)
├── icon48.png          # Extension icon (48x48)
├── icon128.png         # Extension icon (128x128)
└── README.md           # This file

🤝 Contributing

This is a custom extension created for personal use. Feel free to:

  • Modify it for your own needs
  • Share it with others
  • Improve the code
  • Add new features

⚖️ License

This extension is provided as-is for personal use. Feel free to use, modify, and distribute.

🐛 Known Limitations

  • Browser: Only tested on Google Chrome (macOS)
  • Not tested on: Windows, Linux, Firefox, Safari, Edge
  • Cannot extract text from:
    • Canvas elements (text drawn as pixels via JavaScript)
    • Some iframes with cross-origin restrictions
    • Chrome internal pages (chrome://)
    • PDF viewers
  • Text extraction quality depends on website HTML structure
  • Very dynamic websites may require slow scrolling

📞 Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Inspect the browser console for error messages
  3. Verify all files are present and correctly named
  4. Try reinstalling the extension

Made with ❤️ for easier text extraction while browsing

Enjoy automatic text capturing! 🎉
