A macOS application for recording screen activities and user interactions to create labeled datasets for AI training.
- Record screen activities including mouse movements, clicks, and keyboard input
- Capture screenshots at regular intervals and on user interaction
- Analyze UI element relationships (hierarchical, spatial, functional, logical)
- Visualize UI relationships as graphical networks
- Export recorded sessions in multiple formats (JSON, COCO, YOLO) for machine learning
- Generate structured data for training UI automation and understanding models
- macOS 11.0 or later
- Xcode 13.0 or later (for development)
- Clone this repository
- Run the build script to create the application:
```bash
chmod +x build_app.sh
./build_app.sh
```
- Configure permissions using the helper script:
```bash
chmod +x request_all_permissions.sh
./request_all_permissions.sh
```
- The application will be built at `./WeLabelDataRecorder.app`
WeLabelDataRecorder requires several system permissions to function correctly. When you first launch the application, it will request:
- Screen Recording Permission: Required to capture screenshots and video
- Accessibility Permission: Required to detect UI elements via Accessibility API
The permissions are configured through two key files:
- `complete_Info.plist`: Contains usage descriptions required by macOS
- `WeLabelDataRecorder.entitlements`: Specifies which entitlements the app needs
The build script automatically incorporates these permissions into the app bundle.
If you experience permission issues:
- Use the included diagnostic script:

```bash
chmod +x check_permissions_status.swift
./check_permissions_status.swift
```
- For permission issues with screen recording or accessibility:

```bash
# Reset screen recording permissions
tccutil reset ScreenCapture
# Reset accessibility permissions
tccutil reset Accessibility
```
- Manually verify permissions in System Preferences → Security & Privacy → Privacy
For full details on permission configuration, see DOCUMENTATION.md.
- Launch the application
- Grant necessary permissions for screen recording and accessibility when prompted
- Click "Start Recording" to begin a recording session
- Perform the activities you want to record
- Click "Stop Recording" to end the session
- Click "Export Last Session" to save the recorded data
- Application Launch:
  - Normal mode: Shows UI with recording controls
  - Test mode: Generates a sample session and exports it (`--test-export <path>`)
- Recording Process:
  - Captures periodic screenshots
  - Records mouse and keyboard events
  - Tracks UI element interactions via accessibility APIs
- UI Element Relationship Analysis:
  - For each UI element interaction, analyzes relationships with other elements:
    - Hierarchical (parent, child, sibling)
    - Spatial (above, below, left, right)
    - Functional (button controls a form, label describes a field)
    - Logical (sequence, similar naming patterns)
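The spatial relationships above can be derived directly from element bounding boxes. A minimal sketch of that idea (the app itself is Swift; the function name and rect convention here are illustrative, not the app's actual code):

```python
def spatial_relations(a, b):
    """Classify how rect `a` sits relative to rect `b`.

    Rects are (x, y, width, height) tuples in screen coordinates
    with the origin at the top-left, so smaller y means higher up.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    relations = []
    if ay + ah <= by:
        relations.append("above")   # a ends before b starts vertically
    if by + bh <= ay:
        relations.append("below")
    if ax + aw <= bx:
        relations.append("left")
    if bx + bw <= ax:
        relations.append("right")
    return relations

# A label sitting directly above a text field
print(spatial_relations((10, 10, 80, 20), (10, 40, 200, 24)))  # ['above']
```

Overlapping elements yield no spatial relation here, which is where the hierarchical and functional categories take over.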
- Data Export:
  - Exports in multiple formats:
    - JSON: Raw data with full details
    - COCO: For computer vision applications
    - YOLO: For object detection models
  - Includes session metadata, interactions, UI element information, relationships, and screenshots
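As a concrete illustration of the YOLO target format, each annotated UI element becomes one line of `class cx cy w h`, with center and size normalized to the screenshot dimensions. A hedged sketch (not the app's actual exporter):

```python
def yolo_line(class_id, box, image_size):
    """Convert a pixel-space box (x, y, w, h) into a normalized YOLO label line."""
    x, y, w, h = box
    img_w, img_h = image_size
    cx = (x + w / 2) / img_w   # box center, as a fraction of image width
    cy = (y + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 200x50 button at (100, 200) on a 1920x1080 screenshot
print(yolo_line(0, (100, 200, 200, 50), (1920, 1080)))
# → 0 0.104167 0.208333 0.104167 0.046296
```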
- Visualization:
  - Generates DOT graphs of UI element relationships
  - Can be converted to PNG images for visual analysis
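Emitting such a graph amounts to serializing (source, relation, target) triples into DOT syntax. A minimal sketch (element and relation names are made up; the app's real output may differ):

```python
def to_dot(relationships):
    """Render (source, relation, target) triples as a Graphviz digraph."""
    lines = ["digraph UIRelationships {"]
    for src, relation, dst in relationships:
        lines.append(f'  "{src}" -> "{dst}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot([
    ("SaveButton", "controls", "SettingsForm"),
    ("NameLabel", "describes", "NameField"),
])
print(dot)
```

The resulting text can then be rendered with Graphviz, e.g. `dot -Tpng graph.dot -o graph.png`.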
For quick testing and demonstration, use the test export mode:
```bash
chmod +x run_test_export.sh
./run_test_export.sh
```
This will:
- Create a test session with sample data
- Export it in all supported formats
- Open the export directory for inspection
- Added clear session list with thumbnails and session metrics
- Implemented status indicators for permissions and disk space
- Improved recording and export buttons with icons
- Added visual feedback during recording
- Created structured table view for session management
- Streamlined permission handling and request process
- Added permission status indicators in the UI
- Created permission request helper script
- Ensured consistent application identity across builds
- Reduced unnecessary permission requests
- Implemented comprehensive relationship analysis between UI elements
- Added support for hierarchical, spatial, functional, and logical relationships
- Integrated relevance scoring for relationship prioritization
- Created visualization capabilities using DOT graph format
- Added export of relationship data to all formats
- Completed implementation for JSON, COCO, and YOLO export formats
- Added structured metadata for all exports
- Implemented proper organization of files and directories
- Added support for element relationship export in all formats
- Fixed permissions issues for screen recording and accessibility
- Updated Info.plist with proper usage descriptions
- Optimized entitlements configuration
- Added permissions check utility script
- Improved application bundling process
- Added proper serialization support for AppKit/CoreGraphics types (NSPoint, CGRect)
- Implemented robust session storage mechanisms with in-memory backup
- Enhanced logging for improved debugging
- Added direct reference to the last session in the view controller
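To illustrate what serializing an AppKit/CoreGraphics geometry type can look like, a CGRect-style value round-trips through JSON as a small dictionary of its components. This Python sketch only demonstrates the shape of the data; the actual Swift `Codable` representation may use different keys:

```python
import json

def encode_rect(x, y, width, height):
    """Serialize a CGRect-like value to a JSON object."""
    return json.dumps({"x": x, "y": y, "width": width, "height": height})

def decode_rect(payload):
    """Deserialize back to an (x, y, width, height) tuple."""
    d = json.loads(payload)
    return (d["x"], d["y"], d["width"], d["height"])

assert decode_rect(encode_rect(10.0, 20.0, 300.0, 44.0)) == (10.0, 20.0, 300.0, 44.0)
```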
The application is organized into several key components:
- Recording Manager: Handles screen recording, event monitoring, and accessibility features
- Session Manager: Manages recording sessions, stores interaction data
- Export Manager: Handles exporting sessions to various formats
- UI Element Relationship Analyzer: Analyzes connections between UI elements
- User Interaction Models: Defines data structures for different user interactions
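To make the data flow between these components concrete, a single recorded interaction might look like the following. This is a Python sketch with hypothetical field names; the app's real models are Swift structs:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class InteractionEvent:
    """One recorded user interaction, as the Session Manager might store it."""
    timestamp: float            # seconds since session start
    kind: str                   # e.g. "click", "keypress", "move"
    position: tuple             # (x, y) in screen coordinates
    element_role: str = ""      # accessibility role of the target element
    relationships: list = field(default_factory=list)  # related elements

event = InteractionEvent(1.25, "click", (412, 310), "AXButton",
                         [("describes", "NameLabel")])
print(asdict(event)["kind"])  # click
```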
Our development roadmap includes the following milestones:
- ✅ Basic macOS application setup - Complete
- ✅ Screen recording and user action capturing - Complete
- ✅ Data storage system - Complete
- ✅ Application bundling and permissions handling - Complete
- ✅ UI Element Annotation - Complete
  - ✅ Basic accessibility data collection
  - ✅ UI element detection during mouse clicks
  - ✅ UI element detection during keyboard input
  - ✅ UI element hierarchy and relationship tracking
  - ✅ Advanced element properties collection
- ✅ Export capabilities for ML training - Complete
  - ✅ Basic JSON export
  - ✅ COCO format export for computer vision
  - ✅ YOLO format export for object detection
- 🔄 Performance optimization and UX improvements - In Progress
  - ✅ Improved session management interface
  - ✅ Added status indicators and visual feedback
  - ✅ Enhanced permission management system
  - 🚀 [NEXT PRIORITY] Make recording more efficient
  - ⏳ Add filtering capabilities for recordings
- ⏳ Web interface integration - Planned
  - Create simple web viewer for recorded sessions
  - Add annotation capabilities in web interface
- ⏳ Comprehensive testing and stabilization - Planned
  - Unit and integration tests
  - Performance optimization
  - Documentation
- Enhanced Relationship Analysis:
  - Implement more advanced context-aware relationship detection
  - Add semantic understanding of UI patterns
  - Improve relevance scoring algorithms
- Performance & Stability Improvements:
  - Optimize relationship analysis for large UI hierarchies
  - Reduce memory usage during long recording sessions
  - Add support for multi-monitor setups
- Integration Capabilities:
  - Add API for integrating with other tools
  - Support for collaborative dataset creation
  - Integration with popular ML frameworks
To build a release binary:

```bash
swift build -c release
```

To run the test suite:

```bash
swift test
```
To quickly test the export functionality:
chmod +x run_test_export.sh
./run_test_export.sh
This project is licensed under the MIT License - see the LICENSE file for details.