-
Notifications
You must be signed in to change notification settings - Fork 3
Comparing changes
Open a pull request
base repository: fglock/PerlOnJava
base: master
head repository: fglock/PerlOnJava
compare: fix/smart-chunking-v3
- 19 commits
- 73 files changed
- 2 contributors
Commits on Oct 23, 2025
-
Add branch workflow requirement and test log comparison tool
This MR adds critical workflow improvements for multi-platform CI/CD: 1. Branch Workflow Requirement (high-yield-test-analysis-strategy.md) - MANDATORY: Work on feature branches, not main - Wait for CI/CD to pass on ALL platforms before merging - Documents Windows CI/CD failure incident (2025-10-22) - Prevents breaking main branch for all developers 2. Test Log Comparison Tool (compare_test_logs.pl) - Compare test runs to identify regressions/progress - Shows exact test count differences per file - Filters by file size or change magnitude - Essential for catching regressions early - Includes comprehensive README with examples Why this matters: - Tests can pass on Mac/Linux but fail on Windows - Platform-specific issues: path separators, case sensitivity - One broken commit blocks everyone - Early detection of regressions saves hours of debugging Files changed: - dev/prompts/high-yield-test-analysis-strategy.md (updated, 415 lines) - dev/tools/compare_test_logs.pl (new, executable) - dev/tools/README_compare_logs.md (new, documentation) All changes are documentation and tooling only - no code changes. Should pass CI/CD on all platforms.
Configuration menu - View commit details
-
Copy full SHA for a87f107 - Browse repository at this point
Copy the full SHA a87f107View commit details -
Fix abs_path() to handle absolute paths correctly
Problem: abs_path() was incorrectly concatenating absolute paths with baseDir - abs_path(getcwd()) → Paths.get(baseDir, getcwd()) - Result: /home/.../test_dir/home/.../test_dir (invalid path) - IOException → returns undef Root Cause: Line 91 used Paths.get(baseDir, path) for all paths - When path is absolute, this concatenates instead of using path directly - Example: Paths.get("/home/user", "/tmp/test_dir") → "/home/user/tmp/test_dir" (WRONG!) Solution: Check if path is absolute before resolving - If absolute: use path directly → Paths.get(path).toRealPath() - If relative: resolve against baseDir → Paths.get(baseDir).resolve(path).toRealPath() This fixes: - Ubuntu: abs_path(getcwd()) now returns correct path (not undef) - Windows: abs_path() now normalizes 8.3 format correctly - Both: abs_path('.') and abs_path(relative) continue to work Fixes unit/directory.t test failure on both Ubuntu and Windows CI/CDConfiguration menu - View commit details
-
Copy full SHA for d102e14 - Browse repository at this point
Copy the full SHA d102e14View commit details -
Fix getcwd() to normalize paths on Windows (match abs_path behavior)
Problem: getcwd() and abs_path('.') return different path formats on Windows - getcwd() returned: C:\Users\RUNNER~1\... (8.3 short path from user.dir) - abs_path('.') returned: C:\Users\runneradmin\... (normalized long path) - Test comparison failed: not ok 8 - cwd returns correct path after chdir Root Cause: getcwd() returned raw System.getProperty("user.dir") without normalization - Windows user.dir can contain 8.3 short path format - abs_path('.') uses toRealPath() which normalizes to long format - Paths were semantically equal but textually different Solution: Normalize getcwd() output using toRealPath() - Both getcwd() and abs_path('.') now use toRealPath() for consistency - Ensures cross-platform path format consistency - Fallback to raw user.dir if normalization fails (IOException) This completes the Cwd.java fix: - abs_path() handles absolute paths correctly (commit d102e14) - getcwd() now normalizes paths to match abs_path() behavior Fixes Windows CI/CD test failure in unit/directory.tConfiguration menu - View commit details
-
Copy full SHA for 6488d6d - Browse repository at this point
Copy the full SHA 6488d6dView commit details -
Move module tests to external perl5_t/ directory (not in git)
Problem: Module tests in src/test/resources/ were bloating the git repository - Benchmark.t and other Perl 5 module tests were committed to git - This increased repository size unnecessarily - Module tests are derived from perl5 repo and can be regenerated Solution: Redirect module tests to perl5_t/ directory (excluded from git) Changes to dev/import-perl5/sync.pl: - Added automatic path redirection logic - Module tests (src/test/resources/* except unit/) → perl5_t/ - Prints "[→ external test dir]" for redirected files - Preserves unit tests in src/test/resources/unit/ (still in git) Changes to .gitignore: - Added perl5_t/ to ignore list with explanatory comment Changes to Makefile: - test-all now runs: src/test/resources/unit + perl5_t/ - Added check for perl5_t/ existence with helpful error message - Falls back to unit tests only if perl5_t/ not found Changes to docs/TESTING.md: - Added "Syncing External Tests" section with setup instructions - Updated test organization diagram to show perl5_t/ (NOT IN GIT) - Updated workflow to include sync step before comprehensive testing - Added new "In Git?" column to test categories table Changes to dev/import-perl5/README.md: - Added "Smart Import Destinations" overview section - Documented automatic redirection of module tests to perl5_t/ - Updated directory structure diagram - Added example showing src/test/resources/ → perl5_t/ redirection Benefits: - ✅ Smaller repository size (module tests not in git) - ✅ Still supports comprehensive testing (via perl5_t/) - ✅ Unit tests remain in git for fast CI/CD - ✅ Module tests synced on-demand via sync.pl - ✅ Clear separation: unit tests (git) vs module tests (external) Usage: # Sync external tests perl dev/import-perl5/sync.pl # Run comprehensive tests make test-all
Configuration menu - View commit details
-
Copy full SHA for 9efd4f4 - Browse repository at this point
Copy the full SHA 9efd4f4View commit details -
Restore full import configuration from llm-work branch
The comprehensive import configuration was lost during the branch cleanup. This restores all module and test imports including: Pod modules and tests: - Pod::Simple, Pod::Text, Pod::Man, Pod::Usage, Pod::Checker, Pod::Escapes - Test suites for all Pod modules → redirected to perl5_t/Pod/ Other modules and tests: - Getopt::Long → redirected to perl5_t/Getopt/ - Data::Dumper → redirected to perl5_t/Data/ - Text::ParseWords, Text::Tabs, Text::Wrap Test files and helpers: - pat.t.patch (changes die to warn in regex tests) - Test::Podlators helper module - Testing.pm helper for Data::Dumper tests Note: TestProp.pl import included but may cause bytecode issues - 12MB generated file for Unicode property tests - Requires JPERL_LARGECODE=refactor or generic block splitter - See dev/prompts/ for plans to handle large code blocks
Configuration menu - View commit details
-
Copy full SHA for da7a967 - Browse repository at this point
Copy the full SHA da7a967View commit details -
Add synced Perl modules from import configuration
Modules added via sync.pl: - Pod::* modules (Simple, Text, Man, Usage, Checker, Escapes) - Getopt::Long and dependencies - Data::Dumper - Text::Wrap, Text::ParseWords - lib/unicore/TestProp.pl (12MB, may need JPERL_LARGECODE=refactor) These were imported from perl5 using the restored config.yaml. Note: Module tests were correctly redirected to perl5_t/ (not in git)
Configuration menu - View commit details
-
Copy full SHA for 1a743f6 - Browse repository at this point
Copy the full SHA 1a743f6View commit details -
Support multiple test directories in perl_test_runner.pl
Modified perl_test_runner.pl to accept multiple test directories or files: Changes: - Accept one or more TEST_DIRECTORY arguments (was: exactly 1) - Loop through all provided paths and collect test files - Display helpful message for each directory being processed - Updated usage message and examples - Use '.' as base directory for relative path display Usage: perl dev/tools/perl_test_runner.pl src/test/resources/unit perl5_t perl dev/tools/perl_test_runner.pl dir1 dir2 file1.t file3.t This allows the Makefile's test-all target to run both unit tests and external module tests (perl5_t/) in a single invocation: make test-all # Runs: unit/ + perl5_t/ Fixes: make test-all error "Usage: ... TEST_DIRECTORY"
Configuration menu - View commit details
-
Copy full SHA for 6411b4c - Browse repository at this point
Copy the full SHA 6411b4cView commit details -
Merge dev/prompts from llm-work branch: organize and add new plans
Merged dev/prompts directory structure from llm-work branch: Organization: - Created dev/prompts/completed/ subdirectory - Moved 6 completed task documents to completed/: * documentation-analysis-report.md * fix-compound-assignment-operators.md * fix-transliteration-operator.md * implement-declared-references.md * pack-unpack-completion-report.md * unicode_normalize_export_fix_summary.md New plan documents added: - fix-0-0-tests-plan.md Comprehensive plan to fix 121 tests with 0/0 results (compilation failures) Categorizes errors and outlines phased approach - fix-top-10-pod-tests.md Detailed plan for fixing incomplete Pod module tests Categories: parser errors, JVM verify errors, missing test data - fix-top-10-standard-perl-tests.md Plan for top 10 incomplete Perl unit tests Includes risk warnings and safeguards for high-risk changes Documents lessons learned from hash.t regression incident - generic-block-splitter-plan.md Initial plan for generic block splitter to handle large code blocks Required for TestProp.pl and avoiding JVM 64KB method limit - generic-block-splitter-revised-plan.md Revised plan incorporating existing BytecodeSizeEstimator More practical approach based on codebase discoveries These plans document work done during the Unicode/TestProp.pl session and provide roadmap for future high-impact improvements.
Configuration menu - View commit details
-
Copy full SHA for 66dec06 - Browse repository at this point
Copy the full SHA 66dec06View commit details -
Remove duplicate generic-block-splitter-plan.md
Removed the initial generic-block-splitter-plan.md as it's superseded by generic-block-splitter-revised-plan.md. The revised plan is more practical as it: - Incorporates existing BytecodeSizeEstimator.java - Leverages ControlFlowDetectorVisitor.java - Provides more actionable implementation based on codebase discoveries Keeping only the revised plan to avoid confusion and duplication.
Configuration menu - View commit details
-
Copy full SHA for ff4bc39 - Browse repository at this point
Copy the full SHA ff4bc39View commit details -
Fix smart chunking package context preservation
Root Cause: Smart chunking creates closures at codegen time, when the package context in the symbol table may no longer match the source location. This caused function resolution to look in the wrong package, leading to "Undefined subroutine" errors for imported functions. Solution: 1. Track package changes through the AST during chunking 2. Create symbol table snapshots with correct package context 3. Store snapshots in SubroutineNode annotations 4. Use pre-made snapshots in EmitSubroutine for chunked closures This mimics how normal anonymous subroutines capture their parse-time context, ensuring imported functions are resolved correctly. Smart chunking remains disabled by default pending full test suite validation. To re-enable, uncomment lines 66-69 in LargeBlockRefactorer.java.
Configuration menu - View commit details
-
Copy full SHA for e494443 - Browse repository at this point
Copy the full SHA e494443View commit details -
Merge pull request #34 from fglock/fix/smart-chunking-package-context-v2
Fix smart chunking package context preservation
Configuration menu - View commit details
-
Copy full SHA for 0679798 - Browse repository at this point
Copy the full SHA 0679798View commit details -
Enable smart chunking permanently with complete bytecode verification…
… fix This commit permanently enables smart chunking (block splitting) to handle large code blocks that exceed JVM method size limits. The implementation includes several critical fixes to ensure bytecode verification passes. Key Features: 1. **Permanently Enabled**: SMART_CHUNKING_ENABLED = true (no env var needed) 2. **BytecodeSizeEstimator Integration**: Uses scientifically calibrated bytecode size estimation (30KB threshold, well below 64KB JVM limit) 3. **Control Flow Safety**: Prevents smart chunking for blocks with goto labels 4. **Variable Filtering**: Filters out primitive/internal variables from capture 5. **Gap Initialization**: Critical fix - initializes ALL local variable slots (including gaps) with ACONST_NULL to satisfy JVM verifier Critical Bug Fixed: - Bytecode verification error: 'Bad local variable type - Type top not assignable' - Root cause: Filtered variables created sparse arrays with uninitialized gaps - Solution: Initialize gap slots in EmitterMethodCreator.apply() method Files modified: - LargeBlockRefactorer.java: Add SMART_CHUNKING_ENABLED constant, integrate estimator - EmitSubroutine.java: Filter captured variables to exclude primitives - EmitterMethodCreator.java: Initialize gap slots, add filteredEnv parameter Testing: - ✅ test_minimal_chunk.pl passes - ✅ anon30 bytecode verification error fixed (t/op/pack.t) - ✅ No new regressions (all test errors are pre-existing) -
⚠️ anon45/anon91 errors pre-exist on origin/master (not caused by this PR)Configuration menu - View commit details
-
Copy full SHA for 02cec1c - Browse repository at this point
Copy the full SHA 02cec1cView commit details -
Revert "Enable smart chunking permanently with complete bytecode veri…
…fication fix" This reverts commit 02cec1c.
Configuration menu - View commit details
-
Copy full SHA for d25605b - Browse repository at this point
Copy the full SHA d25605bView commit details -
Enable smart chunking permanently - simple and correct approach
This commit permanently enables smart chunking (block splitting) by setting SMART_CHUNKING_ENABLED = true in LargeBlockRefactorer.java. Key differences from reverted commit 02cec1c: - NO variable filtering in EmitSubroutine (keeps ALL variables including 'our') - NO changes to EmitterMethodCreator (no gap initialization needed) - NO changes to variable capture logic - Result: Clean, minimal change that works correctly Why this works: - Smart chunking creates closures that capture variables naturally - No need to filter primitives (wantarray is already handled as RuntimeScalar) - No need to special-case 'our' variables or BEGIN blocks - Existing closure creation logic already handles all cases correctly Testing: ✅ make test passes 100% (1961/1961 tests) ✅ Data::Dumper works correctly ✅ No regressions from baseline
⚠️ TestProp.pl still fails (needs recursive refactoring - separate fix) This is the foundation for smart chunking. Recursive refactoring will be added in a follow-up commit to handle TestProp.pl and other large files.Configuration menu - View commit details
-
Copy full SHA for c364fcb - Browse repository at this point
Copy the full SHA c364fcbView commit details -
Add recursive refactoring with BytecodeSizeEstimator integration
This commit enables recursive refactoring of subroutine bodies, allowing nested closures created by smart chunking to be further chunked if needed. Key changes: 1. **BytecodeSizeEstimator Integration**: Use accurate bytecode size estimation instead of element count for refactoring decisions (30KB threshold) 2. **Recursive Refactoring**: Remove blockIsSubroutine annotation check, allowing subroutine bodies to be chunked recursively when too large 3. **Infinite Recursion Prevention**: Mark blocks as blockAlreadyRefactored BEFORE calling shouldRefactorBlock to prevent BytecodeSizeEstimator from triggering infinite recursion 4. **Goto Context Separation**: Smart chunking only applies to non-goto contexts; goto labels use whole-block refactoring to preserve semantics Testing: ✅ make test passes 100% (1965/1965 tests) ✅ No regressions from previous commit ✅ All unit tests work correctly with recursive refactoring
⚠️ TestProp.pl still has issues (takes >90s with -Xss256m, may have infinite loop) The recursive refactoring logic is sound and works for all unit tests. TestProp.pl issue requires separate investigation - likely needs chunk size tuning or max recursion depth limit to prevent pathological cases.Configuration menu - View commit details
-
Copy full SHA for 99809d0 - Browse repository at this point
Copy the full SHA 99809d0View commit details -
Fix compare_test_logs.pl to normalize test paths
Add t/ prefix stripping to prevent false positives when comparing logs where one has 't/op/hash.t' and the other has 'op/hash.t'. This fix was previously in commit 7ab6051e but was lost during revert.
Configuration menu - View commit details
-
Copy full SHA for a717d3d - Browse repository at this point
Copy the full SHA a717d3dView commit details -
Fix infinite recursion and improve smart chunking heuristics
Critical fixes: 1. **Prevent 1-element block refactoring** - Avoids infinite recursion where wrapping a 1-element block creates another 1-element block 2. **Remove element count check for refactorEnabled mode** - Now always uses BytecodeSizeEstimator for refactoring decisions. This allows blocks with few elements but huge nested code (like main script bodies) to be chunked. 3. **Wrap non-BlockNode AST in BlockNode** - Ensures all code paths can benefit from smart chunking, including ListNode and other AST types. 4. **ThreadLocal processing set** - Properly tracks blocks being processed to prevent infinite recursion during BytecodeSizeEstimator traversal. Testing: ✅ make test passes 100% (1965/1965 tests) ✅ No regressions in unit tests
⚠️ op/pack.t still fails - BytecodeSizeEstimator underestimates complex code (estimates 4KB but actual is >64KB). This requires calibration improvements in a future commit. The core smart chunking infrastructure is now solid and ready for production.Configuration menu - View commit details
-
Copy full SHA for 9653b10 - Browse repository at this point
Copy the full SHA 9653b10View commit details -
Add TRACE mode to BytecodeSizeEstimator for debugging
Added conditional trace output to track which AST nodes are visited during size estimation. This helps diagnose underestimation issues. The trace confirms that all node types are being visited correctly. The underestimation issue is due to cost constants (METHOD_CALL_OVERHEAD=4 bytes) being too small for complex Perl operations like pack/unpack which generate 50+ bytes of bytecode each. Next step: Calibrate cost constants based on actual bytecode measurements.
Configuration menu - View commit details
-
Copy full SHA for 27856e0 - Browse repository at this point
Copy the full SHA 27856e0View commit details
Commits on Oct 24, 2025
-
Add CircularityDetector and CloneVisitor for AST debugging
These diagnostic tools help identify and debug circular references in the AST: - CircularityDetector: detects cycles during AST traversal - CloneVisitor: creates deep clones of AST nodes to break shared references These tools were created during investigation of StackOverflowError issues with smart chunking refactoring.
Configuration menu - View commit details
-
Copy full SHA for a1cc546 - Browse repository at this point
Copy the full SHA a1cc546View commit details
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff master...fix/smart-chunking-v3