Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: fglock/PerlOnJava
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: master
Choose a base ref
...
head repository: fglock/PerlOnJava
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: fix/smart-chunking-v3
Choose a head ref
Checking mergeability… Don’t worry, you can still create the pull request.
  • 19 commits
  • 73 files changed
  • 2 contributors

Commits on Oct 23, 2025

  1. Add branch workflow requirement and test log comparison tool

    This MR adds critical workflow improvements for multi-platform CI/CD:
    
    1. Branch Workflow Requirement (high-yield-test-analysis-strategy.md)
       - MANDATORY: Work on feature branches, not main
       - Wait for CI/CD to pass on ALL platforms before merging
       - Documents Windows CI/CD failure incident (2025-10-22)
       - Prevents breaking main branch for all developers
    
    2. Test Log Comparison Tool (compare_test_logs.pl)
       - Compare test runs to identify regressions/progress
       - Shows exact test count differences per file
       - Filters by file size or change magnitude
       - Essential for catching regressions early
       - Includes comprehensive README with examples
    
    Why this matters:
    - Tests can pass on Mac/Linux but fail on Windows
    - Platform-specific issues: path separators, case sensitivity
    - One broken commit blocks everyone
    - Early detection of regressions saves hours of debugging
    
    Files changed:
    - dev/prompts/high-yield-test-analysis-strategy.md (updated, 415 lines)
    - dev/tools/compare_test_logs.pl (new, executable)
    - dev/tools/README_compare_logs.md (new, documentation)
    
    All changes are documentation and tooling only - no code changes.
    Should pass CI/CD on all platforms.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    a87f107 View commit details
    Browse the repository at this point in the history
  2. Fix abs_path() to handle absolute paths correctly

    Problem: abs_path() was incorrectly concatenating absolute paths with baseDir
    - abs_path(getcwd()) → Paths.get(baseDir, getcwd())
    - Result: /home/.../test_dir/home/.../test_dir (invalid path)
    - IOException → returns undef
    
    Root Cause: Line 91 used Paths.get(baseDir, path) for all paths
    - When path is absolute, this concatenates instead of using path directly
    - Example: Paths.get("/home/user", "/tmp/test_dir")
      → "/home/user/tmp/test_dir" (WRONG!)
    
    Solution: Check if path is absolute before resolving
    - If absolute: use path directly → Paths.get(path).toRealPath()
    - If relative: resolve against baseDir → Paths.get(baseDir).resolve(path).toRealPath()
    
    This fixes:
    - Ubuntu: abs_path(getcwd()) now returns correct path (not undef)
    - Windows: abs_path() now normalizes 8.3 format correctly
    - Both: abs_path('.') and abs_path(relative) continue to work
    
    Fixes unit/directory.t test failure on both Ubuntu and Windows CI/CD
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    d102e14 View commit details
    Browse the repository at this point in the history
  3. Fix getcwd() to normalize paths on Windows (match abs_path behavior)

    Problem: getcwd() and abs_path('.') return different path formats on Windows
    - getcwd() returned: C:\Users\RUNNER~1\... (8.3 short path from user.dir)
    - abs_path('.') returned: C:\Users\runneradmin\... (normalized long path)
    - Test comparison failed: not ok 8 - cwd returns correct path after chdir
    
    Root Cause: getcwd() returned raw System.getProperty("user.dir") without normalization
    - Windows user.dir can contain 8.3 short path format
    - abs_path('.') uses toRealPath() which normalizes to long format
    - Paths were semantically equal but textually different
    
    Solution: Normalize getcwd() output using toRealPath()
    - Both getcwd() and abs_path('.') now use toRealPath() for consistency
    - Ensures cross-platform path format consistency
    - Fallback to raw user.dir if normalization fails (IOException)
    
    This completes the Cwd.java fix:
    - abs_path() handles absolute paths correctly (commit d102e14)
    - getcwd() now normalizes paths to match abs_path() behavior
    
    Fixes Windows CI/CD test failure in unit/directory.t
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    6488d6d View commit details
    Browse the repository at this point in the history
  4. Move module tests to external perl5_t/ directory (not in git)

    Problem: Module tests in src/test/resources/ were bloating the git repository
    - Benchmark.t and other Perl 5 module tests were committed to git
    - This increased repository size unnecessarily
    - Module tests are derived from perl5 repo and can be regenerated
    
    Solution: Redirect module tests to perl5_t/ directory (excluded from git)
    
    Changes to dev/import-perl5/sync.pl:
    - Added automatic path redirection logic
    - Module tests (src/test/resources/* except unit/) → perl5_t/
    - Prints "[→ external test dir]" for redirected files
    - Preserves unit tests in src/test/resources/unit/ (still in git)
    
    Changes to .gitignore:
    - Added perl5_t/ to ignore list with explanatory comment
    
    Changes to Makefile:
    - test-all now runs: src/test/resources/unit + perl5_t/
    - Added check for perl5_t/ existence with helpful error message
    - Falls back to unit tests only if perl5_t/ not found
    
    Changes to docs/TESTING.md:
    - Added "Syncing External Tests" section with setup instructions
    - Updated test organization diagram to show perl5_t/ (NOT IN GIT)
    - Updated workflow to include sync step before comprehensive testing
    - Added new "In Git?" column to test categories table
    
    Changes to dev/import-perl5/README.md:
    - Added "Smart Import Destinations" overview section
    - Documented automatic redirection of module tests to perl5_t/
    - Updated directory structure diagram
    - Added example showing src/test/resources/ → perl5_t/ redirection
    
    Benefits:
    - ✅ Smaller repository size (module tests not in git)
    - ✅ Still supports comprehensive testing (via perl5_t/)
    - ✅ Unit tests remain in git for fast CI/CD
    - ✅ Module tests synced on-demand via sync.pl
    - ✅ Clear separation: unit tests (git) vs module tests (external)
    
    Usage:
      # Sync external tests
      perl dev/import-perl5/sync.pl
    
      # Run comprehensive tests
      make test-all
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    9efd4f4 View commit details
    Browse the repository at this point in the history
  5. Restore full import configuration from llm-work branch

    The comprehensive import configuration was lost during the branch cleanup.
    This restores all module and test imports including:
    
    Pod modules and tests:
    - Pod::Simple, Pod::Text, Pod::Man, Pod::Usage, Pod::Checker, Pod::Escapes
    - Test suites for all Pod modules → redirected to perl5_t/Pod/
    
    Other modules and tests:
    - Getopt::Long → redirected to perl5_t/Getopt/
    - Data::Dumper → redirected to perl5_t/Data/
    - Text::ParseWords, Text::Tabs, Text::Wrap
    
    Test files and helpers:
    - pat.t.patch (changes die to warn in regex tests)
    - Test::Podlators helper module
    - Testing.pm helper for Data::Dumper tests
    
    Note: TestProp.pl import included but may cause bytecode issues
    - 12MB generated file for Unicode property tests
    - Requires JPERL_LARGECODE=refactor or generic block splitter
    - See dev/prompts/ for plans to handle large code blocks
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    da7a967 View commit details
    Browse the repository at this point in the history
  6. Add synced Perl modules from import configuration

    Modules added via sync.pl:
    - Pod::* modules (Simple, Text, Man, Usage, Checker, Escapes)
    - Getopt::Long and dependencies
    - Data::Dumper
    - Text::Wrap, Text::ParseWords
    - lib/unicore/TestProp.pl (12MB, may need JPERL_LARGECODE=refactor)
    
    These were imported from perl5 using the restored config.yaml.
    Note: Module tests were correctly redirected to perl5_t/ (not in git)
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    1a743f6 View commit details
    Browse the repository at this point in the history
  7. Support multiple test directories in perl_test_runner.pl

    Modified perl_test_runner.pl to accept multiple test directories or files:
    
    Changes:
    - Accept one or more TEST_DIRECTORY arguments (was: exactly 1)
    - Loop through all provided paths and collect test files
    - Display helpful message for each directory being processed
    - Updated usage message and examples
    - Use '.' as base directory for relative path display
    
    Usage:
      perl dev/tools/perl_test_runner.pl src/test/resources/unit perl5_t
      perl dev/tools/perl_test_runner.pl dir1 dir2 file1.t file3.t
    
    This allows the Makefile's test-all target to run both unit tests
    and external module tests (perl5_t/) in a single invocation:
      make test-all  # Runs: unit/ + perl5_t/
    
    Fixes: make test-all error "Usage: ... TEST_DIRECTORY"
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    6411b4c View commit details
    Browse the repository at this point in the history
  8. Merge dev/prompts from llm-work branch: organize and add new plans

    Merged dev/prompts directory structure from llm-work branch:
    
    Organization:
    - Created dev/prompts/completed/ subdirectory
    - Moved 6 completed task documents to completed/:
      * documentation-analysis-report.md
      * fix-compound-assignment-operators.md
      * fix-transliteration-operator.md
      * implement-declared-references.md
      * pack-unpack-completion-report.md
      * unicode_normalize_export_fix_summary.md
    
    New plan documents added:
    - fix-0-0-tests-plan.md
      Comprehensive plan to fix 121 tests with 0/0 results (compilation failures)
      Categorizes errors and outlines phased approach
    
    - fix-top-10-pod-tests.md
      Detailed plan for fixing incomplete Pod module tests
      Categories: parser errors, JVM verify errors, missing test data
    
    - fix-top-10-standard-perl-tests.md
      Plan for top 10 incomplete Perl unit tests
      Includes risk warnings and safeguards for high-risk changes
      Documents lessons learned from hash.t regression incident
    
    - generic-block-splitter-plan.md
      Initial plan for generic block splitter to handle large code blocks
      Required for TestProp.pl and avoiding JVM 64KB method limit
    
    - generic-block-splitter-revised-plan.md
      Revised plan incorporating existing BytecodeSizeEstimator
      More practical approach based on codebase discoveries
    
    These plans document work done during the Unicode/TestProp.pl session
    and provide roadmap for future high-impact improvements.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    66dec06 View commit details
    Browse the repository at this point in the history
  9. Remove duplicate generic-block-splitter-plan.md

    Removed the initial generic-block-splitter-plan.md as it's superseded by
    generic-block-splitter-revised-plan.md.
    
    The revised plan is more practical as it:
    - Incorporates existing BytecodeSizeEstimator.java
    - Leverages ControlFlowDetectorVisitor.java
    - Provides more actionable implementation based on codebase discoveries
    
    Keeping only the revised plan to avoid confusion and duplication.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    ff4bc39 View commit details
    Browse the repository at this point in the history
  10. Fix smart chunking package context preservation

    Root Cause:
    Smart chunking creates closures at codegen time, when the package
    context in the symbol table may no longer match the source location.
    This caused function resolution to look in the wrong package, leading
    to "Undefined subroutine" errors for imported functions.
    
    Solution:
    1. Track package changes through the AST during chunking
    2. Create symbol table snapshots with correct package context
    3. Store snapshots in SubroutineNode annotations
    4. Use pre-made snapshots in EmitSubroutine for chunked closures
    
    This mimics how normal anonymous subroutines capture their parse-time
    context, ensuring imported functions are resolved correctly.
    
    Smart chunking remains disabled by default pending full test suite
    validation. To re-enable, uncomment lines 66-69 in LargeBlockRefactorer.java.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    e494443 View commit details
    Browse the repository at this point in the history
  11. Merge pull request #34 from fglock/fix/smart-chunking-package-context-v2

    Fix smart chunking package context preservation
    fglock authored Oct 23, 2025
    Configuration menu
    Copy the full SHA
    0679798 View commit details
    Browse the repository at this point in the history
  12. Enable smart chunking permanently with complete bytecode verification…

    … fix
    
    This commit permanently enables smart chunking (block splitting) to handle large
    code blocks that exceed JVM method size limits. The implementation includes several
    critical fixes to ensure bytecode verification passes.
    
    Key Features:
    1. **Permanently Enabled**: SMART_CHUNKING_ENABLED = true (no env var needed)
    2. **BytecodeSizeEstimator Integration**: Uses scientifically calibrated bytecode
       size estimation (30KB threshold, well below 64KB JVM limit)
    3. **Control Flow Safety**: Prevents smart chunking for blocks with goto labels
    4. **Variable Filtering**: Filters out primitive/internal variables from capture
    5. **Gap Initialization**: Critical fix - initializes ALL local variable slots
       (including gaps) with ACONST_NULL to satisfy JVM verifier
    
    Critical Bug Fixed:
    - Bytecode verification error: 'Bad local variable type - Type top not assignable'
    - Root cause: Filtered variables created sparse arrays with uninitialized gaps
    - Solution: Initialize gap slots in EmitterMethodCreator.apply() method
    
    Files modified:
    - LargeBlockRefactorer.java: Add SMART_CHUNKING_ENABLED constant, integrate estimator
    - EmitSubroutine.java: Filter captured variables to exclude primitives
    - EmitterMethodCreator.java: Initialize gap slots, add filteredEnv parameter
    
    Testing:
    - ✅ test_minimal_chunk.pl passes
    - ✅ anon30 bytecode verification error fixed (t/op/pack.t)
    - ✅ No new regressions (all test errors are pre-existing)
    - ⚠️ anon45/anon91 errors pre-exist on origin/master (not caused by this PR)
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    02cec1c View commit details
    Browse the repository at this point in the history
  13. Revert "Enable smart chunking permanently with complete bytecode veri…

    …fication fix"
    
    This reverts commit 02cec1c.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    d25605b View commit details
    Browse the repository at this point in the history
  14. Enable smart chunking permanently - simple and correct approach

    This commit permanently enables smart chunking (block splitting) by setting
    SMART_CHUNKING_ENABLED = true in LargeBlockRefactorer.java.
    
    Key differences from reverted commit 02cec1c:
    - NO variable filtering in EmitSubroutine (keeps ALL variables including 'our')
    - NO changes to EmitterMethodCreator (no gap initialization needed)
    - NO changes to variable capture logic
    - Result: Clean, minimal change that works correctly
    
    Why this works:
    - Smart chunking creates closures that capture variables naturally
    - No need to filter primitives (wantarray is already handled as RuntimeScalar)
    - No need to special-case 'our' variables or BEGIN blocks
    - Existing closure creation logic already handles all cases correctly
    
    Testing:
    ✅ make test passes 100% (1961/1961 tests)
    ✅ Data::Dumper works correctly
    ✅ No regressions from baseline
    ⚠️ TestProp.pl still fails (needs recursive refactoring - separate fix)
    
    This is the foundation for smart chunking. Recursive refactoring will be
    added in a follow-up commit to handle TestProp.pl and other large files.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    c364fcb View commit details
    Browse the repository at this point in the history
  15. Add recursive refactoring with BytecodeSizeEstimator integration

    This commit enables recursive refactoring of subroutine bodies, allowing
    nested closures created by smart chunking to be further chunked if needed.
    
    Key changes:
    1. **BytecodeSizeEstimator Integration**: Use accurate bytecode size estimation
       instead of element count for refactoring decisions (30KB threshold)
    
    2. **Recursive Refactoring**: Remove blockIsSubroutine annotation check, allowing
       subroutine bodies to be chunked recursively when too large
    
    3. **Infinite Recursion Prevention**: Mark blocks as blockAlreadyRefactored
       BEFORE calling shouldRefactorBlock to prevent BytecodeSizeEstimator from
       triggering infinite recursion
    
    4. **Goto Context Separation**: Smart chunking only applies to non-goto contexts;
       goto labels use whole-block refactoring to preserve semantics
    
    Testing:
    ✅ make test passes 100% (1965/1965 tests)
    ✅ No regressions from previous commit
    ✅ All unit tests work correctly with recursive refactoring
    ⚠️ TestProp.pl still has issues (takes >90s with -Xss256m, may have infinite loop)
    
    The recursive refactoring logic is sound and works for all unit tests.
    TestProp.pl issue requires separate investigation - likely needs chunk size
    tuning or max recursion depth limit to prevent pathological cases.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    99809d0 View commit details
    Browse the repository at this point in the history
  16. Fix compare_test_logs.pl to normalize test paths

    Add t/ prefix stripping to prevent false positives when comparing logs
    where one has 't/op/hash.t' and the other has 'op/hash.t'.
    
    This fix was previously in commit 7ab6051e but was lost during revert.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    a717d3d View commit details
    Browse the repository at this point in the history
  17. Fix infinite recursion and improve smart chunking heuristics

    Critical fixes:
    1. **Prevent 1-element block refactoring** - Avoids infinite recursion where
       wrapping a 1-element block creates another 1-element block
    
    2. **Remove element count check for refactorEnabled mode** - Now always uses
       BytecodeSizeEstimator for refactoring decisions. This allows blocks with
       few elements but huge nested code (like main script bodies) to be chunked.
    
    3. **Wrap non-BlockNode AST in BlockNode** - Ensures all code paths can benefit
       from smart chunking, including ListNode and other AST types.
    
    4. **ThreadLocal processing set** - Properly tracks blocks being processed to
       prevent infinite recursion during BytecodeSizeEstimator traversal.
    
    Testing:
    ✅ make test passes 100% (1965/1965 tests)
    ✅ No regressions in unit tests
    ⚠️  op/pack.t still fails - BytecodeSizeEstimator underestimates complex code
       (estimates 4KB but actual is >64KB). This requires calibration improvements
       in a future commit.
    
    The core smart chunking infrastructure is now solid and ready for production.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    9653b10 View commit details
    Browse the repository at this point in the history
  18. Add TRACE mode to BytecodeSizeEstimator for debugging

    Added conditional trace output to track which AST nodes are visited during
    size estimation. This helps diagnose underestimation issues.
    
    The trace confirms that all node types are being visited correctly. The
    underestimation issue is due to cost constants (METHOD_CALL_OVERHEAD=4 bytes)
    being too small for complex Perl operations like pack/unpack which generate
    50+ bytes of bytecode each.
    
    Next step: Calibrate cost constants based on actual bytecode measurements.
    fglock committed Oct 23, 2025
    Configuration menu
    Copy the full SHA
    27856e0 View commit details
    Browse the repository at this point in the history

Commits on Oct 24, 2025

  1. Add CircularityDetector and CloneVisitor for AST debugging

    These diagnostic tools help identify and debug circular references in the AST:
    - CircularityDetector: detects cycles during AST traversal
    - CloneVisitor: creates deep clones of AST nodes to break shared references
    
    These tools were created during investigation of StackOverflowError issues
    with smart chunking refactoring.
    fglock committed Oct 24, 2025
    Configuration menu
    Copy the full SHA
    a1cc546 View commit details
    Browse the repository at this point in the history
Loading