
Commit 6e8dd99

Improvements/Benchmark
1 parent 1ead191 commit 6e8dd99


5 files changed: +461 -275 lines changed


README.md

Lines changed: 108 additions & 46 deletions
````diff
@@ -2,79 +2,141 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
-A multi-layered heuristic engine designed to practically analyze the halting properties of Python scripts, navigating the complexities of the undecidable Halting Problem.
+A multi-layered heuristic engine for practically analyzing the halting behavior of Python scripts. Rather than attempting a perfect theoretical solution to the undecidable Halting Problem, this project implements a defense-in-depth strategy whose effectiveness is measured empirically.
 
-## The Problem: The Halting Problem
+When tested against a benchmark suite of **5,498 files** (the Python standard library, top PyPI packages, and a gauntlet of adversarial paradoxes), this analyzer achieved a **Practical Success Rate of 88.87%**.
 
-In 1936, Alan Turing proved that it is impossible to create a universal algorithm that can determine, for all possible programs, whether they will finish running (halt) or continue to run forever. No perfect, general-purpose solution can ever exist.
+## Features
 
-This project does not attempt to "solve" the Halting Problem. Instead, it provides a practical, multi-phase heuristic approach to analyze Python code, successfully identifying halting and non-halting behavior in a wide range of real-world and adversarial scenarios.
+- **Measured Performance:** Achieves an 88.87% practical success rate on a large, diverse corpus of real-world and adversarial code.
+- **Multi-Phase Analysis Pipeline:** Employs a cascade of analysis techniques, from lightweight static checks to full dynamic execution, so cheap checks settle the easy cases before the expensive phases run.
+- **Advanced Paradox & Cycle Detection:** Uses semantic hashing and an analysis call-chain tracker to defend against simple, obfuscated, and even polymorphic recursive paradoxes.
+- **Heuristic Classifier for Known Hard Problems:** Identifies notoriously hard cases, such as the Ackermann function and the Collatz conjecture, by their structural patterns, avoiding unnecessary execution.
+- **Symbolic Prover:** Proves termination of common loop structures that are too complex for basic static analysis.
+- **Automated Benchmarking Suite:** Includes a script (`benchmark.py`) that builds the test corpus and empirically calculates the analyzer's success rate.
+- **Intelligent Caching:** The benchmark harness caches the downloaded code corpus, allowing rapid re-analysis after changes to the analyzer's logic.
 
 ## The Solution: A Multi-Layered Heuristic Defense
 
-This analyzer employs a "defense-in-depth" strategy. It subjects a given program to a series of increasingly sophisticated and computationally expensive analysis phases. If any phase can make a definitive decision, the analysis stops, ensuring maximum efficiency.
+This analyzer employs a "defense-in-depth" strategy. It subjects a given program to a series of increasingly sophisticated analysis phases. As soon as any phase reaches a definitive decision, the analysis stops, so the more expensive phases run only when the cheaper ones are inconclusive.
 
 ### Core Architecture: The Analysis Pipeline
 
-The analyzer processes scripts through the following sequence:
+The analyzer processes each script through the following ordered pipeline:
 
-#### Meta-Analysis: Cycle & Paradox Detection
-Before the main analysis begins, two crucial meta-checks are performed to protect the analyzer itself from paradoxical attacks.
+1. **Meta-Analysis: Cross-Script Recursion Detection (`cross_script_recursion`)**
+   - Before any analysis begins, the script's code is converted to a "semantic hash": a hash of its canonicalized AST, so scripts that differ only in identifier names or comments hash identically. The analyzer maintains a call stack of these hashes. If it is asked to analyze a script that is already in the current analysis chain (e.g., A analyzes B, which analyzes a polymorphic version of A), it immediately concludes `does not halt` and stops.
 
-1. **Semantic Hashing (`semantic_hashing.py`):** Instead of a simple lexical hash of the code, the analyzer first converts the program into a **canonical form**. This process uses an Abstract Syntax Tree (AST) transformer to rename all variables, functions, and arguments to a standard format (`func_0`, `var_0`, etc.) and remove comments. This ensures that two programs that are structurally identical but use different names will produce the **same hash**.
+2. **Phase 0: Adversarial Pattern Matching (`paradox_detection`)**
+   - A highly specific AST visitor that looks for the exact structure of the classic "read-my-own-source-and-invert-the-result" paradox. If found, it returns `impossible to determine`.
 
-2. **Cross-Script Cycle Detection (`cross_script_recursion.py`):** The analyzer maintains a chain of the semantic hashes of every program currently under analysis. If it is asked to analyze a script whose semantic hash is already in the chain (e.g., A analyzes B, which analyzes a cosmetically different version of A), a mutual recursion cycle is detected and the analysis is short-circuited.
````
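The semantic hashing and call-chain check described above can be sketched with the standard `ast` and `hashlib` modules. This is a minimal illustration, not the project's `semantic_hashing.py` or `cross_script_recursion.py`: the names `Canonicalizer`, `semantic_hash`, and `analyze_with_chain` are hypothetical, and the sketch assumes that renaming identifiers in first-seen order is an adequate canonical form. Comments disappear on their own because `ast.parse` discards them; docstrings are kept.

```python
import ast
import builtins
import hashlib


class Canonicalizer(ast.NodeTransformer):
    """Rename user-defined functions, arguments, and variables in first-seen order."""

    def __init__(self):
        self.mapping = {}
        self.counters = {"func": 0, "var": 0}

    def _canonical(self, name, kind):
        if name not in self.mapping:
            self.mapping[name] = f"{kind}_{self.counters[kind]}"
            self.counters[kind] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._canonical(node.name, "func")
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._canonical(node.arg, "var")
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        # Builtins such as print or range keep their names so calls keep their meaning.
        if node.id in self.mapping or not hasattr(builtins, node.id):
            node.id = self._canonical(node.id, "var")
        return node


def semantic_hash(source: str) -> str:
    """Hash the canonical AST, so renamed or re-commented copies hash identically."""
    tree = Canonicalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


def analyze_with_chain(source, chain, analyze):
    """Short-circuit when a semantically identical script is already under analysis."""
    digest = semantic_hash(source)
    if digest in chain:
        return "does not halt"
    chain.append(digest)
    try:
        return analyze(source)
    finally:
        chain.pop()
```

Because `ast.dump` omits line and column information by default, reformatting a script does not change its hash either.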
````diff
+3. **Phase 1: Static Analysis (`static_analysis`)**
+   - The fastest check for the most obvious cases.
+   - **Finds `while True:`:** Immediately returns `does not halt`.
+   - **Finds no loops AND no recursion:** Immediately returns `halts`. Anything else is deferred to the next phase.
 
-#### Phase 0: Adversarial Pattern Matching (`paradox_detection.py`)
-* **Purpose:** To identify specific, known implementations of the classic halting problem paradox.
-* **Method:** Uses a highly specific AST visitor to look for the exact structure of a program that reads its own source, calls the analyzer on itself, and inverts the result.
````
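A minimal sketch of the Phase 0 pattern match, assuming the paradox takes the direct form of an `if analyze_halting(...) ...:` test with an infinite `while True:` loop in one branch, and that it reads its own source through `__file__`. `ParadoxVisitor` and `is_classic_paradox` are hypothetical names; the real `paradox_detection` visitor may match a different exact shape. The check only inspects the AST and never runs the script.

```python
import ast


def _is_analyzer_call(node):
    return isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "analyze_halting"


def _is_while_true(node):
    return isinstance(node, ast.While) and isinstance(node.test, ast.Constant) and node.test.value is True


class ParadoxVisitor(ast.NodeVisitor):
    """Look for: read own source via __file__, ask the analyzer, invert the verdict."""

    def __init__(self):
        self.reads_own_source = False
        self.inverts_verdict = False

    def visit_Name(self, node):
        if node.id == "__file__":
            self.reads_own_source = True
        self.generic_visit(node)

    def visit_If(self, node):
        # The analyzer is called in the condition and a branch loops forever.
        asks_analyzer = any(_is_analyzer_call(n) for n in ast.walk(node.test))
        loops_forever = any(_is_while_true(n) for stmt in node.body + node.orelse for n in ast.walk(stmt))
        if asks_analyzer and loops_forever:
            self.inverts_verdict = True
        self.generic_visit(node)


def is_classic_paradox(source: str) -> bool:
    visitor = ParadoxVisitor()
    visitor.visit(ast.parse(source))
    return visitor.reads_own_source and visitor.inverts_verdict
```

An exact-structure match is narrow by design: storing the verdict in a variable before the `if` is already enough to slip past this check, which is why later phases exist.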
````diff
+4. **Phase 1.5: Heuristic Classification (`heuristic_classifier`)**
+   - An AST-based pattern matcher that identifies the structural "fingerprints" of notoriously hard cases. It flags code that implements the **Ackermann function** (which always terminates but grows far too fast to run for all but tiny inputs) or the **Collatz conjecture** (whose termination for every starting value is unproven) as `impossible to determine` without needing to run it.
````
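The structural fingerprints mentioned for Phase 1.5 might be matched as follows. This sketch assumes that Collatz-style code is a `while` loop containing a parity test (`% 2`), a halving step (`// 2` or `/ 2`), and a `3 * n + 1` update, and that Ackermann-style code is a function that passes a call to itself as an argument to itself. `classify` and its helpers are hypothetical, not the project's `heuristic_classifier` API.

```python
import ast
from typing import Optional

UNKNOWN = "impossible to determine"


def _binop(node, op, right_value):
    """True for `<expr> <op> <literal>` where the literal equals right_value."""
    return (isinstance(node, ast.BinOp) and isinstance(node.op, op)
            and isinstance(node.right, ast.Constant) and node.right.value == right_value)


def _is_three_n_plus_one(node):
    """Matches `3 * n + 1` or `n * 3 + 1`."""
    if not _binop(node, ast.Add, 1) or not isinstance(node.left, ast.BinOp):
        return False
    mul = node.left
    return isinstance(mul.op, ast.Mult) and any(
        isinstance(side, ast.Constant) and side.value == 3 for side in (mul.left, mul.right))


def looks_like_collatz(tree):
    """A while loop that tests parity, halves, and applies 3n + 1."""
    for loop in ast.walk(tree):
        if isinstance(loop, ast.While):
            nodes = list(ast.walk(loop))
            if (any(_binop(n, ast.Mod, 2) for n in nodes)
                    and any(_binop(n, (ast.FloorDiv, ast.Div), 2) for n in nodes)
                    and any(_is_three_n_plus_one(n) for n in nodes)):
                return True
    return False


def _calls(node, name):
    return isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == name


def looks_like_ackermann(tree):
    """A function that calls itself with a nested self-call as an argument."""
    for fn in ast.walk(tree):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if _calls(node, fn.name) and any(_calls(arg, fn.name) for arg in node.args):
                    return True
    return False


def classify(source: str) -> Optional[str]:
    tree = ast.parse(source)
    if looks_like_collatz(tree) or looks_like_ackermann(tree):
        return UNKNOWN
    return None  # no known fingerprint: defer to the next phase
```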
````diff
 
-#### Phase 1: Static Analysis (`static_analysis.py`)
-* **Purpose:** The fastest check for the most obvious cases.
-* **Method:** Walks the AST to find definitive conditions.
-* **Finds `while True:`:** Immediately returns `does not halt`.
-* **Finds no loops AND no recursion:** Immediately returns `halts`.
-* **Finds loops or recursion it cannot solve:** Defers to the next phase.
````
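A sketch of the Phase 1 rules using only the standard `ast` module. Two assumptions go beyond the README's wording: a `while True:` loop is reported as non-halting only when its body contains no `break`, `return`, or `raise` (otherwise it can terminate), and "recursion" means any cycle, direct or mutual, in the call graph of functions defined in the script. Comprehensions are counted as loops. Like the README's rule, it trusts that calls into builtins and imported code terminate. `static_check` is a hypothetical name.

```python
import ast

UNKNOWN = "impossible to determine"


def _is_endless_while_true(node):
    """`while True:` whose body has no break/return/raise anywhere inside it."""
    if not (isinstance(node, ast.While) and isinstance(node.test, ast.Constant) and node.test.value is True):
        return False
    exits = (ast.Break, ast.Return, ast.Raise)
    return not any(isinstance(n, exits) for stmt in node.body for n in ast.walk(stmt))


def _call_graph(tree):
    """Map each function defined in the script to the script functions it calls by name."""
    funcs = {f.name: f for f in ast.walk(tree) if isinstance(f, (ast.FunctionDef, ast.AsyncFunctionDef))}
    return {
        name: {c.func.id for c in ast.walk(f)
               if isinstance(c, ast.Call) and isinstance(c.func, ast.Name) and c.func.id in funcs}
        for name, f in funcs.items()
    }


def _has_cycle(graph):
    """Depth-first search for a cycle, i.e. direct or mutual recursion."""
    visiting, finished = set(), set()

    def dfs(node):
        if node in visiting:
            return True
        if node in finished:
            return False
        visiting.add(node)
        if any(dfs(callee) for callee in graph[node]):
            return True
        visiting.remove(node)
        finished.add(node)
        return False

    return any(dfs(name) for name in graph)


def static_check(source: str) -> str:
    tree = ast.parse(source)
    if any(_is_endless_while_true(n) for n in ast.walk(tree)):
        return "does not halt"
    loop_types = (ast.For, ast.AsyncFor, ast.While, ast.comprehension)
    has_loop = any(isinstance(n, loop_types) for n in ast.walk(tree))
    if not has_loop and not _has_cycle(_call_graph(tree)):
        return "halts"
    return UNKNOWN  # defer to the next phase
```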
````diff
+5. **Phase 2: Symbolic Prover (`symbolic_prover`)**
+   - A more intelligent static phase that can prove termination for common loop patterns like `for i in range(10)` or `while x < 10: x += 1`, returning `halts` if successful.
 
-#### Phase 2: Symbolic Prover (`symbolic_prover.py`)
-* **Purpose:** To handle common loop structures that are too complex for the basic static analyzer but can still be proven without full execution.
-* **Method:** Uses AST analysis to prove termination for a wider class of loops.
-* **Identifies `for i in range(constant)`:** Returns `halts`.
-* **Identifies `while var < constant:` with a clear increment (`var = var + const`):** Returns `halts`.
````
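The two loop shapes the Symbolic Prover recognizes can be checked syntactically. This sketch (hypothetical `prove` function) returns `halts` only when every loop in a script is either `for ... in range(...)` with integer-literal arguments or a `while var < N` / `while var <= N` loop whose body writes `var` exactly once, through a top-level `var += k` or `var = var + k` with `k > 0`, and contains no `continue` that could skip that increment. It assumes the counter holds an `int` (a large float can stop growing when a small number is added) and gives up on any script that defines functions or uses comprehensions.

```python
import ast

UNKNOWN = "impossible to determine"


def _int_const(node):
    """Value of an integer literal (bools excluded), else None."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    return None


def _is_range_loop(loop):
    """`for ... in range(a[, b[, c]])` with integer-literal arguments only."""
    if not isinstance(loop, ast.For):
        return False
    it = loop.iter
    return (isinstance(it, ast.Call) and isinstance(it.func, ast.Name)
            and it.func.id == "range" and 1 <= len(it.args) <= 3
            and not it.keywords
            and all(_int_const(arg) is not None for arg in it.args))


def _is_increment(stmt, var):
    """`var += k` or `var = var + k` with k a positive integer literal."""
    if isinstance(stmt, ast.AugAssign):
        return (isinstance(stmt.target, ast.Name) and stmt.target.id == var
                and isinstance(stmt.op, ast.Add) and (_int_const(stmt.value) or 0) > 0)
    if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
        target, value = stmt.targets[0], stmt.value
        return (isinstance(target, ast.Name) and target.id == var
                and isinstance(value, ast.BinOp) and isinstance(value.op, ast.Add)
                and isinstance(value.left, ast.Name) and value.left.id == var
                and (_int_const(value.right) or 0) > 0)
    return False


def _is_counting_while(loop):
    """`while var < N:` (or `<=`) whose only write to var is a top-level increment."""
    if not isinstance(loop, ast.While):
        return False
    test = loop.test
    if not (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], (ast.Lt, ast.LtE))
            and isinstance(test.left, ast.Name)
            and _int_const(test.comparators[0]) is not None):
        return False
    var = test.left.id
    inner = [n for stmt in loop.body for n in ast.walk(stmt)]
    writes = [n for n in inner if isinstance(n, ast.Name) and n.id == var and isinstance(n.ctx, ast.Store)]
    can_skip = any(isinstance(n, ast.Continue) for n in inner)  # `continue` could skip the increment
    return len(writes) == 1 and not can_skip and any(_is_increment(s, var) for s in loop.body)


def prove(source: str) -> str:
    nodes = list(ast.walk(ast.parse(source)))
    if any(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)) for n in nodes):
        return UNKNOWN  # calls into user-defined code are outside this sketch
    loops = [n for n in nodes if isinstance(n, (ast.For, ast.AsyncFor, ast.While, ast.comprehension))]
    if all(_is_range_loop(loop) or _is_counting_while(loop) for loop in loops):
        return "halts"
    return UNKNOWN
```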
````diff
+6. **Phase 3: Dynamic Tracing (`dynamic_tracing`)**
+   - The most powerful phase, which executes code in a monitored sandbox. It watches for tell-tale signs of non-termination, such as runaway recursion or repeating execution cycles, to determine if a script `does not halt`. If the script runs to completion or terminates with an uncaught exception, it is considered to `halt`.
 
-#### Phase 3: Dynamic Tracing (`dynamic_tracing.py`)
-* **Purpose:** The most powerful and expensive phase. It executes the code in a monitored environment to observe its behavior directly.
-* **Method:**
-* **Blunt Check:** First checks for the literal string `"analyze_halting"` in the code, providing a fast exit for most self-referential scripts.
-* **Execution Tracing:** If the blunt check fails, it executes the code line by line, monitoring for:
-* **Infinite Recursion:** A recursion depth limit that, when exceeded, signals a non-halting state.
-* **Execution Trace Cycling:** Detects if the program enters a state (line number and local variables) that it has been in before, indicating a non-terminating loop.
````
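Execution tracing of this kind can be built on `sys.settrace`. The sketch below is an assumption-laden illustration, not the project's `dynamic_tracing.py`: it compiles the script under a pseudo-filename, counts call depth to catch runaway recursion, treats a repeat of the full stack state (function, line number, and the `repr` of each frame's locals) as a cycle, and gives up with `impossible to determine` after a step budget. As with the line-plus-locals state described above, hidden state such as iterator positions or mutated objects with a default `repr` can produce false cycles, and a script that catches `BaseException` can swallow the abort. `trace_halting` and its limits are hypothetical; the sketch really executes the script, so use it only on trusted code or inside a genuine sandbox.

```python
import sys

ANALYZED = "<analyzed>"  # pseudo-filename for the script under analysis


class _Verdict(BaseException):
    """Raised from the tracer to abort the analyzed program with a verdict."""

    def __init__(self, verdict):
        super().__init__(verdict)
        self.verdict = verdict


def _snapshot(frame):
    """(function name, line, locals) for every analyzed frame on the stack."""
    stack = []
    while frame is not None and frame.f_code.co_filename == ANALYZED:
        local_vars = tuple(sorted((k, repr(v)) for k, v in frame.f_locals.items() if not k.startswith("__")))
        stack.append((frame.f_code.co_name, frame.f_lineno, local_vars))
        frame = frame.f_back
    return tuple(stack)


def trace_halting(source, max_depth=200, max_steps=100_000):
    seen, depth, steps = set(), 0, 0

    def local_tracer(frame, event, arg):
        nonlocal depth, steps
        if event == "line":
            steps += 1
            if steps > max_steps:
                raise _Verdict("impossible to determine")  # step budget exhausted
            state = _snapshot(frame)
            if state in seen:
                raise _Verdict("does not halt")  # exact state seen before: a cycle
            seen.add(state)
        elif event == "return":
            depth -= 1
        return local_tracer

    def global_tracer(frame, event, arg):
        nonlocal depth
        if frame.f_code.co_filename != ANALYZED:
            return None  # do not trace code outside the analyzed script
        depth += 1
        if depth > max_depth:
            raise _Verdict("does not halt")  # runaway recursion
        return local_tracer

    code = compile(source, ANALYZED, "exec")
    sys.settrace(global_tracer)
    try:
        exec(code, {"__name__": "__analyzed__"})
        return "halts"
    except _Verdict as stop:
        return stop.verdict
    except (Exception, SystemExit):
        return "halts"  # the script ended by raising an error or calling exit
    finally:
        sys.settrace(None)
```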
````diff
+7. **Phase 4: Decision Synthesis (`decision_synthesis`)**
+   - A final safety net. If all other phases are inconclusive, it checks whether the script's source contains the string `analyze_halting` (a sign of a self-referential call to the analyzer): if so, it returns `does not halt`; otherwise, `impossible to determine`.
 
-## The Gauntlet: A Showcase of Defeated Paradoxes
+### Formal Representation of the Analyzer
 
-The `/scripts` directory contains a suite of test cases designed to challenge each layer of the analyzer's defenses.
+The logic of the entire pipeline can be expressed as a single function. The analyzer $H$ takes a Python program $P$ and the current analysis chain $C$ (the semantic hashes of the programs already under analysis) and returns one of `halts`, `does not halt`, or `impossible to determine`. The cases are checked from top to bottom, and the first one that applies determines the result:
 
-* `non_halting.py`: Defeated by **Phase 1 (Static Analysis)**.
-* `bounded_loop.py`: Defeated by **Phase 2 (Symbolic Prover)**.
-* `paradox.py`: Defeated by **Phase 0 (Pattern Matching)**.
-* `obfuscated_paradox.py`: Defeated by **Phase 3 (Dynamic Tracing's blunt check)**.
-* `final_paradox.py`: Defeated by the **Cross-Script Cycle Detector** (direct `A->A` recursion).
-* `mutating_paradox_*.py`: Defeated by **Phase 3 (Dynamic Tracing's blunt check)**.
-* `semantic_paradox_A.py`: Defeated by the **Semantic Hashing + Cycle Detector** (`A->B->C(A-like)` recursion).
-* `polymorphic_termination_paradox.py`: The ultimate test, defeated by the **Symbolic Prover's** ability to resolve the inner dilemma, which then allows the **Dynamic Tracer** to catch the outer paradoxical payload.
+$$
+H(P, C) =
+\begin{cases}
+\texttt{does not halt} & \text{if } \mathrm{Hash}(P) \in C \\
+\texttt{impossible to determine} & \text{if } \mathrm{Paradox}(P) = \text{true} \\
+\mathrm{Static}(P) & \text{if } \mathrm{Static}(P) \neq \texttt{impossible to determine} \\
+\texttt{impossible to determine} & \text{if } \mathrm{Heuristic}(P) = \texttt{impossible to determine} \\
+\mathrm{Prove}(P) & \text{if } \mathrm{Prove}(P) \neq \texttt{impossible to determine} \\
+\mathrm{Trace}(P) & \text{if } \mathrm{Trace}(P) \neq \texttt{impossible to determine} \\
+\texttt{does not halt} & \text{if } \texttt{analyze\_halting} \text{ is a substring of } P \\
+\texttt{impossible to determine} & \text{otherwise}
+\end{cases}
+$$
````
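The case analysis of `H(P, C)` translates directly into a first-match dispatcher. In this sketch the six phase functions are injected through a hypothetical `Phases` container, so it does not assume the project's module APIs; the function name `decide` and the choice to push the program's hash onto the chain while the remaining phases run (so that nested analyses can see it) are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

UNKNOWN = "impossible to determine"


@dataclass
class Phases:
    """Hypothetical phase callables; each takes the script's source code."""
    semantic_hash: Callable[[str], str]
    paradox: Callable[[str], bool]
    static: Callable[[str], str]
    heuristic: Callable[[str], Optional[str]]
    prove: Callable[[str], str]
    trace: Callable[[str], str]


def decide(source: str, chain: List[str], phases: Phases) -> str:
    """Evaluate the cases of H(P, C) from top to bottom; the first match wins."""
    digest = phases.semantic_hash(source)
    if digest in chain:                             # Hash(P) in C
        return "does not halt"
    chain.append(digest)                            # visible to nested analyses
    try:
        if phases.paradox(source):                  # Paradox(P) = true
            return UNKNOWN
        verdict = phases.static(source)             # Static(P)
        if verdict != UNKNOWN:
            return verdict
        if phases.heuristic(source) == UNKNOWN:     # Heuristic(P)
            return UNKNOWN
        for phase in (phases.prove, phases.trace):  # Prove(P), then Trace(P)
            verdict = phase(source)
            if verdict != UNKNOWN:
                return verdict
        if "analyze_halting" in source:             # final substring check
            return "does not halt"
        return UNKNOWN
    finally:
        chain.pop()
```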
````diff
+
+## Performance: A Benchmark-Driven Result
+
+To validate this approach, a benchmark was run using the included `benchmark.py` script.
+
+- **Corpus Size:** 5,498 total Python scripts.
+- **Corpus Composition:**
+  - **Halting Code:** The Python Standard Library and top PyPI packages (`requests`, `numpy`, `pandas`, etc.).
+  - **Non-Halting Code:** Synthetically generated infinite loops and a suite of hand-crafted adversarial paradoxes.
+  - **Complex Code:** Theoretically challenging cases like the Ackermann function and the Collatz conjecture.
+- **Success Criteria:** A test passes if the analyzer's result is considered "safe" for the given category:
+  - `halting` scripts must be classified as `halts`.
+  - `non-halting` scripts are correct if classified as `does not halt` or `impossible to determine`.
+  - `complex` scripts are correct if classified as `impossible to determine` or `does not halt`.
+
+| Metric                     | Score          |
+| -------------------------- | -------------- |
+| **Correct Predictions**    | 4,886 of 5,498 |
+| **Practical Success Rate** | **88.87%**     |
+
+This result suggests that, while a perfect halting decider is impossible, a layered heuristic approach can return a safe answer for the large majority of practical code: here, nearly nine scripts in ten.
````
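The success criteria above amount to a lookup from category to the set of accepted results. A minimal scoring sketch follows; `SAFE_RESULTS`, `is_correct`, and `success_rate` are hypothetical names, not the actual code in `benchmark.py`.

```python
from typing import Iterable, Tuple

# Results each category accepts as "safe", per the success criteria.
SAFE_RESULTS = {
    "halting": {"halts"},
    "non-halting": {"does not halt", "impossible to determine"},
    "complex": {"impossible to determine", "does not halt"},
}


def is_correct(category: str, result: str) -> bool:
    return result in SAFE_RESULTS[category]


def success_rate(records: Iterable[Tuple[str, str]]) -> Tuple[int, int, float]:
    """records: (category, analyzer result) pairs -> (correct, total, percent)."""
    records = list(records)
    correct = sum(is_correct(category, result) for category, result in records)
    return correct, len(records), 100 * correct / len(records)
```

Plugging in the reported totals, `round(100 * 4886 / 5498, 2)` evaluates to `88.87`, matching the table.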
````diff
 
 ## Usage
 
-To run the analysis on all test scripts, simply execute `main.py` from your terminal:
+The project contains two primary entry points: the analyzer itself (`main.py`) and the benchmark harness (`benchmark.py`).
+
+### Running the Analyzer
+
+The `main.py` script can analyze a directory of Python files. By default, it runs on the project's `./scripts` directory.
 
 ```bash
+# Analyze the default adversarial scripts
 python main.py
 ```
 
-The analyzer will process each file in the `/scripts` directory and print the result.
+You can also point it at any other directory using the `--target` flag.
+
+```bash
+# Analyze a custom directory
+python main.py --target /path/to/your/scripts
+```
+
+### Measuring Performance with the Benchmark
 
-## The Never-Ending Game: Limitations and Philosophy
+The `benchmark.py` script builds the test corpus and calculates the analyzer's success rate.
+
+**First Run (Builds the Corpus)**
+This command will take several minutes to download and process thousands of files into a `benchmark_suite` directory.
+
+```bash
+python benchmark.py
+```
+
+**Subsequent Runs (Uses Cached Corpus)**
+Once the `benchmark_suite` directory exists, running the command again will skip the build process and provide results much faster.
+
+```bash
+# This run will be much faster
+python benchmark.py
+```
+
+**Forcing a Fresh Build**
+To delete the existing corpus and build a new one, use the `--rebuild` flag.
+
+```bash
+python benchmark.py --rebuild
+```
````
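The cache-or-rebuild behavior described for `benchmark.py` can be sketched with `argparse`, `pathlib`, and `shutil`. Only the `--rebuild` flag and the `benchmark_suite` directory name come from the README; `parse_args`, `ensure_corpus`, and the `build` callback are hypothetical stand-ins for the real download step.

```python
import argparse
import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build the corpus if needed, then score the analyzer.")
    parser.add_argument("--rebuild", action="store_true",
                        help="delete the cached corpus and build a fresh one")
    return parser.parse_args(argv)


def ensure_corpus(suite_dir: Path, rebuild: bool, build: Callable[[Path], None]) -> bool:
    """Return True if the corpus was (re)built, False if the cached copy was reused."""
    if rebuild and suite_dir.exists():
        shutil.rmtree(suite_dir)  # --rebuild: discard the cache
    if suite_dir.exists():
        return False              # cache hit: skip the slow download step
    suite_dir.mkdir(parents=True)
    build(suite_dir)              # hypothetical download/generation step
    return True
```

One design caveat: this sketch treats the cache as valid as soon as the directory exists, so an interrupted build would be reused on the next run; building into a temporary directory and renaming it on success avoids that.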
````diff
 
-While this analyzer is robust, the Halting Problem remains undecidable. No set of heuristics is perfect. An adversary could, in theory, design a paradox based on a level of semantic equivalence that even the symbolic prover cannot solve (e.g., a complex mathematical calculation vs. a simple loop that both happen to run for the same number of iterations).
+## The Never-Ending Game: Project Philosophy
 
-This project's philosophy is not to achieve theoretical perfection, but to demonstrate a practical, layered approach that pushes the boundary of what can be decided, catching increasingly sophisticated and realistic non-halting scenarios.
+This project acknowledges that the Halting Problem is theoretically undecidable. The goal is not to achieve impossible perfection but to build a practical tool that demonstrates the power of layered heuristics. By combining static analysis, symbolic logic, dynamic tracing, and meta-level defenses, this analyzer pushes the boundary of what can be practically decided, returning a safe answer for 88.87% of the real-world and adversarial programs in its benchmark.
````
