Add Defects4C dataset and wrappers (Child Issue 2) by clean6378-max-it · Pull Request #25 · cppalliance/coding-benchmark

clean6378-max-it · 2026-03-04T09:22:57Z

closes #11

New Features
- Integrated Defects4C benchmark support: setup, checkout, test execution, dataset download, short-ID management, and a Dockerized evaluation environment.
Documentation
- Added README and comprehensive methodology/replication guidance for using the benchmark and workflows.
Chores
- Updated ignore rules to exclude Defects4C dataset and related workspace artifacts.

Summary by CodeRabbit

New Features
- Added Defects4C tooling: dataset download and catalog generation, per-project short-ID assignment, bug checkout to a workdir, an automated test runner, and a reproducible evaluation environment with Docker and setup scripts.
Documentation
- Added comprehensive README, methodology, and usage guides covering dataset workflow, bug ID formats, Docker usage, test/run workflow, and replication guidance.

coderabbitai · 2026-03-04T09:23:18Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a Defects4C integration: documentation, Docker environment, dataset assembly and catalog tooling, bug checkout and test-run wrappers, and setup scripts. All additions are new files; no existing public APIs were modified.

Changes

Cohort / File(s)	Summary
Docs & Methodology `external/defects4c/README.md`, `external/defects4c/docs/methodology.md`	New README and methodology docs describing setup, workflow, file layout, Docker usage, bug ID formats, and replication plans.
Docker / Environment `external/defects4c/docker/Dockerfile`	New Dockerfile (Ubuntu 22.04) installing build and Python tooling, preparing `/app`, and setting configurable work dir.
Evaluation runner `external/defects4c/evaluation/run_tests.py`	New test-run wrapper that resolves bugs from `data/bug_catalog.json`, optionally runs `build_cmd`, executes `test_cmd` with timeout and trusted-shell options, and returns test exit codes.
Dataset assembly `external/defects4c/scripts/download_dataset.py`, `external/defects4c/scripts/add_short_ids.py`	Downloader that queries GitHub and builds `data/bug_catalog.json` (resolves buggy/fixed commits) and a helper to assign per-project short IDs (PROJECT-N).
Checkout & catalog helpers `external/defects4c/scripts/checkout_bug.py`, `external/defects4c/scripts/catalog.py`	Checkout utility to clone/fetch repos and checkout buggy/fixed commits with validation and repo URL resolution; catalog loader/finder for bug lookup.
Setup script `external/defects4c/scripts/setup.sh`	Setup script to clone upstream Defects4C, create data/repos dirs, attempt dataset download, and validate `bug_catalog.json`.

Sequence Diagram(s)

sequenceDiagram
  participant User as User
  participant Checkout as checkout_bug.py
  participant Catalog as bug_catalog.json
  participant GitRemote as Git Remote
  participant Repo as Local Repo (workdir)
  participant Runner as run_tests.py

  User->>Checkout: python checkout_bug.py --bug-id PROJECT-1 [--fixed]
  Checkout->>Catalog: load_catalog(data/bug_catalog.json)
  Checkout->>GitRemote: git clone / git fetch (repo_url)
  GitRemote-->>Repo: repo cloned/fetched
  Checkout->>Repo: git checkout <sha>
  User->>Runner: python run_tests.py --bug-id PROJECT-1 [--build-first]
  Runner->>Catalog: find_bug(...)
  Runner->>Repo: optionally run build_cmd
  Runner->>Repo: run test_cmd
  Repo-->>Runner: test output / exit code
  Runner-->>User: return code / status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Add Defects4C dataset and wrappers (Child Issue 2) #15 — Implements the same external/defects4c integration (scripts, evaluation runner, Dockerfile, and docs) and touches overlapping files.

Suggested reviewers

timon0305
iTinkerBell

Poem

🐰 I hop through repos and sniff each commit,
I fetch the bugs and stitch the catalog neat.
I clone, I build, I run tests with delight,
Versioned carrots in rows — the dataset's complete!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 61.54% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title 'Add Defects4C dataset and wrappers (Child Issue 2)' clearly describes the main change—integrating Defects4C benchmark with dataset and wrapper tooling.
Linked Issues check	✅ Passed	The PR delivers all required components from issue `#11`: setup.sh, bug_catalog.json structure, checkout_bug.py, run_tests.py, Dockerfile, and methodology.md, meeting all coding-related acceptance criteria.
Out of Scope Changes check	✅ Passed	All changes focus on Defects4C integration—scripts, data, Docker, and documentation—with no unrelated modifications to other systems or unintended scope creep.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch chen/child-issue-2-defects4c-restore

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (2)

external/defects4c/evaluation/run_tests.py (1)

22-38: Extract load_catalog/find_bug into a shared utility module.

These helpers are duplicated with external/defects4c/scripts/checkout_bug.py and will drift over time.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@external/defects4c/evaluation/run_tests.py` around lines 22 - 38, The
functions load_catalog and find_bug are duplicated elsewhere; extract them into
a single shared utility module (e.g., create a new module named
defects4c_catalog_utils or similar) and move the implementations of load_catalog
and find_bug there, then replace the local definitions with imports of
load_catalog and find_bug where used (ensure callers still receive the same
signatures and behavior, including the FileNotFoundError and the short-form
matching logic). Update any import statements in modules that previously defined
or used these helpers to import from the new shared module and remove the
duplicated definitions to avoid drift.

external/defects4c/scripts/download_dataset.py (1)

210-215: Stabilize short-id assignment with deterministic ordering.

version values depend on remote fetch order today, so PROJECT-1 can drift between runs.

♻️ Proposed deterministic ordering before numbering

-    version_per_project = defaultdict(int)
-    for b in catalog["bugs"]:
+    catalog["bugs"].sort(
+        key=lambda b: (b.get("project", ""), b.get("fixed_commit", ""), b.get("buggy_commit", ""))
+    )
+    version_per_project = defaultdict(int)
+    for b in catalog["bugs"]:
         p = b.get("project", "")
         version_per_project[p] += 1
         b["version"] = version_per_project[p]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@external/defects4c/scripts/download_dataset.py` around lines 210 - 215, The
per-project short-id assignment is nondeterministic because it iterates
catalog["bugs"] in remote fetch order; to stabilize it, sort the bug list first
and then assign versions: build a sorted sequence (e.g., sorted_bugs =
sorted(catalog["bugs"], key=lambda b: (b.get("project",""), b.get("id",
b.get("bug_id", b.get("commit","")))))) and then iterate sorted_bugs when
updating version_per_project and setting b["version"]; keep the same
version_per_project logic but replace the unsorted loop over catalog["bugs"] so
PROJECT-1, PROJECT-2, ... are assigned deterministically.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@external/defects4c/docker/Dockerfile`:
- Line 5: Update the Docker run example comment so the host volume mounts the
repo directory inside the current defects4c folder instead of its parent;
replace the incorrect "$(pwd)/../repos" with "$(pwd)/repos" in the run command
example so the container sees the external/defects4c/repos directory when run
from the defects4c directory.
- Around line 29-34: The Dockerfile currently runs as root; create and switch to
a non-root user to reduce blast radius: add a user/group (e.g., defects4c) and
set a fixed UID/GID, chown the DEFECTS4C_WORK_DIR and /app to that user, set
HOME if needed, and add USER defects4c before the final CMD; update references
to ENV DEFECTS4C_WORK_DIR, RUN mkdir -p "$DEFECTS4C_WORK_DIR", WORKDIR /app and
CMD ["bash"] so the container executes as the non-root user.

In `@external/defects4c/docs/methodology.md`:
- Line 15: Update the documentation text to reflect the actual catalog schema:
replace the incorrect reference to `build_system` with the correct `build_cmd`
(the tooling writes `build_cmd` and no `build_system`), and state that
per-project `build_cmd` entries appear in `bug_catalog.json` under
`projects_info` and per-bug records; keep the rest of the sentence about
supported build systems (Make, CMake, Autotools) intact.

In `@external/defects4c/evaluation/run_tests.py`:
- Around line 83-92: The script runs catalog-provided commands unsafely
(subprocess.run(build_cmd/test_cmd, shell=True) with no timeout) and duplicates
validation logic across files; change execution to require an explicit trust
flag (e.g., --trusted) before allowing shell=True, otherwise pass commands as
argument lists or run with shell=False and a safe argv list and always supply a
timeout (e.g., timeout=60); refactor duplicated functions load_catalog() and
find_bug() into a shared module (e.g., defects4c.catalog) and import them in
both run_tests.py and checkout_bug.py to remove the duplicated logic and
centralize path-traversal validation.

In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 45-48: The is_valid_repo_url function currently accepts plain
"http://" which is insecure; update is_valid_repo_url to reject "http://" and
only accept "https://" and "git@" forms, and update the docstring to reflect the
new policy; specifically change the validation logic in is_valid_repo_url to
remove the u.startswith("http://") branch so only u.startswith("https://") or
u.startswith("git@") return True.
- Around line 102-111: The code currently takes the user/catalog value in
variable commit (from bug.get("fixed_commit") or bug.get("buggy_commit")) and
passes it directly to git checkout; add validation that commit matches a SHA
pattern before proceeding (e.g. regex r'^[0-9a-fA-F]{7,40}$' to allow 7..40 hex
chars or require 40 for full SHA1), and if it fails print a clear stderr message
and sys.exit(1). Apply this check immediately after commit is set in both places
where commit is assigned (the branches using args.fixed and the other occurrence
further down) so only validated SHAs are used for the git checkout operation.

In `@external/defects4c/scripts/setup.sh`:
- Around line 29-34: The current setup suppresses download errors by appending
"|| true" to the python3 "$DOWNLOAD_SCRIPT" invocation, which can leave a stale
bug_catalog.json while reporting success; remove the "|| true" and make the
script fail fast instead (either rely on existing errexit or explicitly check
the exit status of the python3 run and call exit 1 on failure) so failures in
running download_dataset.py (DOWNLOAD_SCRIPT) are propagated and cause the setup
to stop.

---

Nitpick comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 22-38: The functions load_catalog and find_bug are duplicated
elsewhere; extract them into a single shared utility module (e.g., create a new
module named defects4c_catalog_utils or similar) and move the implementations of
load_catalog and find_bug there, then replace the local definitions with imports
of load_catalog and find_bug where used (ensure callers still receive the same
signatures and behavior, including the FileNotFoundError and the short-form
matching logic). Update any import statements in modules that previously defined
or used these helpers to import from the new shared module and remove the
duplicated definitions to avoid drift.

In `@external/defects4c/scripts/download_dataset.py`:
- Around line 210-215: The per-project short-id assignment is nondeterministic
because it iterates catalog["bugs"] in remote fetch order; to stabilize it, sort
the bug list first and then assign versions: build a sorted sequence (e.g.,
sorted_bugs = sorted(catalog["bugs"], key=lambda b: (b.get("project",""),
b.get("id", b.get("bug_id", b.get("commit","")))))) and then iterate sorted_bugs
when updating version_per_project and setting b["version"]; keep the same
version_per_project logic but replace the unsorted loop over catalog["bugs"] so
PROJECT-1, PROJECT-2, ... are assigned deterministically.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5d21e499-f920-497f-8ccc-113d4ba128d1

📥 Commits

Reviewing files that changed from the base of the PR and between 7a3befd and f19e571.

📒 Files selected for processing (8)

external/defects4c/README.md
external/defects4c/docker/Dockerfile
external/defects4c/docs/methodology.md
external/defects4c/evaluation/run_tests.py
external/defects4c/scripts/add_short_ids.py
external/defects4c/scripts/checkout_bug.py
external/defects4c/scripts/download_dataset.py
external/defects4c/scripts/setup.sh

…t, repo+SHA validation, setup fail-fast (Child Issue 2) fix(defects4c): Docker, docs, safety, and setup (Child Issue 2) - Dockerfile: mount $(pwd)/repos (not ../repos) when run from defects4c; add non-root user defects4c, chown /app and work dir, USER defects4c. - methodology.md: document build_cmd (not build_system) in projects_info and per-bug records. - scripts/catalog.py: add shared load_catalog and find_bug; checkout_bug and run_tests import from catalog to remove duplication. - run_tests.py: add --trusted flag; run build/test with shell=False and timeout=60 unless --trusted (then shell=True + timeout). - checkout_bug.py: is_valid_repo_url accepts only https:// and git@ (reject http://); validate commit SHA (7–40 hex) before git checkout. - setup.sh: remove "|| true" from download_dataset.py so setup fails fast on download failure.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 26-41: In _run_cmd, replace the brittle cmd.split() parsing with
shlex.split(cmd) to correctly handle quoted args and shell metacharacters, and
add explicit handling for subprocess.TimeoutExpired around both subprocess.run
calls so timeouts don't raise uncaught exceptions; on timeout write an
informative message to stderr (including label and timeout) and return a
non-zero exit code (e.g., 124). Also ensure shlex is imported at top if not
present. Use the existing trusted branch (subprocess.run(..., shell=True)) and
non-trusted branch (subprocess.run(argv,...)) as the locations to wrap with
try/except for TimeoutExpired.

In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 109-117: The code currently validates only args.repo_url; instead,
after assigning repo_url = get_repo_url(catalog, project_raw) (and after
overriding with args.repo_url if provided), run is_valid_repo_url(repo_url) and
reject it if invalid; update the error message to state allowed forms are only
https:// or git@ (remove http://), call sys.exit(1) on failure, and ensure the
validation logic is applied whether repo_url came from get_repo_url(...) or
args.repo_url so a malformed catalog value cannot reach git clone.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8b10191c-946f-470f-a3aa-c44c3810f151

📥 Commits

Reviewing files that changed from the base of the PR and between f19e571 and 26d58a4.

📒 Files selected for processing (6)

external/defects4c/docker/Dockerfile
external/defects4c/docs/methodology.md
external/defects4c/evaluation/run_tests.py
external/defects4c/scripts/catalog.py
external/defects4c/scripts/checkout_bug.py
external/defects4c/scripts/setup.sh

🚧 Files skipped from review as they are similar to previous changes (2)

external/defects4c/docs/methodology.md
external/defects4c/scripts/setup.sh

…url from catalog and args in checkout_bug fix(defects4c): safer command parsing and repo_url validation - run_tests.py: use shlex.split(cmd) for quoted args; catch subprocess.TimeoutExpired, log timeout to stderr, and return 124. - checkout_bug.py: validate repo_url from catalog or --repo-url with is_valid_repo_url; reject invalid URLs before git clone and state only https:// or git@ allowed.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

external/defects4c/evaluation/run_tests.py (1)

24-24: Make timeout configurable instead of fixed at 60 seconds.

A hardcoded 60s timeout is likely too aggressive for Defects4C build/test workloads and can cause false negatives.

⏱️ Proposed refactor

-DEFAULT_TIMEOUT = 60
+DEFAULT_TIMEOUT = 1800
@@
-    parser.add_argument(
+    parser.add_argument(
         "--trusted",
         action="store_true",
         help="Allow executing catalog build_cmd/test_cmd with shell=True; otherwise run with shell=False and a timeout.",
     )
+    parser.add_argument(
+        "--timeout-sec",
+        type=int,
+        default=DEFAULT_TIMEOUT,
+        help="Timeout in seconds for each build/test command.",
+    )
@@
-        rc = _run_cmd(build_cmd, project_dir, args.trusted, "Build")
+        rc = _run_cmd(build_cmd, project_dir, args.trusted, "Build", args.timeout_sec)
@@
-    rc = _run_cmd(test_cmd, project_dir, args.trusted, "Test")
+    rc = _run_cmd(test_cmd, project_dir, args.trusted, "Test", args.timeout_sec)

Also applies to: 54-63, 98-104

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@external/defects4c/evaluation/run_tests.py` at line 24, Replace the hardcoded
DEFAULT_TIMEOUT = 60 with a configurable value: read a timeout from an
environment variable or CLI flag and fall back to 60 if not provided (e.g.,
parse TIMEOUT or --timeout into an integer), and have all uses of
DEFAULT_TIMEOUT (and any references in the blocks around the previous
occurrences at the other locations) consume this configured variable instead of
the literal 60; update the name DEFAULT_TIMEOUT to reflect it’s configurable
(e.g., DEFAULT_TIMEOUT or timeout_seconds) and validate/convert the input to an
int with a sensible lower/upper bound before use.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 32-46: The _run_cmd implementation presently only catches
subprocess.TimeoutExpired, so FileNotFoundError/OSError from subprocess.run
(when shell=False) will crash; wrap both subprocess.run calls (the trusted
shell=True branch and the argv branch using shlex.split) in exception handling
that catches FileNotFoundError and OSError, print a concise error to stderr
including the label and the exception message (e.g., f"{label} failed to start:
{e}"), and return a non-zero exit code (use 127 for command-not-found/startup
failures) instead of letting the exception propagate; reference
variables/functions to change: _run_cmd, subprocess.run, trusted, argv, label,
DEFAULT_TIMEOUT, cwd.

In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 58-60: The code unconditionally calls load_catalog(data_dir) which
throws if bug_catalog.json is missing and prevents the direct PROJECT@SHA
fallback; update checkout_bug.py so load_catalog is called defensively (either
check for the catalog file's existence or catch exceptions from load_catalog)
and only attempt find_bug(catalog, args.bug_id) when a valid catalog was loaded;
if no catalog is available, skip the find_bug path and proceed to the existing
direct PROJECT@SHA parsing logic (the code around load_catalog, find_bug, and
args.bug_id) so the advertised direct mode works when bug_catalog.json is
absent.
- Around line 121-126: The git subprocess invocations (the direct subprocess.run
calls for clone, fetch, and checkout in checkout_bug.py) lack timeouts and
user-friendly error handling; create a helper function (e.g., run_git(cmd_args,
cwd, timeout=300)) that calls subprocess.run([...], check=True, cwd=cwd,
timeout=timeout) and wraps it in try/except for subprocess.TimeoutExpired and
subprocess.CalledProcessError to print/ log clear CLI messages and exit with a
non-zero code; replace the three raw subprocess.run calls for "git clone", "git
fetch origin", and "git checkout" with calls to run_git so stalled or failing
git operations are time-bounded and surfaced gracefully.

---

Nitpick comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Line 24: Replace the hardcoded DEFAULT_TIMEOUT = 60 with a configurable value:
read a timeout from an environment variable or CLI flag and fall back to 60 if
not provided (e.g., parse TIMEOUT or --timeout into an integer), and have all
uses of DEFAULT_TIMEOUT (and any references in the blocks around the previous
occurrences at the other locations) consume this configured variable instead of
the literal 60; update the name DEFAULT_TIMEOUT to reflect it’s configurable
(e.g., DEFAULT_TIMEOUT or timeout_seconds) and validate/convert the input to an
int with a sensible lower/upper bound before use.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e9785125-0d8a-4c3b-89c7-6f81dfb69a8e

📥 Commits

Reviewing files that changed from the base of the PR and between 26d58a4 and 8910173.

📒 Files selected for processing (2)

external/defects4c/evaluation/run_tests.py
external/defects4c/scripts/checkout_bug.py

…g, run_git (Child Issue 2) - run_tests.py: Docstring default is "make check" only (remove ctest). - run_tests.py: In _run_cmd, catch FileNotFoundError/OSError; print "{label} failed to start: {e}", return 127. - checkout_bug.py: Load catalog in try/except (FileNotFoundError, OSError, JSONDecodeError); if load fails set catalog={}, bug=None so direct PROJECT@SHA mode works when bug_catalog.json is absent. - checkout_bug.py: Add run_git(cmd_args, cwd, timeout=300) with TimeoutExpired/CalledProcessError handling; use it for clone, fetch, checkout instead of raw subprocess.run.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 68-74: Wrap the call to load_catalog/find_bug in a try/except so
catalog load errors don't abort direct-mode fallback: call
load_catalog(data_dir) inside a try block, on exception (or when load_catalog
returns None) fall back to the existing direct PROJECT@SHA logic that constructs
bug = {"bug_id": args.bug_id, "project": project, "test_cmd":"make check",
"build_cmd": ""}; ensure this handling is applied before the final if not bug
exit path so the fallback runs even if load_catalog raises. Use the existing
symbols load_catalog, find_bug, args.bug_id and the created bug dict to
implement the fix.
- Line 97: The fallback for test_cmd currently applies .strip() to
bug.get("test_cmd") and treats a whitespace-only string as a non-default value;
update the logic in run_tests.py so you first retrieve the raw value (e.g., raw
= bug.get("test_cmd")), then if raw is None or raw.strip() == "" set test_cmd to
the default "make check", otherwise set test_cmd to raw.strip(); locate the
assignment to test_cmd in the function or scope where test commands are prepared
and replace it with this conditional selection.

In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 43-52: The run_git function currently only catches TimeoutExpired
and CalledProcessError but not process startup failures; update run_git to also
catch FileNotFoundError (and optionally OSError) around the subprocess.run call
and handle it by printing a clear error to stderr (e.g., "git ...: not found or
not executable: {e}") and calling sys.exit(1); reference the run_git function
and the subprocess.run invocation so you add the new except FileNotFoundError
(and/or except OSError as a fallback) branch mirroring the existing error
handling behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9a3b0266-978f-4bfb-a574-37504a8f3f88

📥 Commits

Reviewing files that changed from the base of the PR and between 8910173 and c31fa49.

📒 Files selected for processing (2)

external/defects4c/evaluation/run_tests.py
external/defects4c/scripts/checkout_bug.py

…y/except fallback fix(defects4c): test_cmd fallback, run_git exceptions, catalog load fallback - run_tests: treat None or whitespace-only test_cmd as missing; use raw = bug.get("test_cmd") and default to "make check" when raw is None or str(raw).strip() == "" - checkout_bug: in run_git, catch FileNotFoundError and OSError around subprocess.run; print clear error to stderr and sys.exit(1) - run_tests: wrap load_catalog/find_bug in try/except so catalog load errors (FileNotFoundError, OSError, JSONDecodeError) set bug = None and direct PROJECT@SHA fallback still runs

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 68-79: When load_catalog(args.data_dir) or find_bug(...) throws
(caught by the except block), preserve the exception object (e.g., save it as
load_err) instead of discarding it; then, after the PROJECT@SHA fallback logic
that builds a synthetic bug for IDs containing "@", if no bug was found and the
original failure was a catalog load error (load_err is set) emit that error to
stderr (include load_err details and a clear message) and exit, otherwise
continue with the existing "Bug not found" behavior. Update the try/except
around load_catalog/find_bug and the subsequent if not bug checks to reference
load_err and args.bug_id so the real catalog load error is surfaced when
applicable.
- Around line 99-104: The code calls .strip() on build_cmd without ensuring it's
a string, which crashes for non-string values; mirror the existing pattern used
for test_cmd: retrieve raw_build = bug.get("build_cmd"), then set build_cmd =
"make <...>"? (keep current behavior) — specifically, replace the current
build_cmd = (bug.get("build_cmd") or "").strip() with logic like raw_build =
bug.get("build_cmd"); if raw_build is None or str(raw_build).strip() == "":
build_cmd = "" (or the intended default); else build_cmd =
str(raw_build).strip(); update references to build_cmd accordingly so non-string
catalog values are coerced before calling .strip().

In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 98-106: Ensure project_raw is a non-empty string before calling
sanitize_project: check that bug.get("project") returns an instance of str and
that project_raw.strip() is not empty, otherwise print a clear error to stderr
and sys.exit(1); then call sanitize_project(project_raw) inside the existing
try/except for ValueError to handle sanitization errors. Use the existing
variable names project_raw, sanitize_project, and project to locate and update
the logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d5bcb99-ee97-44de-a4a9-b3f8ef070088

📥 Commits

Reviewing files that changed from the base of the PR and between c31fa49 and dcb884d.

📒 Files selected for processing (2)

external/defects4c/evaluation/run_tests.py
external/defects4c/scripts/checkout_bug.py

…alidation (Child Issue 2)

Retore child issue 2 files

f19e571

coderabbitai Bot reviewed Mar 4, 2026

View reviewed changes

Comment thread external/defects4c/evaluation/run_tests.py Outdated

Comment thread external/defects4c/scripts/checkout_bug.py

coderabbitai Bot reviewed Mar 4, 2026

View reviewed changes

Comment thread external/defects4c/evaluation/run_tests.py Outdated

Comment thread external/defects4c/evaluation/run_tests.py

Comment thread external/defects4c/scripts/checkout_bug.py Outdated

Comment thread external/defects4c/scripts/checkout_bug.py Outdated

coderabbitai Bot reviewed Mar 4, 2026

View reviewed changes

Comment thread external/defects4c/evaluation/run_tests.py Outdated

Comment thread external/defects4c/evaluation/run_tests.py Outdated

Comment thread external/defects4c/scripts/checkout_bug.py

coderabbitai Bot reviewed Mar 4, 2026

View reviewed changes

Comment thread external/defects4c/evaluation/run_tests.py

Comment thread external/defects4c/evaluation/run_tests.py Outdated

Comment thread external/defects4c/scripts/checkout_bug.py

fix(defects4c): load_err surfacing, build_cmd coercion, project_raw v…

3b18821

…alidation (Child Issue 2)

Conversation

clean6378-max-it commented Mar 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

clean6378-max-it commented Mar 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 4, 2026 •

edited

Loading