Add Defects4C dataset and wrappers (Child Issue 2)#25
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Defects4C integration: documentation, Docker environment, dataset assembly and catalog tooling, bug checkout and test-run wrappers, and setup scripts. All additions are new files; no existing public APIs were modified. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User
participant Checkout as checkout_bug.py
participant Catalog as bug_catalog.json
participant GitRemote as Git Remote
participant Repo as Local Repo (workdir)
participant Runner as run_tests.py
User->>Checkout: python checkout_bug.py --bug-id PROJECT-1 [--fixed]
Checkout->>Catalog: load_catalog(data/bug_catalog.json)
Checkout->>GitRemote: git clone / git fetch (repo_url)
GitRemote-->>Repo: repo cloned/fetched
Checkout->>Repo: git checkout <sha>
User->>Runner: python run_tests.py --bug-id PROJECT-1 [--build-first]
Runner->>Catalog: find_bug(...)
Runner->>Repo: optionally run build_cmd
Runner->>Repo: run test_cmd
Repo-->>Runner: test output / exit code
Runner-->>User: return code / status
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (2)
external/defects4c/evaluation/run_tests.py (1)
22-38: Extractload_catalog/find_buginto a shared utility module.These helpers are duplicated with
external/defects4c/scripts/checkout_bug.pyand will drift over time.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@external/defects4c/evaluation/run_tests.py` around lines 22 - 38, The functions load_catalog and find_bug are duplicated elsewhere; extract them into a single shared utility module (e.g., create a new module named defects4c_catalog_utils or similar) and move the implementations of load_catalog and find_bug there, then replace the local definitions with imports of load_catalog and find_bug where used (ensure callers still receive the same signatures and behavior, including the FileNotFoundError and the short-form matching logic). Update any import statements in modules that previously defined or used these helpers to import from the new shared module and remove the duplicated definitions to avoid drift.external/defects4c/scripts/download_dataset.py (1)
210-215: Stabilize short-id assignment with deterministic ordering.
versionvalues depend on remote fetch order today, soPROJECT-1can drift between runs.♻️ Proposed deterministic ordering before numbering
- version_per_project = defaultdict(int) - for b in catalog["bugs"]: + catalog["bugs"].sort( + key=lambda b: (b.get("project", ""), b.get("fixed_commit", ""), b.get("buggy_commit", "")) + ) + version_per_project = defaultdict(int) + for b in catalog["bugs"]: p = b.get("project", "") version_per_project[p] += 1 b["version"] = version_per_project[p]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@external/defects4c/scripts/download_dataset.py` around lines 210 - 215, The per-project short-id assignment is nondeterministic because it iterates catalog["bugs"] in remote fetch order; to stabilize it, sort the bug list first and then assign versions: build a sorted sequence (e.g., sorted_bugs = sorted(catalog["bugs"], key=lambda b: (b.get("project",""), b.get("id", b.get("bug_id", b.get("commit","")))))) and then iterate sorted_bugs when updating version_per_project and setting b["version"]; keep the same version_per_project logic but replace the unsorted loop over catalog["bugs"] so PROJECT-1, PROJECT-2, ... are assigned deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@external/defects4c/docker/Dockerfile`:
- Line 5: Update the Docker run example comment so the host volume mounts the
repo directory inside the current defects4c folder instead of its parent;
replace the incorrect "$(pwd)/../repos" with "$(pwd)/repos" in the run command
example so the container sees the external/defects4c/repos directory when run
from the defects4c directory.
- Around line 29-34: The Dockerfile currently runs as root; create and switch to
a non-root user to reduce blast radius: add a user/group (e.g., defects4c) and
set a fixed UID/GID, chown the DEFECTS4C_WORK_DIR and /app to that user, set
HOME if needed, and add USER defects4c before the final CMD; update references
to ENV DEFECTS4C_WORK_DIR, RUN mkdir -p "$DEFECTS4C_WORK_DIR", WORKDIR /app and
CMD ["bash"] so the container executes as the non-root user.
In `@external/defects4c/docs/methodology.md`:
- Line 15: Update the documentation text to reflect the actual catalog schema:
replace the incorrect reference to `build_system` with the correct `build_cmd`
(the tooling writes `build_cmd` and no `build_system`), and state that
per-project `build_cmd` entries appear in `bug_catalog.json` under
`projects_info` and per-bug records; keep the rest of the sentence about
supported build systems (Make, CMake, Autotools) intact.
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 83-92: The script runs catalog-provided commands unsafely
(subprocess.run(build_cmd/test_cmd, shell=True) with no timeout) and duplicates
validation logic across files; change execution to require an explicit trust
flag (e.g., --trusted) before allowing shell=True, otherwise pass commands as
argument lists or run with shell=False and a safe argv list and always supply a
timeout (e.g., timeout=60); refactor duplicated functions load_catalog() and
find_bug() into a shared module (e.g., defects4c.catalog) and import them in
both run_tests.py and checkout_bug.py to remove the duplicated logic and
centralize path-traversal validation.
In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 45-48: The is_valid_repo_url function currently accepts plain
"http://" which is insecure; update is_valid_repo_url to reject "http://" and
only accept "https://" and "git@" forms, and update the docstring to reflect the
new policy; specifically change the validation logic in is_valid_repo_url to
remove the u.startswith("http://") branch so only u.startswith("https://") or
u.startswith("git@") return True.
- Around line 102-111: The code currently takes the user/catalog value in
variable commit (from bug.get("fixed_commit") or bug.get("buggy_commit")) and
passes it directly to git checkout; add validation that commit matches a SHA
pattern before proceeding (e.g. regex r'^[0-9a-fA-F]{7,40}$' to allow 7..40 hex
chars or require 40 for full SHA1), and if it fails print a clear stderr message
and sys.exit(1). Apply this check immediately after commit is set in both places
where commit is assigned (the branches using args.fixed and the other occurrence
further down) so only validated SHAs are used for the git checkout operation.
In `@external/defects4c/scripts/setup.sh`:
- Around line 29-34: The current setup suppresses download errors by appending
"|| true" to the python3 "$DOWNLOAD_SCRIPT" invocation, which can leave a stale
bug_catalog.json while reporting success; remove the "|| true" and make the
script fail fast instead (either rely on existing errexit or explicitly check
the exit status of the python3 run and call exit 1 on failure) so failures in
running download_dataset.py (DOWNLOAD_SCRIPT) are propagated and cause the setup
to stop.
---
Nitpick comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 22-38: The functions load_catalog and find_bug are duplicated
elsewhere; extract them into a single shared utility module (e.g., create a new
module named defects4c_catalog_utils or similar) and move the implementations of
load_catalog and find_bug there, then replace the local definitions with imports
of load_catalog and find_bug where used (ensure callers still receive the same
signatures and behavior, including the FileNotFoundError and the short-form
matching logic). Update any import statements in modules that previously defined
or used these helpers to import from the new shared module and remove the
duplicated definitions to avoid drift.
In `@external/defects4c/scripts/download_dataset.py`:
- Around line 210-215: The per-project short-id assignment is nondeterministic
because it iterates catalog["bugs"] in remote fetch order; to stabilize it, sort
the bug list first and then assign versions: build a sorted sequence (e.g.,
sorted_bugs = sorted(catalog["bugs"], key=lambda b: (b.get("project",""),
b.get("id", b.get("bug_id", b.get("commit","")))))) and then iterate sorted_bugs
when updating version_per_project and setting b["version"]; keep the same
version_per_project logic but replace the unsorted loop over catalog["bugs"] so
PROJECT-1, PROJECT-2, ... are assigned deterministically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5d21e499-f920-497f-8ccc-113d4ba128d1
📒 Files selected for processing (8)
external/defects4c/README.mdexternal/defects4c/docker/Dockerfileexternal/defects4c/docs/methodology.mdexternal/defects4c/evaluation/run_tests.pyexternal/defects4c/scripts/add_short_ids.pyexternal/defects4c/scripts/checkout_bug.pyexternal/defects4c/scripts/download_dataset.pyexternal/defects4c/scripts/setup.sh
…t, repo+SHA validation, setup fail-fast (Child Issue 2) fix(defects4c): Docker, docs, safety, and setup (Child Issue 2) - Dockerfile: mount $(pwd)/repos (not ../repos) when run from defects4c; add non-root user defects4c, chown /app and work dir, USER defects4c. - methodology.md: document build_cmd (not build_system) in projects_info and per-bug records. - scripts/catalog.py: add shared load_catalog and find_bug; checkout_bug and run_tests import from catalog to remove duplication. - run_tests.py: add --trusted flag; run build/test with shell=False and timeout=60 unless --trusted (then shell=True + timeout). - checkout_bug.py: is_valid_repo_url accepts only https:// and git@ (reject http://); validate commit SHA (7–40 hex) before git checkout. - setup.sh: remove "|| true" from download_dataset.py so setup fails fast on download failure.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 26-41: In _run_cmd, replace the brittle cmd.split() parsing with
shlex.split(cmd) to correctly handle quoted args and shell metacharacters, and
add explicit handling for subprocess.TimeoutExpired around both subprocess.run
calls so timeouts don't raise uncaught exceptions; on timeout write an
informative message to stderr (including label and timeout) and return a
non-zero exit code (e.g., 124). Also ensure shlex is imported at top if not
present. Use the existing trusted branch (subprocess.run(..., shell=True)) and
non-trusted branch (subprocess.run(argv,...)) as the locations to wrap with
try/except for TimeoutExpired.
In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 109-117: The code currently validates only args.repo_url; instead,
after assigning repo_url = get_repo_url(catalog, project_raw) (and after
overriding with args.repo_url if provided), run is_valid_repo_url(repo_url) and
reject it if invalid; update the error message to state allowed forms are only
https:// or git@ (remove http://), call sys.exit(1) on failure, and ensure the
validation logic is applied whether repo_url came from get_repo_url(...) or
args.repo_url so a malformed catalog value cannot reach git clone.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8b10191c-946f-470f-a3aa-c44c3810f151
📒 Files selected for processing (6)
external/defects4c/docker/Dockerfileexternal/defects4c/docs/methodology.mdexternal/defects4c/evaluation/run_tests.pyexternal/defects4c/scripts/catalog.pyexternal/defects4c/scripts/checkout_bug.pyexternal/defects4c/scripts/setup.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- external/defects4c/docs/methodology.md
- external/defects4c/scripts/setup.sh
…url from catalog and args in checkout_bug fix(defects4c): safer command parsing and repo_url validation - run_tests.py: use shlex.split(cmd) for quoted args; catch subprocess.TimeoutExpired, log timeout to stderr, and return 124. - checkout_bug.py: validate repo_url from catalog or --repo-url with is_valid_repo_url; reject invalid URLs before git clone and state only https:// or git@ allowed.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
external/defects4c/evaluation/run_tests.py (1)
24-24: Make timeout configurable instead of fixed at 60 seconds.A hardcoded 60s timeout is likely too aggressive for Defects4C build/test workloads and can cause false negatives.
⏱️ Proposed refactor
-DEFAULT_TIMEOUT = 60 +DEFAULT_TIMEOUT = 1800 @@ - parser.add_argument( + parser.add_argument( "--trusted", action="store_true", help="Allow executing catalog build_cmd/test_cmd with shell=True; otherwise run with shell=False and a timeout.", ) + parser.add_argument( + "--timeout-sec", + type=int, + default=DEFAULT_TIMEOUT, + help="Timeout in seconds for each build/test command.", + ) @@ - rc = _run_cmd(build_cmd, project_dir, args.trusted, "Build") + rc = _run_cmd(build_cmd, project_dir, args.trusted, "Build", args.timeout_sec) @@ - rc = _run_cmd(test_cmd, project_dir, args.trusted, "Test") + rc = _run_cmd(test_cmd, project_dir, args.trusted, "Test", args.timeout_sec)Also applies to: 54-63, 98-104
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@external/defects4c/evaluation/run_tests.py` at line 24, Replace the hardcoded DEFAULT_TIMEOUT = 60 with a configurable value: read a timeout from an environment variable or CLI flag and fall back to 60 if not provided (e.g., parse TIMEOUT or --timeout into an integer), and have all uses of DEFAULT_TIMEOUT (and any references in the blocks around the previous occurrences at the other locations) consume this configured variable instead of the literal 60; update the name DEFAULT_TIMEOUT to reflect it’s configurable (e.g., DEFAULT_TIMEOUT or timeout_seconds) and validate/convert the input to an int with a sensible lower/upper bound before use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 32-46: The _run_cmd implementation presently only catches
subprocess.TimeoutExpired, so FileNotFoundError/OSError from subprocess.run
(when shell=False) will crash; wrap both subprocess.run calls (the trusted
shell=True branch and the argv branch using shlex.split) in exception handling
that catches FileNotFoundError and OSError, print a concise error to stderr
including the label and the exception message (e.g., f"{label} failed to start:
{e}"), and return a non-zero exit code (use 127 for command-not-found/startup
failures) instead of letting the exception propagate; reference
variables/functions to change: _run_cmd, subprocess.run, trusted, argv, label,
DEFAULT_TIMEOUT, cwd.
In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 58-60: The code unconditionally calls load_catalog(data_dir) which
throws if bug_catalog.json is missing and prevents the direct PROJECT@SHA
fallback; update checkout_bug.py so load_catalog is called defensively (either
check for the catalog file's existence or catch exceptions from load_catalog)
and only attempt find_bug(catalog, args.bug_id) when a valid catalog was loaded;
if no catalog is available, skip the find_bug path and proceed to the existing
direct PROJECT@SHA parsing logic (the code around load_catalog, find_bug, and
args.bug_id) so the advertised direct mode works when bug_catalog.json is
absent.
- Around line 121-126: The git subprocess invocations (the direct subprocess.run
calls for clone, fetch, and checkout in checkout_bug.py) lack timeouts and
user-friendly error handling; create a helper function (e.g., run_git(cmd_args,
cwd, timeout=300)) that calls subprocess.run([...], check=True, cwd=cwd,
timeout=timeout) and wraps it in try/except for subprocess.TimeoutExpired and
subprocess.CalledProcessError to print/ log clear CLI messages and exit with a
non-zero code; replace the three raw subprocess.run calls for "git clone", "git
fetch origin", and "git checkout" with calls to run_git so stalled or failing
git operations are time-bounded and surfaced gracefully.
---
Nitpick comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Line 24: Replace the hardcoded DEFAULT_TIMEOUT = 60 with a configurable value:
read a timeout from an environment variable or CLI flag and fall back to 60 if
not provided (e.g., parse TIMEOUT or --timeout into an integer), and have all
uses of DEFAULT_TIMEOUT (and any references in the blocks around the previous
occurrences at the other locations) consume this configured variable instead of
the literal 60; update the name DEFAULT_TIMEOUT to reflect it’s configurable
(e.g., DEFAULT_TIMEOUT or timeout_seconds) and validate/convert the input to an
int with a sensible lower/upper bound before use.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e9785125-0d8a-4c3b-89c7-6f81dfb69a8e
📒 Files selected for processing (2)
external/defects4c/evaluation/run_tests.pyexternal/defects4c/scripts/checkout_bug.py
…g, run_git (Child Issue 2)
- run_tests.py: Docstring default is "make check" only (remove ctest).
- run_tests.py: In _run_cmd, catch FileNotFoundError/OSError; print
"{label} failed to start: {e}", return 127.
- checkout_bug.py: Load catalog in try/except (FileNotFoundError,
OSError, JSONDecodeError); if load fails set catalog={}, bug=None so
direct PROJECT@SHA mode works when bug_catalog.json is absent.
- checkout_bug.py: Add run_git(cmd_args, cwd, timeout=300) with
TimeoutExpired/CalledProcessError handling; use it for clone, fetch,
checkout instead of raw subprocess.run.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 68-74: Wrap the call to load_catalog/find_bug in a try/except so
catalog load errors don't abort direct-mode fallback: call
load_catalog(data_dir) inside a try block, on exception (or when load_catalog
returns None) fall back to the existing direct PROJECT@SHA logic that constructs
bug = {"bug_id": args.bug_id, "project": project, "test_cmd":"make check",
"build_cmd": ""}; ensure this handling is applied before the final if not bug
exit path so the fallback runs even if load_catalog raises. Use the existing
symbols load_catalog, find_bug, args.bug_id and the created bug dict to
implement the fix.
- Line 97: The fallback for test_cmd currently applies .strip() to
bug.get("test_cmd") and treats a whitespace-only string as a non-default value;
update the logic in run_tests.py so you first retrieve the raw value (e.g., raw
= bug.get("test_cmd")), then if raw is None or raw.strip() == "" set test_cmd to
the default "make check", otherwise set test_cmd to raw.strip(); locate the
assignment to test_cmd in the function or scope where test commands are prepared
and replace it with this conditional selection.
In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 43-52: The run_git function currently only catches TimeoutExpired
and CalledProcessError but not process startup failures; update run_git to also
catch FileNotFoundError (and optionally OSError) around the subprocess.run call
and handle it by printing a clear error to stderr (e.g., "git ...: not found or
not executable: {e}") and calling sys.exit(1); reference the run_git function
and the subprocess.run invocation so you add the new except FileNotFoundError
(and/or except OSError as a fallback) branch mirroring the existing error
handling behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9a3b0266-978f-4bfb-a574-37504a8f3f88
📒 Files selected for processing (2)
external/defects4c/evaluation/run_tests.pyexternal/defects4c/scripts/checkout_bug.py
…y/except fallback
fix(defects4c): test_cmd fallback, run_git exceptions, catalog load fallback
- run_tests: treat None or whitespace-only test_cmd as missing; use
raw = bug.get("test_cmd") and default to "make check" when
raw is None or str(raw).strip() == ""
- checkout_bug: in run_git, catch FileNotFoundError and OSError around
subprocess.run; print clear error to stderr and sys.exit(1)
- run_tests: wrap load_catalog/find_bug in try/except so catalog load
errors (FileNotFoundError, OSError, JSONDecodeError) set bug = None
and direct PROJECT@SHA fallback still runs
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@external/defects4c/evaluation/run_tests.py`:
- Around line 68-79: When load_catalog(args.data_dir) or find_bug(...) throws
(caught by the except block), preserve the exception object (e.g., save it as
load_err) instead of discarding it; then, after the PROJECT@SHA fallback logic
that builds a synthetic bug for IDs containing "@", if no bug was found and the
original failure was a catalog load error (load_err is set) emit that error to
stderr (include load_err details and a clear message) and exit, otherwise
continue with the existing "Bug not found" behavior. Update the try/except
around load_catalog/find_bug and the subsequent if not bug checks to reference
load_err and args.bug_id so the real catalog load error is surfaced when
applicable.
- Around line 99-104: The code calls .strip() on build_cmd without ensuring it's
a string, which crashes for non-string values; mirror the existing pattern used
for test_cmd: retrieve raw_build = bug.get("build_cmd"), then set build_cmd =
"make <...>"? (keep current behavior) — specifically, replace the current
build_cmd = (bug.get("build_cmd") or "").strip() with logic like raw_build =
bug.get("build_cmd"); if raw_build is None or str(raw_build).strip() == "":
build_cmd = "" (or the intended default); else build_cmd =
str(raw_build).strip(); update references to build_cmd accordingly so non-string
catalog values are coerced before calling .strip().
In `@external/defects4c/scripts/checkout_bug.py`:
- Around line 98-106: Ensure project_raw is a non-empty string before calling
sanitize_project: check that bug.get("project") returns an instance of str and
that project_raw.strip() is not empty, otherwise print a clear error to stderr
and sys.exit(1); then call sanitize_project(project_raw) inside the existing
try/except for ValueError to handle sanitization errors. Use the existing
variable names project_raw, sanitize_project, and project to locate and update
the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d5bcb99-ee97-44de-a4a9-b3f8ef070088
📒 Files selected for processing (2)
external/defects4c/evaluation/run_tests.pyexternal/defects4c/scripts/checkout_bug.py
…alidation (Child Issue 2)
closes #11
Summary by CodeRabbit
New Features
Documentation