fix: use safe deserialization and tarfile extraction #70

Open
gn00295120 wants to merge 2 commits into NVIDIA:main from gn00295120:fix/unsafe-deserialization-and-tarfile

@gn00295120 gn00295120 commented Mar 22, 2026

Summary

  • Add weights_only=True to all torch.load() calls to prevent arbitrary code execution via pickle deserialization (CVE-2025-32434, CWE-502)
  • Add filter='data' to all tarfile.extractall() calls to prevent path traversal attacks (CVE-2007-4559, CWE-22)

Files Changed

| File | Fix |
| --- | --- |
| moshi/moshi/models/lm.py line ~979 | torch.load(..., weights_only=True) |
| moshi/moshi/models/loaders.py lines ~160, ~217, ~295 | torch.load(..., weights_only=True) |
| moshi/moshi/server.py lines ~333, ~349 | tarfile.extractall(..., filter='data') |
| moshi/moshi/offline.py line ~145 | tarfile.extractall(..., filter='data') |

Security Impact

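torch.load() called with weights_only=False (the default before PyTorch 2.6) deserializes arbitrary Python objects via pickle, so loading a malicious checkpoint file can execute arbitrary code. Passing weights_only=True restricts unpickling to tensors, primitive types, and plain containers such as dictionaries. Note that CVE-2025-32434 describes a bypass of weights_only=True in PyTorch 2.5.1 and earlier, so the flag is a reliable mitigation only on PyTorch 2.6.0 or later.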
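To illustrate why unrestricted unpickling is dangerous, here is a minimal standard-library sketch. It does not use PyTorch and is not how torch.load() is implemented; it only shows the general idea of an allowlisting unpickler, which is loosely analogous to what weights_only=True does. The names Payload, AllowlistUnpickler, and restricted_loads are hypothetical, and the payload is deliberately harmless.

```python
import io
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object inside a checkpoint."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7"). A real attacker would
        # substitute any callable and arguments they like.
        return (eval, ("6 * 7",))


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly allowed."""

    ALLOWED = frozenset()  # no globals permitted: plain data only

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(blob: bytes):
    """Deserialize blob, allowing only built-in data types."""
    return AllowlistUnpickler(io.BytesIO(blob)).load()


malicious = pickle.dumps(Payload())
benign = pickle.dumps({"weights": [1.0, 2.0]})
```

With these definitions, pickle.loads(malicious) runs the embedded call, restricted_loads(benign) returns the dictionary unchanged, and restricted_loads(malicious) raises pickle.UnpicklingError before any callable is invoked.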

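tarfile.extractall() without filter='data' can be exploited via crafted archives containing ../ path components (tar slip / directory traversal). The filter='data' argument (added in Python 3.12 and backported as a security fix to 3.8.17, 3.9.17, 3.10.12, and 3.11.4) strips leading slashes, refuses members or link targets that resolve outside the destination, refuses device files, and ignores ownership and unsafe permission bits.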
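The effect of the filter can be observed with a small in-memory archive. This is a sketch that assumes an interpreter with extraction filters available (hasattr(tarfile, "data_filter") is true); the helper names make_archive and try_extract and the member names are hypothetical, not taken from the repository.

```python
import io
import tarfile
import tempfile


def make_archive(member_name: str, payload: bytes = b"x") -> bytes:
    """Build an in-memory .tar.gz holding one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def try_extract(archive: bytes, dest: str) -> str:
    """Extract with filter='data' and report whether it was rejected."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        try:
            tar.extractall(path=dest, filter="data")
        except tarfile.FilterError as exc:
            # Raised (with the default errorlevel) when a member
            # violates the filter, e.g. a path outside dest.
            return f"rejected: {type(exc).__name__}"
    return "extracted"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        print(try_extract(make_archive("voices/a.pt"), dest))
        print(try_extract(make_archive("../evil.pt"), dest))
```

An archive whose only member is voices/a.pt should extract normally, while one containing ../evil.pt should be reported as rejected instead of being written next to the destination directory.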

Test Plan

  • Verify model loading still works with weights_only=True on standard .pt checkpoints
  • Verify tarfile extraction of voices.tgz and dist.tgz succeeds with filter='data'
  • Confirm no regressions in existing test suite

Add weights_only=True to torch.load() calls to prevent arbitrary code
execution via pickle (CVE-2025-32434). Add filter='data' to
tarfile.extractall() to prevent path traversal (CVE-2007-4559).

Addresses CWE-502, CWE-22.
Copilot AI review requested due to automatic review settings March 22, 2026 00:52

Copilot AI left a comment

Pull request overview

This PR hardens model loading and archive extraction paths to reduce the risk of unsafe deserialization (PyTorch checkpoints) and tar path traversal during asset extraction.

Changes:

  • Add weights_only=True to torch.load() calls across model-loading codepaths.
  • Add filter='data' to tarfile.extractall() calls when extracting voices.tgz and dist.tgz.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| moshi/moshi/server.py | Uses tarfile.extractall(..., filter='data') when extracting downloaded archives. |
| moshi/moshi/offline.py | Uses tarfile.extractall(..., filter='data') when extracting voice prompt archives. |
| moshi/moshi/models/loaders.py | Loads PyTorch checkpoints with weights_only=True for safer deserialization. |
| moshi/moshi/models/lm.py | Loads saved voice prompt embeddings with weights_only=True. |

  logger.info(f"extracting {voices_tgz} to {voices_dir}")
  with tarfile.open(voices_tgz, "r:gz") as tar:
-     tar.extractall(path=voices_tgz.parent)
+     tar.extractall(path=voices_tgz.parent, filter='data')

Copilot AI Mar 22, 2026

tarfile.TarFile.extractall() is only guaranteed to accept the filter kwarg on Python 3.12+. This repo declares requires-python = ">= 3.10", and 3.10/3.11 interpreters older than the security backports (3.10.12 and 3.11.4) will raise TypeError: extractall() got an unexpected keyword argument 'filter' at runtime. Consider either (a) raising the minimum supported Python to 3.12 (and documenting it), or (b) feature-detecting support for filter (e.g., via inspect.signature) and falling back to a manual "safe extract" implementation that rejects absolute paths, .. traversal, and symlinks/hardlinks.
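Option (b) could look roughly like the following. This is a sketch, not code from the PR: check_members and safe_extractall are hypothetical names, and the feature test uses hasattr(tarfile, "data_filter"), the check the Python documentation suggests, in place of inspect.signature. The fallback is stricter than filter='data' in that it rejects every link and every non-regular member outright.

```python
import os
import tarfile


def check_members(tar: tarfile.TarFile, dest: str) -> list[tarfile.TarInfo]:
    """Return the archive's members, or raise ValueError if any is unsafe."""
    dest_root = os.path.realpath(dest)
    members = tar.getmembers()
    for member in members:
        # Links can redirect later writes outside dest, so refuse them.
        if member.issym() or member.islnk():
            raise ValueError(f"refusing link member: {member.name}")
        # Allow only regular files and directories (no devices or FIFOs).
        if not (member.isfile() or member.isdir()):
            raise ValueError(f"refusing special member: {member.name}")
        # Absolute names and ../ components both resolve outside dest_root.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"refusing path outside destination: {member.name}")
    return members


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Extract tar into dest, preferring the built-in 'data' filter."""
    if hasattr(tarfile, "data_filter"):
        # Extraction filters are available (3.12+ or a patched older build).
        tar.extractall(path=dest, filter="data")
    else:
        # Older interpreter: validate every member before extracting any.
        tar.extractall(path=dest, members=check_members(tar, dest))
```

A call site such as the one in this hunk would then become safe_extractall(tar, voices_tgz.parent) inside the existing with block. Validating all members before extracting any of them avoids leaving a partially extracted directory when a bad entry appears late in the archive.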

Copilot uses AI. Check for mistakes.
  if not dist.exists():
      with tarfile.open(dist_tgz, "r:gz") as tar:
-         tar.extractall(path=dist_tgz.parent)
+         tar.extractall(path=dist_tgz.parent, filter='data')

Copilot AI Mar 22, 2026

Same compatibility issue as above: passing filter='data' to extractall() will crash on Python builds that lack extraction filters (3.10 before 3.10.12, 3.11 before 3.11.4), and moshi/pyproject.toml currently allows Python 3.10+. Please add a version/feature-detected fallback safe extraction path (or bump the minimum Python version accordingly).

  log("info", f"extracting {voices_tgz} to {voices_dir}")
  with tarfile.open(voices_tgz, "r:gz") as tar:
-     tar.extractall(path=voices_tgz.parent)
+     tar.extractall(path=voices_tgz.parent, filter='data')

Copilot AI Mar 22, 2026

tarfile.extractall(..., filter='data') is not supported on Python 3.10/3.11 releases that predate the extraction-filter backport (3.10.12 and 3.11.4) and will raise TypeError at runtime there. Since the project currently supports Python >=3.10, please add a compatibility fallback (manual safe extraction) or bump the minimum supported Python version.
