33 commits
9dd3619
gh-117829 : start of something new
VietThan Jun 15, 2024
b887165
gh-117829 : logic implementation
VietThan Jun 15, 2024
23785b2
gh-117829 : remove whitespace pre-commit was complaining about
VietThan Jun 15, 2024
3286657
gh-117829: testing out manually exclude-pattern
VietThan Jun 25, 2024
c28771f
gh-117829: add example to --exclude-pattern
VietThan Jun 26, 2024
935a1a5
📜🤖 Added by blurb_it.
blurb-it[bot] Jun 26, 2024
60a9aef
gh-117829: added News with blurb_it
VietThan Jun 26, 2024
ee2b185
gh-117829 : adding to documentation
VietThan Jun 26, 2024
aaed611
gh-117829: failed test on Docs
VietThan Jun 26, 2024
1abe95c
Merge branch 'main' into implement-issue-117829
VietThan Jun 26, 2024
ac29c0e
gh-117829: failed tests on Docs
VietThan Jun 27, 2024
6d0ca9a
Merge remote-tracking branch 'refs/remotes/origin/implement-issue-117…
VietThan Jun 27, 2024
408dd71
gh-117829: linting complains
VietThan Jun 27, 2024
fdcd64f
gh-117829: linting complains
VietThan Jun 27, 2024
da75715
gh-117829 : gotta be careful with rst formatting
VietThan Jun 27, 2024
c4b7cd1
Merge branch 'main' into implement-issue-117829
VietThan Jun 27, 2024
ab215b7
gh-117829 : adding tests for zipapp module
VietThan Aug 17, 2024
a85989c
Merge remote-tracking branch 'refs/remotes/origin/implement-issue-117…
VietThan Aug 17, 2024
e2e6b35
gh-117829 : stop lint problems
VietThan Aug 17, 2024
62dc347
gh-117829 : adding cmdline tests
VietThan Aug 17, 2024
bff0a07
gh-117829 : prevent linting
VietThan Aug 17, 2024
a2bc0bf
Merge branch 'main' into implement-issue-117829
VietThan Aug 17, 2024
36ac9b0
gh-117829 : fix the refex
VietThan Aug 23, 2024
0f077eb
Merge remote-tracking branch 'refs/remotes/origin/implement-issue-117…
VietThan Aug 23, 2024
d5ffee3
Merge remote-tracking branch 'upstream/main' into implement-issue-117829
VietThan Sep 27, 2025
59e2089
gh-117829 : addressed comments, purely CLI flags of glob patterns pas…
VietThan Sep 27, 2025
2e17bbf
gh-117829 : added tests
VietThan Sep 27, 2025
9ec1cf5
gh-117829 : ran pre-commit
VietThan Sep 27, 2025
28c425d
gh-117829 : simplify logic, patterns are standard glob patterns, as i…
VietThan Sep 27, 2025
3265a5a
gh-117829 : updated docs based on feedback
VietThan Sep 27, 2025
c67b6b1
gh-117829 : updated docs for options based on feedback
VietThan Sep 27, 2025
617cf18
gh-117829 : updated docs for options based on feedback
VietThan Sep 28, 2025
e08e765
gh-117829 : small changes based on feedback
VietThan Sep 28, 2025
65 changes: 65 additions & 0 deletions Doc/library/zipapp.rst
@@ -94,6 +94,24 @@ The following options are understood:
   this case, any other options are ignored and SOURCE must be an archive, not a
   directory.

.. option:: --include pattern

   Include only files and directories that match the given glob pattern(s).
   Patterns use standard globbing as implemented by :meth:`pathlib.PurePath.match`.

   If this option is not specified, all files in the given directory are included by
   default (subject to any :option:`--exclude` patterns).

.. option:: --exclude pattern

   Exclude files and directories that match the given glob pattern(s).
   Patterns use standard globbing as implemented by :meth:`pathlib.PurePath.match`.

   If both :option:`--include` and :option:`--exclude` are specified, the set of
   files to be included is picked first. Then any to be excluded are removed from
   that set. The order of the options does not affect how they are processed.

Member


Remove the second sentence (explaining priorities) from these two descriptions, because it's not very clear. Instead, have a separate paragraph that explains how the options interact. Something like this:

If both --include and --exclude are specified, the files to be included are picked first. Then from that list of files, any to be excluded are removed. The order of the options does not affect how they are processed.

Also, add a note against --include saying "If this option is not specified, all files in the given directory are included by default."
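
The interaction described above can be sketched in a few lines of Python. This is only an illustration, not the code in the PR: the helper name `select` and the sample file list are made up, and `PurePosixPath` is used so the result does not depend on the platform's case rules.

```python
from pathlib import PurePosixPath


def select(paths, includes=(), excludes=()):
    """Pick the included paths first, then remove any excluded ones."""
    # Step 1: with no include patterns, every path is initially eligible.
    chosen = [
        p for p in paths
        if not includes or any(PurePosixPath(p).match(pat) for pat in includes)
    ]
    # Step 2: drop anything that matches an exclude pattern.
    return [
        p for p in chosen
        if not any(PurePosixPath(p).match(pat) for pat in excludes)
    ]


paths = ["__main__.py", "helper.py", "data.txt", "tests/test_helper.py"]
only_py = select(paths, includes=["*.py"])
no_tests = select(paths, includes=["*.py"], excludes=["tests/*"])
print(only_py)
print(no_tests)
```

Because both steps are plain set-style operations over the whole list, the order in which the flags appear on the command line cannot change the result. Note also that a relative pattern such as `*.py` is matched from the right, so it selects `tests/test_helper.py` as well as the top-level files.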

Contributor Author


I've made that "second paragraph" as a note. Please let me know what other things need to be changed.

Member


I don't think these need to be notes, they are too visually imposing like that. Please make them ordinary text.

Contributor Author


I have made the change. Please let me know if I should change anything else.


.. option:: -h, --help

   Print a short usage message and exit.
@@ -229,6 +247,53 @@ fits in memory::
   >>> with open('myapp.pyz', 'wb') as f:
   >>>     f.write(temp.getvalue())

To filter which files go into the archive, use :option:`--include` or
:option:`--exclude` with standard glob patterns (as implemented by
:meth:`pathlib.PurePath.match`).

Including only specific files:

.. code-block:: shell-session

   $ ls myapp
   __main__.py helper.py data.txt

   # Keep only Python sources; anything not matched is implicitly excluded
   $ python -m zipapp myapp -o myapp.pyz --include "*.py"
   $ unzip myapp.pyz -d extracted_myapp
   Archive: myapp.pyz
    extracting: extracted_myapp/__main__.py
    extracting: extracted_myapp/helper.py

Excluding a subtree or file type:

.. code-block:: shell-session

   $ ls -R myapp
   myapp:
   __main__.py helper.py tests/ build/

   myapp/tests:
   test_helper.py

   myapp/build:
   scratch.txt

   # Add everything except the contents of tests/ and any *.pyc files
   $ python -m zipapp myapp -o myapp.pyz --exclude "tests/**" --exclude "*.pyc"
   $ unzip myapp.pyz -d extracted_myapp
   Archive: myapp.pyz
    extracting: extracted_myapp/__main__.py
      creating: extracted_myapp/build/
    extracting: extracted_myapp/build/scratch.txt
    extracting: extracted_myapp/helper.py
      creating: extracted_myapp/tests/

.. note::

   * Patterns follow :meth:`pathlib.PurePath.match`. To match all of a
     directory's contents, use ``dir/**``.


.. _zipapp-specifying-the-interpreter:

98 changes: 98 additions & 0 deletions Lib/test/test_zipapp.py
@@ -396,6 +396,12 @@ def make_archive(self):
        zipapp.create_archive(source, target)
        return target

    def _make_tree(self, root: pathlib.Path, files: list[str]) -> None:
        for rel in files:
            p = root / rel
            p.parent.mkdir(parents=True, exist_ok=True)
            p.touch()

    def test_cmdline_create(self):
        # Test the basic command line API.
        source = self.tmpdir / 'source'
@@ -454,6 +460,98 @@ def test_info_error(self):
        # Program should exit with a non-zero return code.
        self.assertTrue(cm.exception.code)

    def test_cmdline_include_then_exclude(self):
        source = self.tmpdir / 'source'
        source.mkdir()
        self._make_tree(source, [
            '__main__.py',
            'foo/a.py',
            'foo/b.pyc',
            'bar/c.txt',
        ])

        # Include *.py files and the 'foo' directory entry, then exclude *.pyc
        args = [
            str(source),
            '--include', '*.py',
            '--include', 'foo',
            '--exclude', '**/*.pyc']
        zipapp.main(args)

        target = source.with_suffix('.pyz')
        with zipfile.ZipFile(target, 'r') as z:
            names = set(z.namelist())
            # __main__.py matches '*.py'
            self.assertIn('__main__.py', names)
            self.assertIn('foo/', names)
            self.assertIn('foo/a.py', names)
            # Excluded by pattern
            self.assertNotIn('foo/b.pyc', names)
            # Not matched by any include pattern
            self.assertNotIn('bar/', names)
            self.assertNotIn('bar/c.txt', names)

    def test_cmdline_multiple_includes_commas_and_extend(self):
        source = self.tmpdir / 'src'
        source.mkdir()
        self._make_tree(source, [
            '__main__.py',
            'pkg/x.py',
            'pkg/y.txt',
            'data/readme.txt',
            'data/keep.bin',
        ])

        args = [
            str(source),
            '--include', 'pkg/**',
            '--include', 'data/*.txt',
            '--include', 'data/keep.bin',
        ]
        zipapp.main(args)

        target = source.with_suffix('.pyz')
        with zipfile.ZipFile(target, 'r') as z:
            names = set(z.namelist())
            # root files match no include pattern
            self.assertNotIn('__main__.py', names)
            # from "pkg/**"
            self.assertIn('pkg/x.py', names)
            self.assertIn('pkg/y.txt', names)
            # from "data/*.txt"
            self.assertIn('data/readme.txt', names)
            # from the third --include
            self.assertIn('data/keep.bin', names)

    def test_cmdline_exclude_directory_over_included_files(self):
        source = self.tmpdir / 'tree'
        source.mkdir()
        self._make_tree(source, [
            '__main__.py',
            'foo/a.py',
            'foo/b.py',
            'bar/c.py',
        ])

        # Include all *.py, but exclude 'foo/**' entirely
        args = [
            str(source),
            '--include', '*.py',
            '--exclude', 'foo/**',
        ]
        zipapp.main(args)

        target = source.with_suffix('.pyz')
        with zipfile.ZipFile(target, 'r') as z:
            names = set(z.namelist())
            self.assertIn('__main__.py', names)
            # foo is excluded even though files match *.py
            self.assertNotIn('foo/', names)
            self.assertNotIn('foo/a.py', names)
            self.assertNotIn('foo/b.py', names)
            # bar/c.py remains
            self.assertIn('bar/c.py', names)


if __name__ == "__main__":
    unittest.main()
63 changes: 62 additions & 1 deletion Lib/zipapp.py
@@ -6,6 +6,8 @@
import sys
import zipfile

from collections.abc import Iterable, Callable

__all__ = ['ZipAppError', 'create_archive', 'get_interpreter']


@@ -177,6 +179,46 @@ def get_interpreter(archive):
        if f.read(2) == b'#!':
            return f.readline().strip().decode(shebang_encoding)

def _make_glob_filter(
    includes: Iterable[str] | None,
    excludes: Iterable[str] | None,
) -> Callable[[pathlib.Path], bool]:
    """
    Build a filter(relative_path: Path) -> bool applying includes first, then excludes.

    Semantics:
    - Patterns are standard glob patterns as implemented by PurePath.match.
    - If 'includes' is empty, all files/dirs are initially eligible.
    - If any exclude pattern matches, the path is rejected.
    """

    def _normalize_patterns(values: Iterable[str] | None) -> list[str]:
        """
        Return patterns exactly as provided by the CLI (no comma splitting).
        Each item is stripped of surrounding whitespace; empty items are dropped.
        """
        if not values:
            return []
        out: list[str] = []
        for v in values:
            v = v.strip()
            if v:
                out.append(v)
        return out

    inc = _normalize_patterns(values=includes)
    exc = _normalize_patterns(values=excludes)

    def _filter(rel: pathlib.Path) -> bool:
        # If includes were provided, at least one must match.
        if inc and not any(rel.match(pat) for pat in inc):
            return False
        # Any exclude match removes the path.
        if exc and any(rel.match(pat) for pat in exc):
            return False
        return True

    return _filter
Member


This all feels way too complicated. We should just use the pathlib glob matching function, pth.match(). Something like:

  • If there are no --include or --exclude options, the filter is none.
  • Otherwise, filter(pth) should check whether pth.match(inc) for each include pattern. If none of them match, return False. Then, check pth.match(exc) against each exclude filter. If any of them match, return False. Otherwise, return True.

The documentation can simply state that patterns are standard glob patterns, as implemented by PurePath.match.

In particular, matches should not assume Posix path separators or case sensitivity. The pathlib functions handle this automatically, your current code doesn't.

And yes, this does mean that to match a directory and all of its contents, you need to use --include foo/**. I'm fine with that.
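
A minimal sketch of the filter described in this comment might look like the following. It is an illustration only, not the final code in the PR: the name `make_glob_filter` is hypothetical, and it assumes that an empty include list means "everything is eligible", which the bullet points above do not state explicitly.

```python
from pathlib import PurePosixPath


def make_glob_filter(includes=None, excludes=None):
    """Return None when no patterns are given, else a path -> bool filter."""
    includes = list(includes or [])
    excludes = list(excludes or [])
    if not includes and not excludes:
        # No --include/--exclude options: no filter at all.
        return None

    def _filter(path):
        # Assumption: with no include patterns, every path is eligible.
        if includes and not any(path.match(pat) for pat in includes):
            return False
        # Any matching exclude pattern rejects the path.
        return not any(path.match(pat) for pat in excludes)

    return _filter


# PurePosixPath keeps this demonstration platform-independent; the real
# filter would receive whatever path type the caller passes in.
keep = make_glob_filter(includes=["*.py"], excludes=["tests/*"])
print(keep(PurePosixPath("helper.py")))
print(keep(PurePosixPath("tests/test_helper.py")))
```

Delegating every comparison to `match()` is what leaves separator and case handling to pathlib rather than to hand-written string logic.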

Contributor Author


Thank you for the thorough review. I was also hesitant at first about implementing all this complexity, but I figured I'd see whether it was possible and let you and the other reviewers judge whether it's appropriate. I've made the changes.


def main(args=None):
    """Run the zipapp command line interface.
@@ -204,6 +246,15 @@ def main(args=None):
            help="Display the interpreter from the archive.")
    parser.add_argument('source',
            help="Source directory (or existing archive).")
    parser.add_argument('--include', action='extend', nargs='+', default=None,
            help=("Glob pattern(s) of files/dirs to include (relative to SOURCE). "
                  "Repeat the flag for multiple patterns. "
                  "To include a directory and its contents, use 'foo/**'."))
    parser.add_argument('--exclude', action='extend', nargs='+', default=None,
            help=("Glob pattern(s) of files/dirs to exclude (relative to SOURCE). "
                  "Repeat the flag for multiple patterns. "
                  "To exclude a directory and its contents, use 'foo/**'. "
                  "Applied after --include."))

    args = parser.parse_args(args)

@@ -222,9 +273,19 @@ def main(args=None):
        if args.main:
            raise SystemExit("Cannot change the main function when copying")

    # build a filter from include and exclude flags
    filter_fn = None
    src_path = pathlib.Path(args.source)
    if src_path.exists() and src_path.is_dir() and (args.include or args.exclude):
Member


Why are we only supporting --include and --exclude for directory sources? I don't see why they wouldn't be useful for archives as well. What's the motivation for this limitation?

Contributor Author


Currently, if a file source is given, we do a byte-copy operation other than modifying the shebang (_copy_archive), so my implementation applied --include/--exclude only when we’re creating a new archive from a directory as I wasn't sure if applying filtering to an archive source is expected/unexpected. If we want those filters to work with an existing archive, that mode would no longer be a pure copy as we’d need to unzip, filter, then re-zip the contents.

That being said, I'm happy to implement a "repack with filtering" functionality. Or I could have the CLI error when an archive source has filters applied.

if os.path.isfile(args.source) and (args.include or args.exclude):
    raise SystemExit("--include/--exclude only apply when SOURCE is a directory")

Please let me know what you prefer and I'll update the PR accordingly @pfmoore .

Member


I don't have a good instinct here. This is a problem that was introduced when we added the filter argument to create_archive, but which was not picked up then. IMO it's a bad UI for the filter argument to do nothing when copying an archive - but there's no practical use case that I'm aware of for filtering an existing archive.

I don't want to add extra complexity (that needs to be maintained, tested and supported) when no-one will actually use it, but equally I don't think we should have a UI that suggests an "obvious" usage (zipapp foo.pyz -o new_foo.pyz --exclude secret_data.txt) but doesn't actually implement it.

Honestly, at this point I'm wishing I'd stuck to my original principles and rejected the addition of the filter argument in the first place 🙁

Let's just raise an error when the source is an existing zipapp and --include/--exclude are specified. But please also add a similar error in the create_archive function. Also add a note to the command line help, and to the documentation of the create_archive function, explaining that when the source is an existing zipapp, the only supported functionality is to modify the shebang line (via the -p/--python command line argument, and the interpreter argument of the function, respectively).
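
The check requested here could be sketched roughly as below. This is a hedged illustration, not the PR's implementation: `check_filters_allowed` is a hypothetical helper, `ValueError` stands in for whichever exception the module would actually raise, and the error text is borrowed from the snippet proposed earlier in this thread.

```python
import os
import tempfile


def check_filters_allowed(source, include=None, exclude=None):
    """Raise if filtering is requested while SOURCE is an existing archive file."""
    if os.path.isfile(source) and (include or exclude):
        raise ValueError(
            "--include/--exclude only apply when SOURCE is a directory")


with tempfile.TemporaryDirectory() as tmp:
    # An empty file stands in for an existing zipapp archive.
    archive = os.path.join(tmp, "app.pyz")
    with open(archive, "wb") as f:
        f.write(b"")

    check_filters_allowed(tmp, include=["*.py"])  # directory source: accepted
    check_filters_allowed(archive)                # archive, no filters: accepted
    try:
        check_filters_allowed(archive, exclude=["secret_data.txt"])
    except ValueError as exc:
        rejected = str(exc)
    else:
        rejected = None

print(rejected)
```

Running the same check from both the command-line entry point and the library function would keep the two interfaces consistent, which is the behaviour asked for above.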

        filter_fn = _make_glob_filter(
            includes=args.include,
            excludes=args.exclude
        )

    create_archive(args.source, args.output,
                   interpreter=args.python, main=args.main,
                   compressed=args.compress,
                   filter=filter_fn)


if __name__ == '__main__':
@@ -0,0 +1,2 @@
Add ``--include`` and ``--exclude`` flags to the :mod:`zipapp` command-line interface. These flags accept glob patterns
that define an allow-list and/or deny-list of files to include in the archive.