Merged
117 commits
1bd92e8
Nearly-working impl
janbridley Dec 20, 2024
da87f8c
Full working example
janbridley Dec 20, 2024
b28cc05
Clean up layout
janbridley Dec 20, 2024
2ae2d31
Further cleanup
janbridley Dec 20, 2024
a05d8b8
Lint OO
janbridley Dec 20, 2024
5bc5299
Run pre-commit on _errors.py
janbridley Dec 21, 2024
06fabc7
Add oo.py temp implementation
janbridley Dec 21, 2024
727f604
Undo changes to sample data
janbridley Dec 21, 2024
2752632
Lint oo.py
janbridley Dec 21, 2024
b643d5f
Remove change to sample data
janbridley Dec 21, 2024
c949095
Add oo to init.py
janbridley Dec 21, 2024
04280cc
Handle edge cases
janbridley Dec 21, 2024
0d53a30
Test parsing real files
janbridley Dec 21, 2024
e7855f1
Improve robustness of table reader
janbridley Dec 21, 2024
f7a9d75
Lint oo and conftest
janbridley Dec 21, 2024
9c63222
Clean up text and remove comments
janbridley Dec 21, 2024
aa78a68
Port initial test to new style
janbridley Dec 21, 2024
7d07b59
Port remaining key tests
janbridley Dec 21, 2024
56a4d50
Minor fixes
janbridley Dec 21, 2024
16f1655
Clean up test_key_reader.py
janbridley Dec 21, 2024
5e92141
Progress toward transition to recarray
janbridley Dec 21, 2024
a44a925
Increase tests and fix memory layout bug
janbridley Dec 22, 2024
b020262
Fixes to memory layout
janbridley Dec 22, 2024
ccb2aea
Convert table_reader tests
janbridley Dec 22, 2024
1e81654
Linting and doc fixes
janbridley Dec 22, 2024
97e0cbd
Clean up docs
janbridley Dec 22, 2024
33134e8
Finish porting tests
janbridley Dec 23, 2024
7bc9f86
Lints
janbridley Dec 23, 2024
c80a86a
Fix for scalar array inputs
janbridley Dec 24, 2024
e23762e
Remove unnecessary filterwarning
janbridley Dec 24, 2024
917685f
Expand on tests
janbridley Dec 24, 2024
933d4b9
Clean up unitcells
janbridley Dec 24, 2024
42151fa
Lint tests
janbridley Dec 24, 2024
8a5e73f
Finalize lints
janbridley Dec 24, 2024
12b73c5
Restructure patterns
janbridley Dec 24, 2024
8f945fd
Update test_patterns
janbridley Dec 24, 2024
0d50bc8
Lint and clean up
janbridley Dec 24, 2024
d39ce71
Final lint
janbridley Dec 24, 2024
4bfe1b3
Improve a few tests
janbridley Dec 24, 2024
cded448
Add symops to example cif
janbridley Dec 24, 2024
faf9816
Remove package-unitcells deprecated docs
janbridley Dec 24, 2024
efb80e8
Fix link in package-parse
janbridley Dec 24, 2024
86faca3
Update quickstart tutorial
janbridley Dec 24, 2024
867a37d
Move oo.py to parsnip.py
janbridley Dec 24, 2024
31a5b6c
Update README
janbridley Dec 24, 2024
9ceb2f9
Update Unitcells test imports
janbridley Dec 24, 2024
8ef7150
Lint
janbridley Dec 24, 2024
2dba4be
Lazily load file
janbridley Dec 24, 2024
65d6890
Remove unused files
janbridley Dec 24, 2024
8b6f99f
Skip bad_cif test
janbridley Dec 25, 2024
05895ea
Clean up tests
janbridley Dec 25, 2024
305a37a
Lint
janbridley Dec 25, 2024
3592f1e
Add tests for table_labels and cast_numerics
janbridley Dec 26, 2024
0cc212e
Clean up tests
janbridley Dec 26, 2024
79a1c85
Lint
janbridley Dec 26, 2024
a16c7e8
Clean up docstrings
janbridley Dec 26, 2024
716cc4c
Lint and update docstrings
janbridley Dec 27, 2024
f211007
Further docs
janbridley Dec 27, 2024
def08df
Codespell
janbridley Dec 27, 2024
a001b48
Tests for cell
janbridley Dec 27, 2024
a40fa7e
Lint
janbridley Dec 27, 2024
cf83de1
Update errors for read_unit_cell
janbridley Dec 27, 2024
ab12074
Clean up tests and todos
janbridley Dec 28, 2024
46d5577
More TODOs
janbridley Dec 28, 2024
b1e4388
Lint
janbridley Dec 28, 2024
e27c00b
Add test for cell property
janbridley Dec 28, 2024
4ced6ed
Remove modindex from sidebar
janbridley Dec 28, 2024
b8eb804
Consolidate logic for nonsimple data
janbridley Dec 28, 2024
7369f84
Lint
janbridley Dec 28, 2024
8b8a0b5
Fix type annotation in cast_array function
janbridley Dec 28, 2024
d6b4da4
Add more-itertools as official dependency
janbridley Dec 28, 2024
343150d
Clean up dependency documentation
janbridley Dec 28, 2024
30d7776
Add index for ase backward compatibility
janbridley Dec 28, 2024
e46fdf4
Change wording in development.rst
janbridley Dec 28, 2024
1871e50
Replace index specification
janbridley Dec 28, 2024
66701ff
Disable ASE test on python3.7
janbridley Dec 28, 2024
67e6b22
Fix version check
janbridley Dec 28, 2024
31eb83d
Add additional lints
janbridley Dec 28, 2024
31a9907
Document additional rules in pyproject.toml
janbridley Dec 28, 2024
2828a3b
Move PATTERNS dict to end of docs
janbridley Dec 28, 2024
54f316c
Clean up development.rst
janbridley Dec 28, 2024
dadb0b4
Expand with tests from additional databases
janbridley Dec 28, 2024
8672046
Disable lint that causes warning
janbridley Dec 28, 2024
08b35c1
Fix for multiline data entries
janbridley Dec 29, 2024
10a60fa
Progress toward multiline string parsing
janbridley Dec 29, 2024
f7402d4
Working impl that fails for blocks containing a semicolon
janbridley Dec 29, 2024
58339c5
Clean up
janbridley Dec 29, 2024
993f698
Messy working impl
janbridley Dec 29, 2024
88c3a1a
Clean up
janbridley Dec 29, 2024
602edf9
Retain newlines
janbridley Dec 29, 2024
0230b52
Lint
janbridley Dec 29, 2024
82f31c6
Add TODO
janbridley Dec 29, 2024
f029663
Add missing multiline keys
janbridley Dec 29, 2024
e648ce4
Wrap accumulator into a function
janbridley Dec 29, 2024
d643fad
Clean up _accumulate_nonsimple_data
janbridley Dec 29, 2024
85c39be
Clean up unused comments
janbridley Dec 29, 2024
a7a2468
Update changelog.rst
janbridley Dec 30, 2024
67b59d4
Fix version headings in changelog
janbridley Dec 30, 2024
0c14506
Update README to reflect correct CIF2.0 status
janbridley Dec 31, 2024
af0325d
Add CIFTEST data to gitignore
janbridley Dec 31, 2024
19a8173
Escape dash in regex and allow forward slash in data name
janbridley Dec 31, 2024
2fec769
Swap namedtuple to dataclass and clean up provided keys
janbridley Dec 31, 2024
5eb4bbf
Auto detect cif keys
janbridley Dec 31, 2024
6f30930
Allow pdb matrix keys
janbridley Dec 31, 2024
0ff9e9a
Generalize nonsimple data delimiters
janbridley Dec 31, 2024
bb143d3
Add architecture.md
janbridley Dec 31, 2024
c383131
Update table tests and fix regex for nonsimple data in tabs
janbridley Dec 31, 2024
e84e94a
Add pycifrw to test reqs
janbridley Dec 31, 2024
327d3f1
Lint tests
janbridley Dec 31, 2024
65a0015
Verify all table content
janbridley Jan 1, 2025
fdf1e21
Lint
janbridley Jan 1, 2025
f6051c9
import annotations
janbridley Jan 1, 2025
8b8df48
Remove unused pattern
janbridley Jan 1, 2025
d031882
Rename tables to loops
janbridley Jan 1, 2025
55e8817
Remove extra character from regex
janbridley Jan 1, 2025
f6517b0
Clean up table reader
janbridley Jan 1, 2025
357615a
Clean up
janbridley Jan 6, 2025
15 changes: 11 additions & 4 deletions .github/requirements-3.10.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# This file was autogenerated by uv via the following command:
# uv pip compile --python-version=3.10 pyproject.toml tests/requirements.in
ase==3.23.0
ase==3.24.0
# via -r tests/requirements.in
contourpy==1.3.1
# via matplotlib
@@ -14,16 +14,19 @@ gemmi==0.7.0
# via -r tests/requirements.in
iniconfig==2.0.0
# via pytest
kiwisolver==1.4.7
kiwisolver==1.4.8
# via matplotlib
matplotlib==3.10.0
# via ase
numpy==2.2.0
more-itertools==10.5.0
# via parsnip (pyproject.toml)
numpy==2.2.1
# via
# parsnip (pyproject.toml)
# ase
# contourpy
# matplotlib
# pycifrw
# scipy
packaging==24.2
# via
@@ -33,7 +36,11 @@ pillow==11.0.0
# via matplotlib
pluggy==1.5.0
# via pytest
pyparsing==3.2.0
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.2.1
# via matplotlib
pytest==8.3.4
# via -r tests/requirements.in
15 changes: 11 additions & 4 deletions .github/requirements-3.11.txt
@@ -1,6 +1,6 @@
# This file was autogenerated by uv via the following command:
# uv pip compile --python-version=3.11 pyproject.toml tests/requirements.in
ase==3.23.0
ase==3.24.0
# via -r tests/requirements.in
contourpy==1.3.1
# via matplotlib
@@ -12,16 +12,19 @@ gemmi==0.7.0
# via -r tests/requirements.in
iniconfig==2.0.0
# via pytest
kiwisolver==1.4.7
kiwisolver==1.4.8
# via matplotlib
matplotlib==3.10.0
# via ase
numpy==2.2.0
more-itertools==10.5.0
# via parsnip (pyproject.toml)
numpy==2.2.1
# via
# parsnip (pyproject.toml)
# ase
# contourpy
# matplotlib
# pycifrw
# scipy
packaging==24.2
# via
@@ -31,7 +34,11 @@ pillow==11.0.0
# via matplotlib
pluggy==1.5.0
# via pytest
pyparsing==3.2.0
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.2.1
# via matplotlib
pytest==8.3.4
# via -r tests/requirements.in
15 changes: 11 additions & 4 deletions .github/requirements-3.12.txt
@@ -1,6 +1,6 @@
# This file was autogenerated by uv via the following command:
# uv pip compile --python-version=3.12 pyproject.toml tests/requirements.in
ase==3.23.0
ase==3.24.0
# via -r tests/requirements.in
contourpy==1.3.1
# via matplotlib
@@ -12,16 +12,19 @@ gemmi==0.7.0
# via -r tests/requirements.in
iniconfig==2.0.0
# via pytest
kiwisolver==1.4.7
kiwisolver==1.4.8
# via matplotlib
matplotlib==3.10.0
# via ase
numpy==2.2.0
more-itertools==10.5.0
# via parsnip (pyproject.toml)
numpy==2.2.1
# via
# parsnip (pyproject.toml)
# ase
# contourpy
# matplotlib
# pycifrw
# scipy
packaging==24.2
# via
@@ -31,7 +34,11 @@ pillow==11.0.0
# via matplotlib
pluggy==1.5.0
# via pytest
pyparsing==3.2.0
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.2.1
# via matplotlib
pytest==8.3.4
# via -r tests/requirements.in
15 changes: 11 additions & 4 deletions .github/requirements-3.13.txt
@@ -1,6 +1,6 @@
# This file was autogenerated by uv via the following command:
# uv pip compile --python-version=3.13 pyproject.toml tests/requirements.in
ase==3.23.0
ase==3.24.0
# via -r tests/requirements.in
contourpy==1.3.1
# via matplotlib
@@ -12,16 +12,19 @@ gemmi==0.7.0
# via -r tests/requirements.in
iniconfig==2.0.0
# via pytest
kiwisolver==1.4.7
kiwisolver==1.4.8
# via matplotlib
matplotlib==3.10.0
# via ase
numpy==2.2.0
more-itertools==10.5.0
# via parsnip (pyproject.toml)
numpy==2.2.1
# via
# parsnip (pyproject.toml)
# ase
# contourpy
# matplotlib
# pycifrw
# scipy
packaging==24.2
# via
@@ -31,7 +34,11 @@ pillow==11.0.0
# via matplotlib
pluggy==1.5.0
# via pytest
pyparsing==3.2.0
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.2.1
# via matplotlib
pytest==8.3.4
# via -r tests/requirements.in
7 changes: 7 additions & 0 deletions .github/requirements-3.7.txt
@@ -20,11 +20,14 @@ kiwisolver==1.4.5
# via matplotlib
matplotlib==3.5.3
# via ase
more-itertools==9.1.0
# via parsnip (pyproject.toml)
numpy==1.21.6
# via
# parsnip (pyproject.toml)
# ase
# matplotlib
# pycifrw
# scipy
packaging==24.0
# via
@@ -34,6 +37,10 @@ pillow==9.5.0
# via matplotlib
pluggy==1.2.0
# via pytest
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.1.4
# via matplotlib
pytest==7.4.4
7 changes: 7 additions & 0 deletions .github/requirements-3.8.txt
@@ -20,12 +20,15 @@ kiwisolver==1.4.7
# via matplotlib
matplotlib==3.7.5
# via ase
more-itertools==10.5.0
# via parsnip (pyproject.toml)
numpy==1.24.4
# via
# parsnip (pyproject.toml)
# ase
# contourpy
# matplotlib
# pycifrw
# scipy
packaging==24.2
# via
@@ -35,6 +38,10 @@ pillow==10.4.0
# via matplotlib
pluggy==1.5.0
# via pytest
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.1.4
# via matplotlib
pytest==8.3.4
11 changes: 9 additions & 2 deletions .github/requirements-3.9.txt
@@ -1,6 +1,6 @@
# This file was autogenerated by uv via the following command:
# uv pip compile --python-version=3.9 pyproject.toml tests/requirements.in
ase==3.23.0
ase==3.24.0
# via -r tests/requirements.in
contourpy==1.3.0
# via matplotlib
@@ -20,12 +20,15 @@ kiwisolver==1.4.7
# via matplotlib
matplotlib==3.9.4
# via ase
more-itertools==10.5.0
# via parsnip (pyproject.toml)
numpy==2.0.2
# via
# parsnip (pyproject.toml)
# ase
# contourpy
# matplotlib
# pycifrw
# scipy
packaging==24.2
# via
@@ -35,7 +38,11 @@ pillow==11.0.0
# via matplotlib
pluggy==1.5.0
# via pytest
pyparsing==3.2.0
ply==3.11
# via pycifrw
pycifrw==4.4.6
# via -r tests/requirements.in
pyparsing==3.2.1
# via matplotlib
pytest==8.3.4
# via -r tests/requirements.in
7 changes: 5 additions & 2 deletions README.rst
@@ -27,8 +27,10 @@

.. _parse:

The ``parsnip.parse`` module handles standard CIF files (including those under the `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ and `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ standards), as well as many features from the `mmCIF <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>`_ format.
The package includes a table reader for `loop\_`-delimited tables as well as a key-value pair reader. Provide a filename and a list of keys to either of these functions and you're all set to start parsing CIF files!
Importing ``parsnip`` allows users to read `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ files, as well as many features from the `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ and `mmCIF <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>`_ formats.
Creating a :class:`~.CifFile` object provides easy access to name-value :attr:`~.CifFile.pairs`, as well
as `loop\_`-delimited :attr:`~.CifFile.tables`. Data entries can be extracted as Python primitives or
NumPy arrays for further use.

.. _installing:

@@ -78,5 +80,6 @@ Dependencies
.. code:: text

numpy>=1.19
more-itertools

.. _contributing:
17 changes: 17 additions & 0 deletions architecture.md
@@ -0,0 +1,17 @@
Parsnip Architecture
--------------------

The primary design goal of ``parsnip`` was to create a lightweight, simple CIF parsing
library. ``parsnip`` has a minimal set of dependencies, relatively few lines of code,
and extensive testing to validate the accuracy of read files. Dozens of CIF parsing
libraries exist, but most either (1) are part of a much larger project (and therefore
undesirable as a simple dependency) or (2) have a poorly documented interface. This
project is designed to bridge that gap with careful documentation and a minimal subset
of features that build well into other open-source projects.


This project takes a (reasonably) permissive view of the CIF specification: data entries
that "look like" valid data will be parsed, regardless of file encoding, line length,
special characters, or syntax specifics like unlabeled blocks. ``parsnip`` is designed
to read and extract data from CIF, mmCIF, and STAR files regardless of their compliance
with the full CIF specification.
26 changes: 25 additions & 1 deletion changelog.rst
@@ -4,7 +4,31 @@ Changelog
The format is based on `Keep a Changelog <http://keepachangelog.com/en/1.1.0/>`__.
This project adheres to `Semantic Versioning <http://semver.org/spec/v2.0.0.html>`__.

v0.x.x - 20xx-xx-xx
v1.0.0 - 20xx-xx-xx
-------------------

Added
~~~~~
- Support for nonsimple (';'-delimited) data entries.
- Improved support for entries containing special characters.
- Ability to query multiple keys or columns simultaneously.
- Additional tests for AMCSD and zeolite databases.
- Additional documentation and examples for the new interface.

Changed
~~~~~~~
- Primary interface is now the ``CifFile`` object, which supports all previously implemented features in addition to several new methods.
- Files are now parsed lazily, and are traversed a single time.

Dependencies
~~~~~~~~~~~~
- Added ``more-itertools`` as a dependency for ``peekable`` iterators.


v0.1.0 - 2024-12-20
-------------------

Added
~~~~~
- Unitcells module
- Function-based parsing interface for key and table reading
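Review note on the changelog entries above: a minimal sketch of how lazy, single-pass reading of ';'-delimited (nonsimple) entries could work with a peekable iterator. This is not taken from parsnip's source. `Peekable` is a stdlib-only stand-in written here for `more_itertools.peekable`, and `read_pairs` is a hypothetical helper that only handles the simplest key-value layout.

```python
class Peekable:
    """Stdlib-only stand-in for more_itertools.peekable (illustrative only)."""

    _EMPTY = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand back the peeked item first, if there is one.
        if self._cache is not self._EMPTY:
            value, self._cache = self._cache, self._EMPTY
            return value
        return next(self._it)

    def peek(self, default=None):
        """Return the next item without consuming it."""
        if self._cache is self._EMPTY:
            try:
                self._cache = next(self._it)
            except StopIteration:
                return default
        return self._cache


def read_pairs(lines):
    """Yield (key, value) pairs in one forward pass over `lines`.

    Hypothetical helper: assumes keys start with "_" and that a text field
    opens and closes with a line beginning with ";".
    """
    stream = Peekable(line.rstrip("\n") for line in lines)
    for line in stream:
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # A key with no inline value may be followed by a ";"-delimited block.
        if not value and stream.peek("").startswith(";"):
            first = next(stream)[1:]
            body = [first] if first else []
            for text_line in stream:
                if text_line.startswith(";"):
                    break
                body.append(text_line)  # newlines are retained via the join
            value = "\n".join(body)
        yield key, value
```

Because `read_pairs` is a generator over a single shared iterator, nothing is read until a pair is requested, and no line is visited twice.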
2 changes: 2 additions & 0 deletions doc/source/conf.py
@@ -17,6 +17,8 @@
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

add_module_names = False

extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
15 changes: 4 additions & 11 deletions doc/source/development.rst
@@ -16,15 +16,10 @@ General Guidelines

All code contributed to **parsnip** must adhere to the following guidelines:

* Use a two branch model of development:

- Most new features and bug fixes should be developed in branches based on ``main``.
- API incompatible changes and those that significantly change existing functionality should be based on ``breaking``
* Hard dependencies (those that end users must install to use **parsnip**) are *strongly* discouraged, and should be avoided where possible. NumPy is the sole exception to this, as it is already included in most scientific computing software stacks.
* Additional dependencies required by developers (those used to run tests or build docs) are undesirable but allowed.
* Hard dependencies (those that end users must install to use **parsnip**) are *strongly* discouraged, and should be avoided where possible. Additional dependencies required by developers (those used to run tests or build docs) are allowed where necessary.
* All code should adhere to the source code conventions and satisfy the documentation and testing requirements discussed below.

As portability is a primary feature of **parsnip**, tests are run on Python versions 3.9 and later. However, first class support should only be expected for versions covered by `NEP 29`_.
As portability is a primary feature of **parsnip**, tests are run on Python versions 3.7 and later. However, first class support should only be expected for versions covered by `NEP 29`_.

.. _NEP 29: https://numpy.org/neps/nep-0029-deprecation_policy.html

@@ -57,15 +52,13 @@ All code in **parsnip** should be formatted using `ruff`_ via pre-commit. This p
Documentation
-------------

API documentation should be written as part of the docstrings of the package in the `Google style <https://google.github.io/styleguide/pyguide.html#383-functions-and-methods>`__.
API documentation should be written as part of the docstrings of the package in the `NumPy style <https://numpydoc.readthedocs.io/en/latest/format.html>`__.

Docstrings are automatically validated using `pydocstyle <http://www.pydocstyle.org/>`_ whenever the ruff pre-commit hooks are run.
The `official documentation <https://parsnip.readthedocs.io/>`_ is generated from the docstrings using `Sphinx <http://www.sphinx-doc.org/en/stable/index.html>`_.

In addition to API documentation, inline comments are strongly encouraged.
Code should be written as transparently as possible, so the primary goal of documentation should be explaining the algorithms or mathematical concepts underlying the code.
Avoid comments that simply restate the nature of lines of code: for example, the comment "set up regex pattern" is uninformative, since the code itself should make this obvious, *e.g.*, ``re.compile(r"^(_\w+)\s+(\d+)")``.
On the other hand, the comment "read an underscore-prefixed word and a numeric value separated by whitespace" is instructive.
Multiline comments for regex strings may sometimes be necessary.

Building Documentation
@@ -82,7 +75,7 @@ Building Documentation
Unit Tests
----------

All code should include a set of tests which test for correct behavior.
All code should include a set of tests which validate correct behavior.
All tests should be placed in the ``tests`` folder at the root of the project.
In general, most parts of parsnip primarily require `unit tests <https://en.wikipedia.org/wiki/Unit_testing>`_, but where appropriate `integration tests <https://en.wikipedia.org/wiki/Integration_testing>`_ are also welcome. Core functions should be tested against the sample CIF files included in ``tests/sample_data``.
Tests in **parsnip** use the `pytest <https://docs.pytest.org/>`__ testing framework.
10 changes: 10 additions & 0 deletions doc/source/example_file.cif
@@ -25,3 +25,13 @@ _atom_site_Wyckoff_label
Cu1 0.0000000000 0.0000000000 0.0000000000 Cu a

_symmetry_space_group_name_H-M 'Fm-3m'

# Note that this table is only a subset of the full symmetry of the crystal, but
# it is sufficient to reconstruct the unit cell.
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 x,y,z
96 z,y+1/2,x+1/2
118 z+1/2,-y,x+1/2
192 z+1/2,y+1/2,x
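Review note on the symmetry table added above: a rough sketch of why these four operations are enough to rebuild the unit cell from the single Cu site at the origin. The helper `apply_symop` is hypothetical (it is not part of parsnip) and assumes operations contain only `x`, `y`, `z` and rational offsets such as `1/2`; decimal offsets are not handled.

```python
import re
from fractions import Fraction

# One signed term: either a variable (x, y, z) or a rational number like 1/2.
_TERM = re.compile(r"([+-]?)\s*(\d+(?:/\d+)?|[xyz])")


def apply_symop(symop, position):
    """Apply a CIF-style symmetry string (e.g. "z,y+1/2,x+1/2") to a
    fractional position and wrap the result into the interval [0, 1)."""
    coords = dict(zip("xyz", position))
    result = []
    for component in symop.lower().split(","):
        total = Fraction(0)
        for sign, token in _TERM.findall(component):
            value = coords[token] if token in coords else Fraction(token)
            total += -value if sign == "-" else value
        result.append(total % 1)  # wrap back into the unit cell
    return tuple(result)


# The four operations listed in the example file, applied to Cu1 at the origin.
symops = ["x,y,z", "z,y+1/2,x+1/2", "z+1/2,-y,x+1/2", "z+1/2,y+1/2,x"]
cu1 = (Fraction(0), Fraction(0), Fraction(0))
sites = sorted({apply_symop(op, cu1) for op in symops})
```

Here `sites` holds the origin plus the three face-centered positions, i.e. the four atoms of the conventional FCC cell, which is consistent with the comment that the subset is sufficient for this structure.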