Commit a01ea2e

Refactor to object-oriented interface (#29)
* Nearly-working impl
* Full working example
* Clean up layout
* Further cleanup
* Lint OO
* Run pre-commit on _errors.py
* Add oo.py temp implementation
* Undo changes to sample data
* Lint oo.py
* Remove change to sample data
* Add oo to init.py
* Handle edge cases
* Test parsing real files
* Improve robustness of table reader
* Lint oo and conftest
* Clean up text and remove comments
* Port initial test to new style
* Port remaining key tests
* Minor fixes
* Clean up test_key_reader.py
* Progress toward transition to recarray
* Increase tests and fix memory layout bug
* Fixes to memory layout
* Convert table_reader tests
* Linting and doc fixes
* Clean up docs
* Finish porting tests
* Lints
* Fix for scalar array inputs
* Remove unnecessary filterwarning
* Expand on tests
* Clean up unitcells
* Lint tests
* Finalize lints
* Restructure patterns
* Update test_patterns
* Lint and clean up
* Final lint
* Improve a few tests
* Add symops to example cif
* Remove package-unitcells deprecated docs
* Fix link in package-parse
* Update quickstart tutorial
* Move oo.py to parsnip.py
* Update README
* Update Unitcells test imports
* Lint
* Lazily load file
* Remove unused files
* Skip bad_cif test
* Clean up tests
* Lint
* Add tests for table_labels and cast_numerics
* Clean up tests
* Lint
* Clean up docstrings
* Lint and update docstrings
* Further docs
* Codespell
* Tests for cell
* Lint
* Update errors for read_unit_cell
* Clean up tests and todos
* More TODOs
* Lint
* Add test for cell property
* Remove modindex from sidebar
* Consolidate logic for nonsimple data
* Lint
* Fix type annotation in cast_array function
* Add more-itertools as official dependency
* Clean up dependency documentation
* Add index for ase backward compatibility
* Change wording in development.rst
* Replace index specification
* Disable ASE test on python3.7
* Fix version check
* Add additional lints
* Document additional rules in pyproject.toml
* Move PATTERNS dict to end of docs
* Clean up development.rst
* Expand with tests from additional databases
* Disable lint that causes warning
* Fix for multiline data entries
* Progress toward multiline string parsing
* Working impl that fails for blocks containing a semicolon
* Clean up
* Messy working impl
* Clean up
* Retain newlines
* Lint
* Add TODO
* Add missing multiline keys
* Wrap accumulator into a function
* Clean up _accumulate_nonsimple_data
* Clean up unused comments
* Update changelog.rst
* Fix version headings in changelog
* Update README to reflect correct CIF2.0 status
* Add CIFTEST data to gitignore
* Escape dash in regex and allow forward slash in data name
* Swap namedtuple to dataclass and clean up provided keys
* Auto detect cif keys
* Allow pdb matrix keys
* Generalize nonsimple data delimiters
* Add architecture.md
* Update table tests and fix regex for nonsimple data in tabs
* Add pycifrw to test reqs
* Lint tests
* Verify all table content
* Lint
* import annotations
* Remove unused pattern
* Rename tables to loops
* Remove extra character from regex
* Clean up table reader
* Clean up
1 parent 527ae55 commit a01ea2e

39 files changed (+1803, -1048 lines changed)

.github/requirements-3.10.txt

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,6 @@
 # This file was autogenerated by uv via the following command:
 # uv pip compile --python-version=3.10 pyproject.toml tests/requirements.in
-ase==3.23.0
+ase==3.24.0
 # via -r tests/requirements.in
 contourpy==1.3.1
 # via matplotlib
@@ -14,16 +14,19 @@ gemmi==0.7.0
 # via -r tests/requirements.in
 iniconfig==2.0.0
 # via pytest
-kiwisolver==1.4.7
+kiwisolver==1.4.8
 # via matplotlib
 matplotlib==3.10.0
 # via ase
-numpy==2.2.0
+more-itertools==10.5.0
+# via parsnip (pyproject.toml)
+numpy==2.2.1
 # via
 # parsnip (pyproject.toml)
 # ase
 # contourpy
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.2
 # via
@@ -33,7 +36,11 @@ pillow==11.0.0
 # via matplotlib
 pluggy==1.5.0
 # via pytest
-pyparsing==3.2.0
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
+pyparsing==3.2.1
 # via matplotlib
 pytest==8.3.4
 # via -r tests/requirements.in

.github/requirements-3.11.txt

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,6 @@
 # This file was autogenerated by uv via the following command:
 # uv pip compile --python-version=3.11 pyproject.toml tests/requirements.in
-ase==3.23.0
+ase==3.24.0
 # via -r tests/requirements.in
 contourpy==1.3.1
 # via matplotlib
@@ -12,16 +12,19 @@ gemmi==0.7.0
 # via -r tests/requirements.in
 iniconfig==2.0.0
 # via pytest
-kiwisolver==1.4.7
+kiwisolver==1.4.8
 # via matplotlib
 matplotlib==3.10.0
 # via ase
-numpy==2.2.0
+more-itertools==10.5.0
+# via parsnip (pyproject.toml)
+numpy==2.2.1
 # via
 # parsnip (pyproject.toml)
 # ase
 # contourpy
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.2
 # via
@@ -31,7 +34,11 @@ pillow==11.0.0
 # via matplotlib
 pluggy==1.5.0
 # via pytest
-pyparsing==3.2.0
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
+pyparsing==3.2.1
 # via matplotlib
 pytest==8.3.4
 # via -r tests/requirements.in

.github/requirements-3.12.txt

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,6 @@
 # This file was autogenerated by uv via the following command:
 # uv pip compile --python-version=3.12 pyproject.toml tests/requirements.in
-ase==3.23.0
+ase==3.24.0
 # via -r tests/requirements.in
 contourpy==1.3.1
 # via matplotlib
@@ -12,16 +12,19 @@ gemmi==0.7.0
 # via -r tests/requirements.in
 iniconfig==2.0.0
 # via pytest
-kiwisolver==1.4.7
+kiwisolver==1.4.8
 # via matplotlib
 matplotlib==3.10.0
 # via ase
-numpy==2.2.0
+more-itertools==10.5.0
+# via parsnip (pyproject.toml)
+numpy==2.2.1
 # via
 # parsnip (pyproject.toml)
 # ase
 # contourpy
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.2
 # via
@@ -31,7 +34,11 @@ pillow==11.0.0
 # via matplotlib
 pluggy==1.5.0
 # via pytest
-pyparsing==3.2.0
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
+pyparsing==3.2.1
 # via matplotlib
 pytest==8.3.4
 # via -r tests/requirements.in

.github/requirements-3.13.txt

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,6 @@
 # This file was autogenerated by uv via the following command:
 # uv pip compile --python-version=3.13 pyproject.toml tests/requirements.in
-ase==3.23.0
+ase==3.24.0
 # via -r tests/requirements.in
 contourpy==1.3.1
 # via matplotlib
@@ -12,16 +12,19 @@ gemmi==0.7.0
 # via -r tests/requirements.in
 iniconfig==2.0.0
 # via pytest
-kiwisolver==1.4.7
+kiwisolver==1.4.8
 # via matplotlib
 matplotlib==3.10.0
 # via ase
-numpy==2.2.0
+more-itertools==10.5.0
+# via parsnip (pyproject.toml)
+numpy==2.2.1
 # via
 # parsnip (pyproject.toml)
 # ase
 # contourpy
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.2
 # via
@@ -31,7 +34,11 @@ pillow==11.0.0
 # via matplotlib
 pluggy==1.5.0
 # via pytest
-pyparsing==3.2.0
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
+pyparsing==3.2.1
 # via matplotlib
 pytest==8.3.4
 # via -r tests/requirements.in

.github/requirements-3.7.txt

Lines changed: 7 additions & 0 deletions
@@ -20,11 +20,14 @@ kiwisolver==1.4.5
 # via matplotlib
 matplotlib==3.5.3
 # via ase
+more-itertools==9.1.0
+# via parsnip (pyproject.toml)
 numpy==1.21.6
 # via
 # parsnip (pyproject.toml)
 # ase
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.0
 # via
@@ -34,6 +37,10 @@ pillow==9.5.0
 # via matplotlib
 pluggy==1.2.0
 # via pytest
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
 pyparsing==3.1.4
 # via matplotlib
 pytest==7.4.4

.github/requirements-3.8.txt

Lines changed: 7 additions & 0 deletions
@@ -20,12 +20,15 @@ kiwisolver==1.4.7
 # via matplotlib
 matplotlib==3.7.5
 # via ase
+more-itertools==10.5.0
+# via parsnip (pyproject.toml)
 numpy==1.24.4
 # via
 # parsnip (pyproject.toml)
 # ase
 # contourpy
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.2
 # via
@@ -35,6 +38,10 @@ pillow==10.4.0
 # via matplotlib
 pluggy==1.5.0
 # via pytest
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
 pyparsing==3.1.4
 # via matplotlib
 pytest==8.3.4

.github/requirements-3.9.txt

Lines changed: 9 additions & 2 deletions
@@ -1,6 +1,6 @@
 # This file was autogenerated by uv via the following command:
 # uv pip compile --python-version=3.9 pyproject.toml tests/requirements.in
-ase==3.23.0
+ase==3.24.0
 # via -r tests/requirements.in
 contourpy==1.3.0
 # via matplotlib
@@ -20,12 +20,15 @@ kiwisolver==1.4.7
 # via matplotlib
 matplotlib==3.9.4
 # via ase
+more-itertools==10.5.0
+# via parsnip (pyproject.toml)
 numpy==2.0.2
 # via
 # parsnip (pyproject.toml)
 # ase
 # contourpy
 # matplotlib
+# pycifrw
 # scipy
 packaging==24.2
 # via
@@ -35,7 +38,11 @@ pillow==11.0.0
 # via matplotlib
 pluggy==1.5.0
 # via pytest
-pyparsing==3.2.0
+ply==3.11
+# via pycifrw
+pycifrw==4.4.6
+# via -r tests/requirements.in
+pyparsing==3.2.1
 # via matplotlib
 pytest==8.3.4
 # via -r tests/requirements.in

README.rst

Lines changed: 5 additions & 2 deletions
@@ -27,8 +27,10 @@
 
 .. _parse:
 
-The ``parsnip.parse`` module handles standard CIF files (including those under the `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ and `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ standards), as well as many features from the `mmCIF <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>`_ format.
-The package includes a table reader for `loop\_`-delimited tables as well as a key-value pair reader. Provide a filename and a list of keys to either of these functions and you're all set to read start parsing CIF files!
+Importing ``parsnip`` allows users to read `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ files, as well as many features from the `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ and `mmCIF <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>`_ formats.
+Creating a :class:`~.CifFile` object provides easy access to name-value :attr:`~.CifFile.pairs`, as well
+as `loop\_`-delimited :attr:`~.CifFile.tables`. Data entries can be extracted as python primitives or
+numpy arrays for further use.
 
 .. _installing:
 
@@ -78,5 +80,6 @@ Dependencies
 .. code:: text
 
 numpy>=1.19
+more-itertools
 
 .. _contributing:
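
Reviewer note: the README change above describes two kinds of data exposed by a CIF reader, name-value pairs and `loop_`-delimited tables. As a rough illustration of that split, and not parsnip's actual implementation, the standard-library sketch below separates a small CIF snippet into the two. The function name `read_pairs_and_loop` and the sample data are hypothetical, it assumes a single loop that runs to the end of the text, and quoting is handled with `shlex`, which only approximates CIF quoting rules.

```python
import shlex

# Hypothetical sample data, written for this sketch only.
SAMPLE = """\
data_example
_cell_length_a 5.43
_symmetry_space_group_name_H-M 'F d -3 m'
loop_
_atom_site_label
_atom_site_fract_x
Si1 0.0
Si2 0.25
"""


def read_pairs_and_loop(text):
    """Split CIF-like text into name-value pairs and one loop_ table.

    Assumes a single loop that extends to the end of the text.
    """
    pairs, labels, rows = {}, [], []
    in_loop = False
    for line in text.splitlines():
        tokens = shlex.split(line)  # keeps quoted values together
        if not tokens or tokens[0].startswith("data_"):
            continue  # skip blank lines and the block header
        if tokens[0] == "loop_":
            in_loop = True
        elif in_loop and tokens[0].startswith("_"):
            labels.append(tokens[0])  # column label of the loop
        elif in_loop:
            rows.append(tokens)  # one table row per line
        elif tokens[0].startswith("_") and len(tokens) == 2:
            pairs[tokens[0]] = tokens[1]  # simple name-value pair
    return pairs, labels, rows


pairs, labels, rows = read_pairs_and_loop(SAMPLE)
print(pairs)
print(labels)
print(rows)
```

All values come back as strings here; casting to Python numbers or numpy arrays, as the README text mentions, would be a separate step.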

architecture.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+Parsnip Architecture
+--------------------
+
+The primary design goal of ``parsnip`` was to create a lightweight, simple CIF parsing
+library. ``parsnip`` has a minimal set of dependencies, relatively few lines of code,
+and extensive testing to validate the accuracy of read files. Dozens of CIF parsing
+libraries exist, but most are either (1) part of a much larger project (and therefore
+undesirable as a simple dependency) or (2) have a poorly documented interface. This
+project is designed to bridge that gap with careful documentation and a minimal subset
+of features that build well into other open-source projects.
+
+
+This project takes a (reasonably) permissive view of the CIF specification: data entries
+that "look like" valid data will be parsed, regardless of file encoding, line length,
+special characters, or syntax specifics like unlabeled blocks. ``parsnip`` is designed
+to read and extract data from CIF, mmCIF, and STAR files regardless of their compliance
+with the full CIF specification.

changelog.rst

Lines changed: 25 additions & 1 deletion
@@ -4,7 +4,31 @@ Changelog
 The format is based on `Keep a Changelog <http://keepachangelog.com/en/1.1.0/>`__.
 This project adheres to `Semantic Versioning <http://semver.org/spec/v2.0.0.html>`__.
 
-v0.x.x - 20xx-xx-xx
+v1.0.0 - 20xx-xx-xx
+-------------------
 
 Added
 ~~~~~
+- Support for nonsimple (';'-delimited) data entries.
+- Improved support for entries containing special characters.
+- Ability to query multiple keys or columns simultaneously.
+- Additional tests for AMCSD and zeolite databases.
+- Additional documentation and examples for the new interface
+
+Changed
+~~~~~~~
+- Primary interface is now the ``CifFile`` object, which supports all previously implemented features in addition to several new methods.
+- Files are now parsed lazily, and are traversed a single time.
+
+Dependencies
+~~~~~~~~~~~~
+- Added ``more-itertools`` as a dependency for ``peekable`` iterators
+
+
+v0.1.0 - 2024-12-20
+-------------------
+
+Added
+~~~~~
+- Unitcells module
+- Function-based parsing interface for key and table reading
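
Reviewer note: the changelog entries above mention ';'-delimited (nonsimple) data entries, single-pass parsing, and `peekable` iterators. The sketch below shows how one-item lookahead can support reading a multi-line text field without rewinding the input. It is an illustration under stated assumptions, not parsnip's code: `Peekable` is a small stand-in written here with the standard library rather than the `more-itertools` class, and `iter_pairs` and the sample text are hypothetical.

```python
import io

_EMPTY = object()  # sentinel meaning "nothing cached"


class Peekable:
    """Minimal one-item lookahead wrapper (stand-in for a peekable iterator)."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = _EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._cache is not _EMPTY:
            value, self._cache = self._cache, _EMPTY
            return value
        return next(self._it)

    def peek(self, default=None):
        """Return the next item without consuming it."""
        if self._cache is _EMPTY:
            self._cache = next(self._it, _EMPTY)
        return default if self._cache is _EMPTY else self._cache


def iter_pairs(lines):
    """Yield (key, value) pairs in one pass, including ';'-delimited blocks."""
    lines = Peekable(lines)
    for line in lines:
        tokens = line.split(maxsplit=1)
        if not tokens or not tokens[0].startswith("_"):
            continue
        key = tokens[0]
        if len(tokens) == 2:
            yield key, tokens[1].strip()  # simple single-line value
        elif lines.peek("").startswith(";"):
            # Opening ';' line: keep any text that follows the semicolon.
            block = [next(lines)[1:].rstrip("\n")]
            for text_line in lines:
                if text_line.startswith(";"):
                    break  # a ';' in the first column closes the block
                block.append(text_line.rstrip("\n"))
            yield key, "\n".join(block).strip("\n")


# Hypothetical sample data, written for this sketch only.
SAMPLE = """\
_cell_length_a 5.43
_publ_section_title
;
Structure of silicon;
a multiline title
;
_cell_length_b 5.43
"""

result = list(iter_pairs(io.StringIO(SAMPLE)))
for key, value in result:
    print(key, repr(value))
```

Because only a semicolon in the first column opens or closes a block, the semicolon at the end of "Structure of silicon;" stays part of the value, and newlines inside the block are retained. The lookahead matters when a key has no value on its own line: peeking leaves the following line in place so the outer loop can still process it.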
