Merged
Add unitcells module
janbridley committed Apr 15, 2024
commit 0ddaa48871174d62b834ca292cb76505b1f5d3b2
68 changes: 1 addition & 67 deletions parsnip/parse.py
@@ -52,7 +52,7 @@
import numpy as np

from ._errors import ParseError, ParseWarning
from ._utils import _deg2rad, _str2num
from ._utils import _str2num
from .patterns import LineCleaner, cast_array_to_float, remove_nondelimiting_whitespace


@@ -314,69 +314,3 @@ def read_key_value_pairs(
)

return data


def read_cell_params(filename, degrees: bool = True, mmcif: bool = False):
    r"""Read the cell lengths and angles from a CIF file.

    Args:
        filename (str): The name of the .cif file to be parsed.
        degrees (bool, optional):
            When True, angles are returned in degrees (as per the cif spec). When False,
            angles are converted to radians.
            Default value = ``True``
        mmcif (bool, optional):
            When False, the standard CIF key naming is used (e.g. _cell_angle_alpha).
            When True, the mmCIF standard is used instead (e.g. cell.angle_alpha).
            Default value = ``False``

    Returns:
        tuple:
            The box vector lengths and angles in degrees or radians
            :math:`(L_1, L_2, L_3, \alpha, \beta, \gamma)`.
    """
    if mmcif:
        angle_keys = ("_cell.angle_alpha", "_cell.angle_beta", "_cell.angle_gamma")
        box_keys = ("_cell.length_a", "_cell.length_b", "_cell.length_c") + angle_keys
    else:
        angle_keys = ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
        box_keys = ("_cell_length_a", "_cell_length_b", "_cell_length_c") + angle_keys
    cell_data = read_key_value_pairs(filename, keys=box_keys, only_read_numerics=True)

    assert all(value is not None for value in cell_data.values())
    assert all(0 < cell_data[key] < 180 for key in angle_keys)

    if not degrees:
        for key in angle_keys:
            cell_data[key] = _deg2rad(cell_data[key])

    return tuple(cell_data.values())


def read_fractional_positions(
    filename: str,
    regex_filter: tuple[tuple[str, str]] = ((r",\s+", ",")),
):
    r"""Extract the fractional X,Y,Z coordinates from a CIF file.

    Args:
        filename (str): The name of the .cif file to be parsed.
        regex_filter (tuple[tuple[str]], optional):
            A tuple of strings that are compiled to a regex filter and applied to each
            data line. Default value = ``((r",\s+",","))``

    Returns:
        :math:`(N, 3)` :class:`numpy.ndarray[np.float32]`:
            Fractional X,Y,Z coordinates of the unit cell.
    """
    xyz_keys = ("_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z")
    # Once #6 is added, we should warnings.catch_warnings(action="error")
    xyz_data = read_table(filename=filename, keys=xyz_keys, regex_filter=regex_filter)

    xyz_data = cast_array_to_float(arr=xyz_data, dtype=np.float32)

    # Validate results
    assert xyz_data.shape[1] == 3
    assert xyz_data.dtype == np.float32

    return xyz_data
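
The cell-parameter logic relocated by this commit can be sketched in isolation. The snippet below is a minimal illustration, assuming the key/value pairs have already been parsed into a dict. `cell_params_from_dict` is a hypothetical name that does not exist in parsnip, and `math.radians` stands in for the private `_deg2rad` helper, whose implementation is not shown in this diff.

```python
from __future__ import annotations

import math


def cell_params_from_dict(
    cell_data: dict[str, float], degrees: bool = True, mmcif: bool = False
) -> tuple[float, ...]:
    """Return (a, b, c, alpha, beta, gamma) from already-parsed CIF values.

    Hypothetical helper for illustration only; it is not part of parsnip.
    """
    # mmCIF joins category and item with ".", classic CIF uses "_".
    sep = "." if mmcif else "_"
    length_keys = tuple(f"_cell{sep}length_{axis}" for axis in "abc")
    angle_keys = tuple(
        f"_cell{sep}angle_{name}" for name in ("alpha", "beta", "gamma")
    )

    # Explicit exceptions are used here instead of the asserts in the diff.
    missing = [key for key in length_keys + angle_keys if cell_data.get(key) is None]
    if missing:
        raise ValueError(f"Missing cell parameters: {missing}")
    if not all(0 < cell_data[key] < 180 for key in angle_keys):
        raise ValueError("Cell angles must lie strictly between 0 and 180 degrees.")

    lengths = [cell_data[key] for key in length_keys]
    angles = [cell_data[key] for key in angle_keys]
    if not degrees:
        # Assumption: math.radians replaces parsnip's private _deg2rad.
        angles = [math.radians(angle) for angle in angles]
    return tuple(lengths + angles)
```

Raising `ValueError` rather than asserting is a deliberate difference from the committed code: assert statements are skipped when Python runs with `-O`, so validation written as exceptions survives optimized runs.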
200 changes: 200 additions & 0 deletions parsnip/unitcells.py
@@ -0,0 +1,200 @@
"""A."""
from __future__ import annotations

import re
import warnings

import numpy as np

from parsnip._errors import ParseWarning
from parsnip._utils import _deg2rad
from parsnip.parse import read_key_value_pairs, read_table
from parsnip.patterns import cast_array_to_float


def read_fractional_positions(
    filename: str,
    regex_filter: tuple[tuple[str, str]] = ((r",\s+", ",")),
):
    r"""Extract the fractional X,Y,Z coordinates from a CIF file.

    .. warning::

        This function ONLY returns the symmetry irreducible positions that are directly
        stored in the CIF file. To build out the full unit cell, use
        :meth:`extract_unit_cell`.

    Args:
        filename (str): The name of the .cif file to be parsed.
        regex_filter (tuple[tuple[str]], optional):
            A tuple of strings that are compiled to a regex filter and applied to each
            data line. Default value = ``((r",\s+",","))``

    Returns:
        :math:`(N, 3)` :class:`numpy.ndarray[np.float32]`:
            Fractional X,Y,Z coordinates of the unit cell.
    """
    xyz_keys = ("_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z")
    # Once #6 is added, we should warnings.catch_warnings(action="error")
    xyz_data = read_table(filename=filename, keys=xyz_keys, regex_filter=regex_filter)

    xyz_data = cast_array_to_float(arr=xyz_data, dtype=np.float32)

    # Validate results
    assert xyz_data.shape[1] == 3
    assert xyz_data.dtype == np.float32

    return xyz_data


def read_symmetry_operations(filename):
    """TODO."""
    symmetry_keys = (
        "_symmetry_equiv_pos_as_xyz",
        "_space_group_symop_operation_xyz",
    )
    with warnings.catch_warnings(category=ParseWarning, action="ignore"):
        # Only one of the two keys will be matched. We can safely ignore that warning.
        data = read_table(
            filename=filename,
            keys=symmetry_keys,
            # regex_filter=("'", ""),
            nondelimiting_whitespace_replacement="",
        )

    return data


def read_cell_params(filename, degrees: bool = True, mmcif: bool = False):
    r"""Read the cell lengths and angles from a CIF file.

    Args:
        filename (str): The name of the .cif file to be parsed.
        degrees (bool, optional):
            When True, angles are returned in degrees (as per the cif spec). When False,
            angles are converted to radians.
            Default value = ``True``
        mmcif (bool, optional):
            When False, the standard CIF key naming is used (e.g. _cell_angle_alpha).
            When True, the mmCIF standard is used instead (e.g. _cell.angle_alpha).
            Default value = ``False``

    Returns:
        tuple:
            The box vector lengths and angles in degrees or radians
            :math:`(L_1, L_2, L_3, \alpha, \beta, \gamma)`.
    """
    if mmcif:
        angle_keys = ("_cell.angle_alpha", "_cell.angle_beta", "_cell.angle_gamma")
        box_keys = ("_cell.length_a", "_cell.length_b", "_cell.length_c") + angle_keys
    else:
        angle_keys = ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
        box_keys = ("_cell_length_a", "_cell_length_b", "_cell_length_c") + angle_keys
    cell_data = read_key_value_pairs(filename, keys=box_keys, only_read_numerics=True)

    assert all(value is not None for value in cell_data.values())
    assert all(0 < cell_data[key] < 180 for key in angle_keys)

    if not degrees:
        for key in angle_keys:
            cell_data[key] = _deg2rad(cell_data[key])

    return tuple(cell_data.values())


def _safe_eval(str_input: str, x: int | float, y: int | float, z: int | float):
    """Attempt to safely evaluate a string of symmetry equivalent positions.

    Python's ``eval`` is notoriously unsafe. While we could evaluate the entire list at
    once, doing so carries some risk. The typical alternative, ``ast.literal_eval``,
    does not work because we need to evaluate mathematical operations.

    We first replace the x,y,z values with ordered fstring inputs, to simplify the input
    of fractional coordinate data. This is done for convenience more than security.

    Once we substitute in the x,y,z values, we should have a string version of a list
    containing only numerics and math operators. We apply a substitution to ensure this
    is the case, then perform one final check. If it passes, we evaluate the list. Note
    that __builtins__ is set to {}, meaning importing functions is not possible. The
    locals dict is also set to {}, so no variables are accessible in the evaluation.

    I cannot guarantee this is fully safe, but it at the very least makes it extremely
    difficult to do any funny business.

    Args:
        str_input (str): String to be evaluated.
        x (int|float): Fractional coordinate in :math:`x`.
        y (int|float): Fractional coordinate in :math:`y`.
        z (int|float): Fractional coordinate in :math:`z`.

    Returns:
        list[list[int|float,int|float,int|float]]:
            :math:`(N,3)` list of fractional coordinates.

    """
    ordered_inputs = {"x": "{0}", "y": "{1}", "z": "{2}"}
    # Replace any x, y, or z with the same character surrounded by curly braces. Then,
    # perform substitutions to insert the actual values.
    substituted_string = (
        re.sub(r"([xyz])", r"{\1}", str_input).format(**ordered_inputs).format(x, y, z)
    )
    # Remove any unexpected characters from the string.
    safe_string = re.sub(r"[^\d\[\]\,\+\-\/\*\.]", "", substituted_string)
    # Double check to be sure:
    assert all(
        char in ",.0123456789+-/*[]" for char in safe_string
    ), "Check that string only contains numerics or characters in { [],.+-/* }."
    return eval(safe_string, {"__builtins__": {}}, {})  # noqa: S307


def extract_unit_cell(filename: str, n_decimal_places: int = 5):
    """Return a complete unit cell from a .cif file in fractional coordinates.

    Args:
        filename (str): The name of the .cif file to be parsed.
        n_decimal_places (int, optional):
            The number of decimal places to round each position to for the uniqueness
            comparison. Because CIF files only store limited precision, a relatively low
            value is recommended. 5 decimal places is usually enough to differentiate
            every unique position.
            Default value = ``5``

    Returns:
        :math:`(N, 3)` :class:`numpy.ndarray[np.float32]`:
            The full unit cell of the crystal structure.
    """
    fractional_positions = read_fractional_positions(filename=filename)

    symops = read_symmetry_operations(filename)
    symops_str = np.array2string(
        symops,
        separator=",",  # Place a comma after each line in the array. Required to eval
        threshold=1024,  # Ensure that every line is included in the string
        floatmode="unique",  # Ensures strings can uniquely represent each float number
    )

    all_frac_positions = [_safe_eval(symops_str, *xyz) for xyz in fractional_positions]

    pos = np.vstack(all_frac_positions)

    # Wrap particles into the box
    pos %= 1

    return np.unique(pos.round(n_decimal_places), axis=0)


if __name__ == "__main__":
    filename = "../tests/sample_data/AFLOW_mC24.cif"

    fractional_positions = read_fractional_positions(filename=filename)
    pos = extract_unit_cell(filename)
    cell = read_cell_params(filename, degrees=False)
    print(pos)

    # TODO: test against Pearson symbols. Do we get the correct number of atoms? And how
    # much precision can we use for the uniqueness comparison?
    # Also - is it helpful to map to known fractions? Probably not?

    # Performance: about 4ms for CCDC file - 250 lines read, 8 XYZ positions * 48 sym
    # This requires a unique ~200us, reading symops ~120 ms, fractional positions 200
    # Total sum is around a ms: eval is likely slow?
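
The unit-cell construction in `extract_unit_cell` (apply every symmetry operation to every stored position, wrap into the box, round, deduplicate) can be tried without a CIF file. The following is a minimal sketch under stated assumptions, not the code from this commit: `apply_symop` and `build_unit_cell` are hypothetical names, symmetry operations are passed as plain strings such as `"-x,y+1/2,z"`, and unexpected characters are rejected with a `ValueError` instead of being stripped as `_safe_eval` does. Coordinates are formatted in fixed-point notation so that no exponent letter ever has to pass the whitelist.

```python
from __future__ import annotations

import re

import numpy as np

# Digits, separators, arithmetic operators, and parentheses only.
_ALLOWED = re.compile(r"[\d,.+\-*/()]*")


def apply_symop(symop: str, x: float, y: float, z: float) -> list[float]:
    """Evaluate one symmetry operation string at a fractional coordinate.

    Hypothetical helper for illustration only; it is not part of parsnip.
    """
    values = {"x": x, "y": y, "z": z}
    compact = "".join(symop.split())  # drop all whitespace
    # Substitute each placeholder with a parenthesized fixed-point number.
    substituted = re.sub(
        r"[xyz]", lambda m: f"({float(values[m.group()]):.10f})", compact
    )
    if not _ALLOWED.fullmatch(substituted):
        raise ValueError(f"Unexpected characters in symmetry operation {symop!r}")
    # Builtins and locals are emptied, mirroring the approach in the diff.
    return list(eval(f"[{substituted}]", {"__builtins__": {}}, {}))  # noqa: S307


def build_unit_cell(symops, basis, n_decimal_places: int = 5) -> np.ndarray:
    """Apply all symops to all basis positions, wrap, round, and deduplicate."""
    pos = np.array(
        [apply_symop(op, *xyz) for xyz in basis for op in symops], dtype=float
    )
    pos %= 1  # wrap positions into the box
    return np.unique(pos.round(n_decimal_places), axis=0)


cell = build_unit_cell(["x, y, z", "-x,-y,-z", "x,y,z+1"], [[0.1, 0.2, 0.3]])
print(cell)
```

Here the third operation is a pure lattice translation, so after wrapping it duplicates the identity and is dropped by `np.unique`, leaving two positions. As with the committed version, this restricted `eval` limits what an input can do but is not a guarantee of safety.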