Add ability to build out unit cells from CIF files #11
Merged
Commits (102, all by janbridley):

- 656e9e1 Add _str2num and _deg2rad _utils
- 1e74eb7 Add cif file keys list to sample data
- c369fd1 Add key_value_pairs reader and cell_params reader to parse
- 672c4e3 Add tests for key reader
- e0b693f Add tests for new utils
- 79350fc Reorder test_key_reader
- 04b3344 Improve documentation for regex
- b59eab1 Add warnings and tests to read_key_value_pairs
- 87303b9 Restore trailing spaces to downloaded CIF files
- 90120c7 Properly track keys containing "-"
- d4203da Improved tests for key value pair reader
- 8c3c014 Add key-value tests for INTENTIONALLY_BAD_CIF.cif
- 9c91bde Fix docs
- 9aaba90 Enable top of page button
- 6ea7882 Update brand primary colors
- 0169783 Improve docs for parse.py
- a404d19 Add __future__.annotations imports to relevant files
- 4903f80 Fix typo
- a333c5c Seperate _errors from _templates
- b0f386b Clean up docstring return types
- 96acd85 Add PDB cif to test suite
- a6ebf33 Fix test in test_key_reader
- f8dbaa3 Clean up patterns.py and add remove_nondelimiting_whitespace
- b1e0bdd Update table_reader to use remove_nondelimiting_whitespace
- 51328be Allow value reader to read mmCIF files
- 06abb57 Update test_table_reader.py
- 98a2201 Remove seperate mmCIF reader
- 93909f8 Add docs for patterns module
- d4d931b Fix cast_to_float default value
- 1d86db9 Update docs
- 0528d36 Add documentation for __call__
- 40c7fb8 Update regex_filter param documentation
- 56c1e21 Fix typo
- 853a166 Remove unneeded comment
- 8b19268 Fix default values in docs
- fd295a8 Fix typo
- 3e5e77c Minor doc fix
- ffa59a7 Fix typo
- 7f80005 Remove duplicate Introduction from index
- 1e8c01d Remove duplicate entries from toc
- 56d80de Add source for PDB cif
- 5d47d10 Add mmCIF flag to read_cell_params
- dfbf5ed Add quickstart.rst
- 28a7025 Fix comment in quickstart
- e60cd1b Remove unnecessary line in quickstart
- 6e82566 Fix image path in README.rst
- a772261 Update regex documentation
- 7d03311 Fix CI
- 1f05fd7 Update __init__.py
- 0ddaa48 Add unitcells module
- e1616ab Add documentation links
- 04b19a3 Fix doc file naming
- 451fd8b Remove resolved TODO
- bba2849 Add top level description to unitcells
- 8d243ad Default regex filters to None
- 5fece58 Fix default setting for nondelimiting_whitespace_replacement
- 85b7dff Remove outdated comment
- f4aefa7 Fix tests
- 43f0263 Add tests for symmetry operations
- a713c06 Fix precision issues
- a0a01cd Increase string lines threshold
- 8c7f2c0 Remove in-file tests
- 8d5969d Return unrounded values
- 923ce2c Add test_extract_unit_cell
- f96c7a9 Add distance calculation util for uniqueness comparison
- b124cac Add function to build basis vector matrix from box
- cbf17ba Update unitcell builder and rename to extract_atomic_positions
- 6ab5cd6 Update docstrings
- e0eccb9 Clarify function naming
- 9bb9b2f Change space group for IncStrDb_Ccmm.cif to standard format
- a19b068 Remove unnecessary transpose in basis vector function
- b3803be Update unitcell tests to use ase
- bf532c6 Filter out ase warnings
- 037d5d8 Switch catch_warnings to filterwarnings for python 3.9 compat
- a1382d3 Add pytest to test requirements
- b45218d Fix top of page buttons and add view link
- 2ab4288 Fix logo on README.rst
- e4cbbb8 Merge remote-tracking branch 'origin/main' into feature/supercells
- e35ed1b Merge branch 'main' into feature/supercells
- bdc832d Add readable assertion error in unitcells.py
- c30d23e Improve assertion error in _safe_eval
- 0632944 Remove unused distance-merge code
- a023be2 Merge branch 'main' into feature/supercells
- d65e094 Improve CI resilience
- ccb13fd Remove dependabot
- e562353 Add requirements.txt for py3.6 and py3.7
- d50b3ee Update requirements.yaml CI action
- 78fbd68 Remove temporary lines from ci
- 9a2a5e7 Add future-annotations package
- e0f186b Fix annotations
- 22b145f Disable testing on py3.6
- f046aa0 Update doc requirements
- 3a3dbde Decapitalize changelog and credits
- e91767d Swap tests to us UV
- 77184f6 Fix uv version
- f3ae2d1 Clean up CI script
- b85b73b Activate venv
- 606be0b Remove setup.py
- a1be853 Fix CI
- 446f441 Clean up CI
- cefdb5c Simplify CI
- 7385f4b Clean up overview
Diff of commit 0ddaa48871174d62b834ca292cb76505b1f5d3b2 (Add unitcells module), new file, hunk `@@ -0,0 +1,200 @@`:
```python
"""Build complete unit cells from the data stored in CIF files."""
from __future__ import annotations

import re
import warnings

import numpy as np

from parsnip._errors import ParseWarning
from parsnip._utils import _deg2rad
from parsnip.parse import read_key_value_pairs, read_table
from parsnip.patterns import cast_array_to_float
```
```python
def read_fractional_positions(
    filename: str,
    regex_filter: tuple[tuple[str, str]] = ((r",\s+", ",")),
):
    r"""Extract the fractional X,Y,Z coordinates from a CIF file.

    .. warning::

        This function ONLY returns the symmetry-irreducible positions that are directly
        stored in the CIF file. To build out the full unit cell, use
        :func:`extract_unit_cell`.

    Args:
        filename (str): The name of the .cif file to be parsed.
        regex_filter (tuple[tuple[str]], optional):
            A tuple of strings that are compiled to a regex filter and applied to each
            data line. Default value = ``((r",\s+",","))``

    Returns:
        :math:`(N, 3)` :class:`numpy.ndarray[np.float32]`:
            Fractional X,Y,Z coordinates of the symmetry-irreducible atomic positions.
    """
    xyz_keys = ("_atom_site_fract_x", "_atom_site_fract_y", "_atom_site_fract_z")
    # Once #6 is added, we should use warnings.catch_warnings(action="error")
    xyz_data = read_table(filename=filename, keys=xyz_keys, regex_filter=regex_filter)

    xyz_data = cast_array_to_float(arr=xyz_data, dtype=np.float32)

    # Validate results
    assert xyz_data.shape[1] == 3
    assert xyz_data.dtype == np.float32

    return xyz_data
```
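CIF numeric fields such as `_atom_site_fract_x` often carry a standard uncertainty in trailing parentheses (for example `0.1234(5)`), and a value may instead be `?` (unknown) or `.` (inapplicable). The diff delegates this cleanup to `cast_array_to_float`, whose internals are not shown here. As an illustration only, a standalone cast might look like the sketch below; `cif_to_float` and `cast_positions` are hypothetical names, not parsnip API.

```python
from __future__ import annotations

import re

import numpy as np

# A CIF number with an optional standard uncertainty in parentheses, e.g. "0.1234(5)".
_CIF_NUMBER = re.compile(
    r"^\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)(?:\(\d+\))?\s*$"
)


def cif_to_float(value: str) -> float:
    """Convert one CIF numeric string to float, dropping any uncertainty suffix."""
    match = _CIF_NUMBER.match(value)
    if match is None:
        # "?" and "." (and any other non-numeric token) are rejected explicitly.
        raise ValueError(f"Not a CIF numeric value: {value!r}")
    return float(match.group(1))


def cast_positions(rows: list[list[str]]) -> np.ndarray:
    """Cast an (N, 3) table of CIF strings to a float32 array."""
    return np.array([[cif_to_float(v) for v in row] for row in rows], dtype=np.float32)
```

Raising on `?` rather than silently returning `nan` keeps a missing coordinate from propagating into the symmetry expansion unnoticed.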
```python
def read_symmetry_operations(filename):
    """Extract the symmetry operations from a CIF file.

    Args:
        filename (str): The name of the .cif file to be parsed.

    Returns:
        :class:`numpy.ndarray`:
            The symmetry operations, stored as strings (e.g. ``x,y,z``).
    """
    symmetry_keys = (
        "_symmetry_equiv_pos_as_xyz",
        "_space_group_symop_operation_xyz",
    )
    # catch_warnings(category=..., action=...) requires Python 3.11+, so the filter is
    # installed explicitly to stay compatible with older interpreters.
    with warnings.catch_warnings():
        # Only one of the two keys will be matched. We can safely ignore that warning.
        warnings.filterwarnings("ignore", category=ParseWarning)
        data = read_table(
            filename=filename,
            keys=symmetry_keys,
            # regex_filter=("'", ""),
            nondelimiting_whitespace_replacement="",
        )

    return data
```
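The comment in `read_symmetry_operations` notes that only one of the two symmetry keys will match, so the resulting missing-key warning is safe to ignore. `warnings.catch_warnings` only accepts `category=` and `action=` keywords from Python 3.11 onward (a later commit in this PR switches to `filterwarnings` for Python 3.9 compatibility). A portable pattern is sketched below; `ParseWarning` is a local stand-in for parsnip's class, and `collect_columns` is a hypothetical reader that warns once per missing key.

```python
from __future__ import annotations

import warnings


class ParseWarning(Warning):
    """Hypothetical stand-in for parsnip._errors.ParseWarning."""


def collect_columns(table: dict[str, list[str]], keys: tuple[str, ...]) -> list[str]:
    """Concatenate the columns for every key present, warning for each missing key."""
    found: list[str] = []
    for key in keys:
        if key in table:
            found.extend(table[key])
        else:
            warnings.warn(f"Key {key!r} not found.", ParseWarning, stacklevel=2)
    return found


def read_symops_quietly(table: dict[str, list[str]]) -> list[str]:
    # catch_warnings() restores the previous filters on exit; filterwarnings() adds a
    # filter that drops only ParseWarning, so other warnings still surface.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=ParseWarning)
        return collect_columns(
            table,
            ("_symmetry_equiv_pos_as_xyz", "_space_group_symop_operation_xyz"),
        )
```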
```python
def read_cell_params(filename, degrees: bool = True, mmcif: bool = False):
    r"""Read the cell lengths and angles from a CIF file.

    Args:
        filename (str): The name of the .cif file to be parsed.
        degrees (bool, optional):
            When True, angles are returned in degrees (as per the CIF spec). When
            False, angles are converted to radians.
            Default value = ``True``
        mmcif (bool, optional):
            When False, the standard CIF key naming is used (e.g. _cell_angle_alpha).
            When True, the mmCIF standard is used instead (e.g. _cell.angle_alpha).
            Default value = ``False``

    Returns:
        tuple:
            The box vector lengths and angles in degrees or radians
            :math:`(L_1, L_2, L_3, \alpha, \beta, \gamma)`.
    """
    if mmcif:
        angle_keys = ("_cell.angle_alpha", "_cell.angle_beta", "_cell.angle_gamma")
        box_keys = ("_cell.length_a", "_cell.length_b", "_cell.length_c") + angle_keys
    else:
        angle_keys = ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
        box_keys = ("_cell_length_a", "_cell_length_b", "_cell_length_c") + angle_keys
    cell_data = read_key_value_pairs(filename, keys=box_keys, only_read_numerics=True)

    assert all(value is not None for value in cell_data.values())
    assert all(0 < cell_data[key] < 180 for key in angle_keys)

    if not degrees:
        for key in angle_keys:
            cell_data[key] = _deg2rad(cell_data[key])

    return tuple(cell_data.values())
```
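The tuple returned by `read_cell_params` is typically turned into a matrix of lattice vectors so that fractional positions can be mapped to Cartesian space. A later commit in this PR adds such a helper, but it is not part of this diff. The sketch below uses the common convention of placing the first lattice vector along x and the second in the xy-plane; `cell_to_basis` and `fractional_to_cartesian` are hypothetical names, and the default `degrees=True` is assumed to match `read_cell_params`' default output.

```python
from __future__ import annotations

import numpy as np


def cell_to_basis(a, b, c, alpha, beta, gamma, degrees=True) -> np.ndarray:
    """Return a 3x3 matrix whose columns are the lattice vectors a1, a2, a3."""
    if degrees:
        alpha, beta, gamma = np.deg2rad([alpha, beta, gamma])
    cos_a, cos_b, cos_g = np.cos(alpha), np.cos(beta), np.cos(gamma)
    sin_g = np.sin(gamma)
    # Direction cosines of a3, chosen so that a1.a3 = a*c*cos(beta) and
    # a2.a3 = b*c*cos(alpha), with a3 normalized to length c.
    cx = cos_b
    cy = (cos_a - cos_b * cos_g) / sin_g
    cz = np.sqrt(1.0 - cx**2 - cy**2)
    return np.array(
        [
            [a, b * cos_g, c * cx],
            [0.0, b * sin_g, c * cy],
            [0.0, 0.0, c * cz],
        ]
    )


def fractional_to_cartesian(frac: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map (N, 3) fractional coordinates to Cartesian, i.e. r = B @ f for each row."""
    return frac @ basis.T
```

Under these assumptions, `cell_to_basis(*read_cell_params(filename))` would give the basis for a parsed file, and `fractional_to_cartesian(extract_unit_cell(filename), basis)` the Cartesian unit cell.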
```python
def _safe_eval(str_input: str, x: int | float, y: int | float, z: int | float):
    """Attempt to safely evaluate a string of symmetry equivalent positions.

    Python's ``eval`` is notoriously unsafe. While we could evaluate the entire list at
    once, doing so carries some risk. The typical alternative, ``ast.literal_eval``,
    does not work because we need to evaluate mathematical operations.

    We first replace the x,y,z characters with ordered ``str.format`` placeholders, to
    simplify the input of fractional coordinate data. This is done for convenience more
    than security.

    Once we substitute in the x,y,z values, we should have a string version of a list
    containing only numerics and math operators. We apply a substitution to ensure this
    is the case, then perform one final check. If it passes, we evaluate the list. Note
    that ``__builtins__`` is set to ``{}``, meaning importing functions is not possible.
    The locals dict is also set to ``{}``, so no variables are accessible in the
    evaluation.

    I cannot guarantee this is fully safe, but it at the very least makes it extremely
    difficult to do any funny business.

    Args:
        str_input (str): String to be evaluated.
        x (int|float): Fractional coordinate in :math:`x`.
        y (int|float): Fractional coordinate in :math:`y`.
        z (int|float): Fractional coordinate in :math:`z`.

    Returns:
        list[list[int|float,int|float,int|float]]:
            :math:`(N,3)` list of fractional coordinates.
    """
    ordered_inputs = {"x": "{0}", "y": "{1}", "z": "{2}"}
    # Replace any x, y, or z with the same character surrounded by curly braces. Then,
    # perform substitutions to insert the actual values.
    substituted_string = (
        re.sub(r"([xyz])", r"{\1}", str_input).format(**ordered_inputs).format(x, y, z)
    )
    # Remove any unexpected characters from the string.
    safe_string = re.sub(r"[^\d\[\]\,\+\-\/\*\.]", "", substituted_string)
    # Double check to be sure:
    assert all(
        char in ",.0123456789+-/*[]" for char in safe_string
    ), "Check that string only contains numerics or characters in { [],.+-/* }."

    return eval(safe_string, {"__builtins__": {}}, {})  # noqa: S307
```
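`_safe_eval` whitelists characters and then calls `eval` on the substituted string. One eval-free alternative, a different technique from the one in this diff, is to parse each operation into a rotation matrix W and a translation vector w and apply x' = Wx + w with NumPy. Coordinates then never pass through a string and back. `parse_symop` and `apply_symops` are hypothetical helpers, and the sketch assumes operations use only `x`, `y`, `z`, integer or fractional coefficients, and `+`/`-`.

```python
from __future__ import annotations

import re
from fractions import Fraction

import numpy as np


def parse_symop(op: str) -> tuple[np.ndarray, np.ndarray]:
    """Parse an operation such as '-y+1/2,x,z' into a rotation W and translation w."""
    rotation = np.zeros((3, 3))
    translation = np.zeros(3)
    components = op.replace(" ", "").lower().split(",")
    if len(components) != 3:
        raise ValueError(f"Expected three comma-separated components in {op!r}")
    for row, component in enumerate(components):
        # Split into signed terms: "-y+1/2" -> ["-y", "+1/2"].
        for term in re.findall(r"[+-]?[^+-]+", component):
            axis = next((c for c in "xyz" if c in term), None)
            if axis is None:
                # Constant term: Fraction accepts "1/2", "+1/2", "0.25", ...
                translation[row] += float(Fraction(term))
            else:
                coeff = term.replace(axis, "").replace("*", "")
                if coeff in ("", "+"):
                    coeff = "1"
                elif coeff == "-":
                    coeff = "-1"
                rotation[row, "xyz".index(axis)] += float(Fraction(coeff))
    return rotation, translation


def apply_symops(ops: list[str], frac: np.ndarray) -> np.ndarray:
    """Apply every operation to every (N, 3) position; returns (len(ops) * N, 3)."""
    images = []
    for op in ops:
        rotation, translation = parse_symop(op)
        images.append(frac @ rotation.T + translation)
    return np.vstack(images)
```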
```python
def extract_unit_cell(filename: str, n_decimal_places: int = 5):
    """Return a complete unit cell from a .cif file in fractional coordinates.

    Args:
        filename (str): The name of the .cif file to be parsed.
        n_decimal_places (int, optional):
            The number of decimal places to round each position to for the uniqueness
            comparison. Because CIF files only store limited precision, a relatively low
            value is recommended. 5 decimal places is usually enough to differentiate
            every unique position.
            Default value = ``5``

    Returns:
        :math:`(N, 3)` :class:`numpy.ndarray[np.float64]`:
            The full unit cell of the crystal structure.
    """
    fractional_positions = read_fractional_positions(filename=filename)

    symops = read_symmetry_operations(filename)
    symops_str = np.array2string(
        symops,
        separator=",",  # Place a comma between array elements. Required for eval.
        threshold=1024,  # Ensure that every line is included in the string
        floatmode="unique",  # Ensures strings can uniquely represent each float number
    )

    all_frac_positions = [_safe_eval(symops_str, *xyz) for xyz in fractional_positions]

    pos = np.vstack(all_frac_positions)

    # Wrap particles into the box
    pos %= 1

    return np.unique(pos.round(n_decimal_places), axis=0)
```
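`extract_unit_cell` wraps positions with `pos %= 1` and only then rounds before calling `np.unique`. Because rounding happens after wrapping, a coordinate just below 1 (for example 0.9999999) can round up to 1.0 and survive as a separate row from the equivalent site at 0.0. A minimal sketch of the wrap, round, and deduplicate step that applies the modulo once more after rounding; `wrap_and_deduplicate` is a hypothetical name.

```python
import numpy as np


def wrap_and_deduplicate(positions: np.ndarray, n_decimal_places: int = 5) -> np.ndarray:
    """Wrap fractional positions into [0, 1) and drop duplicates after rounding."""
    wrapped = np.mod(positions, 1.0)
    rounded = wrapped.round(n_decimal_places)
    # Rounding can push values such as 0.9999999 up to 1.0, which is the same site as
    # 0.0; a second modulo folds them back so np.unique treats them as equal.
    rounded = np.mod(rounded, 1.0)
    return np.unique(rounded, axis=0)
```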
```python
if __name__ == "__main__":
    filename = "../tests/sample_data/AFLOW_mC24.cif"

    fractional_positions = read_fractional_positions(filename=filename)
    pos = extract_unit_cell(filename)
    cell = read_cell_params(filename, degrees=False)
    print(pos)

    # TODO: test against Pearson symbols. Do we get the correct number of atoms? And how
    # much precision can we use for the uniqueness comparison?
    # Also - is it helpful to map to known fractions? Probably not?

    # Performance: about 4ms for CCDC file - 250 lines read, 8 XYZ positions * 48 symops
    # This requires a unique ~200us, reading symops ~120 ms, fractional positions 200
    # Total sum is around a ms: eval is likely slow?
```