segy_to_mdio_v1 #577
Merged
91 commits
7ff26e7
Templates and TemplateRegistry
dmitriyrepin 5ae7258
Merge remote-tracking branch 'upstream/v1' into v1
dmitriyrepin e8b95dc
Fix pre-commit issues
dmitriyrepin be772c7
Rever dev container changes
dmitriyrepin e43e4da
PR Review: address issues
dmitriyrepin 80dd234
PR Review: register default templates at registry initialization
dmitriyrepin 7caca55
Merge remote-tracking branch 'upstream/v1' into v1
dmitriyrepin 2b2acf2
Dockerfile.dev
dmitriyrepin 73d827a
segy_to_mdio_v1
dmitriyrepin befeca0
Clean up
dmitriyrepin ec057e6
Prototype review notes
59d12e9
Add dev comment
2f3eabf
Add notes that will be deleted later
dmitriyrepin 0477bed
segy_to_mdio_v1 pass 1
dmitriyrepin 5f7b135
indexing_v1 and blocked_io_v1
dmitriyrepin d6a4a35
Remove DEV notes
dmitriyrepin 0b4c29f
Clean up
dmitriyrepin 340f78a
Document bug location
dmitriyrepin 2c34d34
Work around IndexError
dmitriyrepin 93c4b30
Clean temporary code
dmitriyrepin 324879a
More clean up
dmitriyrepin 5debcf9
Remove *_1 infrastructure files
dmitriyrepin 7b96b29
Restore accidently removed dask.array
dmitriyrepin 3115299
Created an issue reproducer
dmitriyrepin 4b1ae8f
Make the required template properties public
dmitriyrepin 21e9e04
Simplify type converter
dmitriyrepin c5f9a63
Improve templates
dmitriyrepin b73cc68
Move test_type_converter.py
dmitriyrepin 55315aa
Move test_type_converter.py
dmitriyrepin 3e64fba
Revert to use the original grid
dmitriyrepin 1f4687f
Integrate segy_to_mdio_v1_customized, fix indexing
dmitriyrepin e39d8c6
Add dimension coordinates in tem,plates
dmitriyrepin 972a05d
Write statistics to Zarr
dmitriyrepin 84ceb57
Delete factory_v1.py
dmitriyrepin 4ff62bc
Complete integrationtest. Fix coordinates
dmitriyrepin 543e886
Fir pre-commit errors
dmitriyrepin 8017d98
PR review: fix trace_worker docstring
dmitriyrepin 90754d3
Review: address some of the issue
dmitriyrepin f0a1c28
Fix bug
dmitriyrepin 5d07ea4
dding todo for sum squares calculation
tasansal b503069
Refactor ChunkIterator
dmitriyrepin 15febc9
Merge branch 'segy_to_mdio_v1'
dmitriyrepin 5980ec9
Refactor ChunkIterator into ChunkIteratorV1
dmitriyrepin 8e5f7a0
Remove segy_to_mdio_v1_customized, dataset_serializer.to_zarr
dmitriyrepin f0f42f3
Add support for trace headers without using _FillValue
dmitriyrepin a441db8
Use StorageLocation in trace_worker_v1
dmitriyrepin d574a47
Fix statistics attribute name
dmitriyrepin cf90b7e
PR review changes
dmitriyrepin a5ae874
PR Improvements: do a single write
dmitriyrepin ab08ef4
TODO: chunked write for non-dimensional coordinates and trace_mask
dmitriyrepin b970d74
Update StorageLocation to use fsspec
dmitriyrepin 2f37c19
Reformat with pre-commit
dmitriyrepin 4f30d95
Use domain name in get_grid_plan
dmitriyrepin 71dcd0d
Fix non-dim coords and chunk_samples=False
dmitriyrepin 1771491
Convert test_3d_import_v1 to V1
dmitriyrepin 1f820a4
Merge-in latest 'upstream v1'
dmitriyrepin b52f534
Fix test_meta_dataset_read
dmitriyrepin 7c6a38f
Merge branch 'v1' into segy_to_mdio_v1
tasansal ba3307f
remove whitespace
tasansal 5e8a1c5
clean up comments
tasansal d03e460
update deps in lockfile
tasansal c8f7cff
simplify dim and non-dim coordinate handling after scan
tasansal 047ea45
remove compatibility tests
tasansal 08c1e70
add filtering capability to header worker
tasansal 81af582
accept subset filter to pass to workers
tasansal f2d59a9
make v1 grid planner awesome
tasansal 18726ed
double to single underscores in test names
tasansal 75a0915
fix broken test harnesses due to incorrect Sequence import
tasansal 174c8fd
clean up dev comment
tasansal 63737a6
clean up whitespace
tasansal c55c080
use new module name
tasansal 406a6b3
clean up segy_to_mdio_v1
tasansal 73073e7
fix whitespace and remove unnecessary list call
tasansal 29bbb70
these are defined as float64 in template
tasansal b13a57c
fix missing dimension coordinate for vertical axis
tasansal 4d1dc8f
fix incorrect dtype comparison for time variable
tasansal 0d410d3
simplify and fix critical bugs
tasansal e7ceced
rename v1 out of things and get rid of old code
tasansal fafe8ab
remove fixed todo
tasansal 2b98486
remove more v1 from names
tasansal 0517a57
rename chunk iterator
tasansal eb8bac7
fix dimensionality in tests due to new (missing) vertical dimension c…
tasansal 22c4613
add todo for numpy ingestion
tasansal 19812e9
fix references to non-v1 naming
tasansal 00ef757
extract grid operations to its own function
tasansal 528acb1
fix typo
tasansal d7b9013
add todo for simplifying storage location
tasansal 792286c
Remove no_fill_var_names, add domain var to Seismic3DPreStackShotTemp…
dmitriyrepin c31bc45
Part 2 of the previous commit
dmitriyrepin bdde865
pre-commit formatting
dmitriyrepin e1405ec
remove dev mount
tasansal
@@ -0,0 +1,85 @@

```python
"""A module for converting numpy dtypes to MDIO scalar and structured types."""

from numpy import dtype as np_dtype

from mdio.schemas.dtype import ScalarType
from mdio.schemas.dtype import StructuredField
from mdio.schemas.dtype import StructuredType


def to_scalar_type(data_type: np_dtype) -> ScalarType:
    """Convert numpy dtype to MDIO ScalarType.

    Out of the 24 built-in numpy scalar type objects
    (see https://numpy.org/doc/stable/reference/arrays.dtypes.html)
    this function supports only a limited subset:
        ScalarType.INT8 <-> int8
        ScalarType.INT16 <-> int16
        ScalarType.INT32 <-> int32
        ScalarType.INT64 <-> int64
        ScalarType.UINT8 <-> uint8
        ScalarType.UINT16 <-> uint16
        ScalarType.UINT32 <-> uint32
        ScalarType.UINT64 <-> uint64
        ScalarType.FLOAT32 <-> float32
        ScalarType.FLOAT64 <-> float64
        ScalarType.COMPLEX64 <-> complex64
        ScalarType.COMPLEX128 <-> complex128
        ScalarType.BOOL <-> bool

    Args:
        data_type: numpy dtype to convert

    Returns:
        ScalarType: corresponding MDIO scalar type

    Raises:
        ValueError: if dtype is not supported
    """
    try:
        return ScalarType(data_type.name)
    except ValueError as exc:
        err = f"Unsupported numpy dtype '{data_type.name}' for conversion to ScalarType."
        raise ValueError(err) from exc


def to_structured_type(data_type: np_dtype) -> StructuredType:
    """Convert numpy dtype to MDIO StructuredType.

    This function supports only a limited subset of structured types.
    In particular:
        It does not support nested structured types.
        It supports fields of only 13 of the 24 built-in numpy scalar types
        (see `to_scalar_type` for details).

    Args:
        data_type: numpy dtype to convert

    Returns:
        StructuredType: corresponding MDIO structured type

    Raises:
        ValueError: if dtype is not structured, has no fields, or has an unsupported field dtype

    """
    if data_type is None or len(data_type.names or []) == 0:
        err = "None or empty dtype provided, cannot convert to StructuredType."
        raise ValueError(err)

    fields = []
    for field_name in data_type.names:
        field_dtype = data_type.fields[field_name][0]
        scalar_type = to_scalar_type(field_dtype)
        structured_field = StructuredField(name=field_name, format=scalar_type)
        fields.append(structured_field)
    return StructuredType(fields=fields)


def to_numpy_dtype(data_type: ScalarType | StructuredType) -> np_dtype:
    """Convert an MDIO ScalarType or StructuredType to a numpy dtype."""
    if isinstance(data_type, ScalarType):
        return np_dtype(data_type.value)
    if isinstance(data_type, StructuredType):
        return np_dtype([(f.name, f.format.value) for f in data_type.fields])
    msg = f"Expected ScalarType or StructuredType, got '{type(data_type).__name__}'"
    raise ValueError(msg)
```
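The converters above look enum members up by value (`ScalarType(data_type.name)`), which only works if each `ScalarType` value equals the matching numpy dtype name. The following is a minimal sketch of that lookup and of the structured round trip, written so it runs without MDIO: `ScalarTypeSketch` and the `*_sketch` functions are hypothetical stand-ins (not MDIO APIs), plain `(name, format)` tuples replace `StructuredField`, and numpy is assumed to be installed.

```python
from enum import Enum

import numpy as np


class ScalarTypeSketch(str, Enum):
    """Hypothetical stand-in for MDIO's ScalarType; values mirror numpy dtype names."""

    INT8 = "int8"
    INT16 = "int16"
    INT32 = "int32"
    INT64 = "int64"
    UINT8 = "uint8"
    UINT16 = "uint16"
    UINT32 = "uint32"
    UINT64 = "uint64"
    FLOAT32 = "float32"
    FLOAT64 = "float64"
    COMPLEX64 = "complex64"
    COMPLEX128 = "complex128"
    BOOL = "bool"


def to_scalar_sketch(data_type: np.dtype) -> ScalarTypeSketch:
    # Look the member up by value, i.e. by the numpy dtype name.
    try:
        return ScalarTypeSketch(data_type.name)
    except ValueError as exc:
        raise ValueError(f"Unsupported numpy dtype '{data_type.name}'") from exc


def to_structured_sketch(data_type: np.dtype) -> list[tuple[str, ScalarTypeSketch]]:
    # names is None for non-structured dtypes and () for an empty structured dtype.
    if data_type is None or not data_type.names:
        raise ValueError("None or empty dtype provided")
    # fields[name] is a (dtype, offset) tuple; only the dtype is converted.
    return [(name, to_scalar_sketch(data_type.fields[name][0])) for name in data_type.names]


def to_numpy_sketch(fields: list[tuple[str, ScalarTypeSketch]]) -> np.dtype:
    # Rebuild a packed structured dtype from (name, format) pairs.
    return np.dtype([(name, fmt.value) for name, fmt in fields])


# Example: an illustrative header-like structured dtype survives the round trip.
header = np.dtype([("inline", "int32"), ("crossline", "int32"), ("cdp_x", "float64")])
assert to_numpy_sketch(to_structured_sketch(header)) == header
```

Because the lookup keys on `dtype.name`, a type outside the supported subset such as `float16` raises `ValueError`, and so does a nested structured field, whose dtype name has the form `voidN` and matches no member.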
@@ -0,0 +1,87 @@

```python
"""StorageLocation class for managing local and cloud storage locations."""

from pathlib import Path
from typing import Any

import fsspec


# TODO(Dmitriy Repin): Reuse fsspec functions for some methods we implemented here
# https://github.com/TGSAI/mdio-python/issues/597
class StorageLocation:
    """A class to represent a local or cloud storage location for SEG-Y or MDIO files.

    This class abstracts the storage location, allowing for both local file paths and
    cloud storage URIs (e.g., S3, GCS). It uses fsspec to check existence and manage options.
    It is intentionally not a dataclass: `uri` and `options` are exposed as read-only
    properties so they cannot be reassigned after construction.

    Args:
        uri: The URI of the storage location (e.g., '/path/to/file', 'file:///path/to/file',
            's3://bucket/path', 'gs://bucket/path').
        options: Optional fsspec storage options, such as cloud credentials.
    """

    def __init__(self, uri: str = "", options: dict[str, Any] | None = None):
        self._uri = uri
        self._options = options or {}
        self._fs = None

        if uri.startswith(("s3://", "gs://")):
            return

        if uri.startswith("file://"):
            self._uri = self._uri.removeprefix("file://")
            # For file:// URIs, ensure the path is absolute and resolved
            self._uri = str(Path(self._uri).resolve())
            return

    @property
    def uri(self) -> str:
        """Get the URI (read-only)."""
        return self._uri

    @property
    def options(self) -> dict[str, Any]:
        """Get the options (read-only)."""
        # Return a copy to prevent external modification
        return self._options.copy()

    @property
    def _filesystem(self) -> fsspec.AbstractFileSystem:
        """Get the fsspec filesystem instance for this storage location."""
        if self._fs is None:
            self._fs = fsspec.filesystem(self._protocol, **self._options)
        return self._fs

    @property
    def _path(self) -> str:
        """Extract the path portion from the URI."""
        if "://" in self._uri:
            return self._uri.split("://", 1)[1]
        return self._uri  # For local paths without file:// prefix

    @property
    def _protocol(self) -> str:
        """Extract the protocol/scheme from the URI."""
        if "://" in self._uri:
            return self._uri.split("://", 1)[0]
        return "file"  # Default to file protocol

    def exists(self) -> bool:
        """Check if the storage location exists using fsspec."""
        try:
            return self._filesystem.exists(self._path)
        except Exception as e:
            # Report the error and return False for safety
            # In a production environment, you might want to use proper logging
            print(f"Error checking existence of {self._uri}: {e}")
            return False

    def __str__(self) -> str:
        """String representation of the storage location."""
        return self._uri

    def __repr__(self) -> str:
        """Developer representation of the storage location."""
        return f"StorageLocation(uri='{self._uri}', options={self._options})"
```
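As a hedged summary of how `__init__`, `_protocol`, and `_path` cooperate, the following stdlib-only sketch reproduces the protocol/path derivation without constructing an fsspec filesystem. `normalize_uri` is a hypothetical name for illustration; it is not part of MDIO. As in the class, only `file://` URIs are resolved to absolute paths, while bare local paths pass through unchanged.

```python
from pathlib import Path


def normalize_uri(uri: str) -> tuple[str, str]:
    """Return (protocol, path) the way StorageLocation derives them.

    Hypothetical helper for illustration only; it is not part of MDIO.
    """
    # __init__ strips file:// and resolves the remainder to an absolute path.
    if uri.startswith("file://"):
        uri = str(Path(uri.removeprefix("file://")).resolve())
    # _protocol / _path: split on the first "://" when a scheme is present.
    if "://" in uri:
        protocol, path = uri.split("://", 1)
        return protocol, path
    # No scheme: default to the local "file" protocol; bare paths are not resolved.
    return "file", uri


# Illustrative inputs only.
for example in ("s3://bucket/survey.mdio", "file:///tmp/data/../line.segy", "relative/line.segy"):
    print(example, "->", normalize_uri(example))
```

Keeping this derivation pure makes it testable without network access, while the class itself defers creating the fsspec filesystem until `exists()` first needs it. The TODO pointing at issue #597 suggests reusing fsspec helpers here; `fsspec.core.split_protocol` may be one candidate, but that is an assumption, not something this PR does.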