Merged
Changes from 1 commit
Commits
91 commits
7ff26e7
Templates and TemplateRegistry
dmitriyrepin Jul 17, 2025
5ae7258
Merge remote-tracking branch 'upstream/v1' into v1
dmitriyrepin Jul 17, 2025
e8b95dc
Fix pre-commit issues
dmitriyrepin Jul 17, 2025
be772c7
Rever dev container changes
dmitriyrepin Jul 18, 2025
e43e4da
PR Review: address issues
dmitriyrepin Jul 18, 2025
80dd234
PR Review: register default templates at registry initialization
dmitriyrepin Jul 18, 2025
7caca55
Merge remote-tracking branch 'upstream/v1' into v1
dmitriyrepin Jul 22, 2025
2b2acf2
Dockerfile.dev
dmitriyrepin Jul 22, 2025
73d827a
segy_to_mdio_v1
dmitriyrepin Jul 22, 2025
befeca0
Clean up
dmitriyrepin Jul 23, 2025
ec057e6
Prototype review notes
Jul 23, 2025
59d12e9
Add dev comment
Jul 23, 2025
2f3eabf
Add notes that will be deleted later
dmitriyrepin Jul 23, 2025
0477bed
segy_to_mdio_v1 pass 1
dmitriyrepin Jul 26, 2025
5f7b135
indexing_v1 and blocked_io_v1
dmitriyrepin Jul 26, 2025
d6a4a35
Remove DEV notes
dmitriyrepin Jul 26, 2025
0b4c29f
Clean up
dmitriyrepin Jul 26, 2025
340f78a
Document bug location
dmitriyrepin Jul 26, 2025
2c34d34
Work around IndexError
dmitriyrepin Jul 27, 2025
93c4b30
Clean temporary code
dmitriyrepin Jul 27, 2025
324879a
More clean up
dmitriyrepin Jul 27, 2025
5debcf9
Remove *_1 infrastructure files
dmitriyrepin Jul 28, 2025
7b96b29
Restore accidently removed dask.array
dmitriyrepin Jul 29, 2025
3115299
Created an issue reproducer
dmitriyrepin Jul 29, 2025
4b1ae8f
Make the required template properties public
dmitriyrepin Jul 29, 2025
21e9e04
Simplify type converter
dmitriyrepin Jul 30, 2025
c5f9a63
Improve templates
dmitriyrepin Jul 30, 2025
b73cc68
Move test_type_converter.py
dmitriyrepin Jul 30, 2025
55315aa
Move test_type_converter.py
dmitriyrepin Jul 30, 2025
3e64fba
Revert to use the original grid
dmitriyrepin Jul 30, 2025
1f4687f
Integrate segy_to_mdio_v1_customized, fix indexing
dmitriyrepin Jul 30, 2025
e39d8c6
Add dimension coordinates in tem,plates
dmitriyrepin Jul 30, 2025
972a05d
Write statistics to Zarr
dmitriyrepin Jul 30, 2025
84ceb57
Delete factory_v1.py
dmitriyrepin Jul 30, 2025
4ff62bc
Complete integrationtest. Fix coordinates
dmitriyrepin Jul 31, 2025
543e886
Fir pre-commit errors
dmitriyrepin Jul 31, 2025
8017d98
PR review: fix trace_worker docstring
dmitriyrepin Aug 1, 2025
90754d3
Review: address some of the issue
dmitriyrepin Aug 1, 2025
f0a1c28
Fix bug
dmitriyrepin Aug 1, 2025
5d07ea4
dding todo for sum squares calculation
tasansal Aug 1, 2025
b503069
Refactor ChunkIterator
dmitriyrepin Aug 1, 2025
15febc9
Merge branch 'segy_to_mdio_v1'
dmitriyrepin Aug 2, 2025
5980ec9
Refactor ChunkIterator into ChunkIteratorV1
dmitriyrepin Aug 2, 2025
8e5f7a0
Remove segy_to_mdio_v1_customized, dataset_serializer.to_zarr
dmitriyrepin Aug 2, 2025
f0f42f3
Add support for trace headers without using _FillValue
dmitriyrepin Aug 4, 2025
a441db8
Use StorageLocation in trace_worker_v1
dmitriyrepin Aug 4, 2025
d574a47
Fix statistics attribute name
dmitriyrepin Aug 4, 2025
cf90b7e
PR review changes
dmitriyrepin Aug 4, 2025
a5ae874
PR Improvements: do a single write
dmitriyrepin Aug 4, 2025
ab08ef4
TODO: chunked write for non-dimensional coordinates and trace_mask
dmitriyrepin Aug 4, 2025
b970d74
Update StorageLocation to use fsspec
dmitriyrepin Aug 4, 2025
2f37c19
Reformat with pre-commit
dmitriyrepin Aug 4, 2025
4f30d95
Use domain name in get_grid_plan
dmitriyrepin Aug 4, 2025
71dcd0d
Fix non-dim coords and chunk_samples=False
dmitriyrepin Aug 5, 2025
1771491
Convert test_3d_import_v1 to V1
dmitriyrepin Aug 5, 2025
1f820a4
Merge-in latest 'upstream v1'
dmitriyrepin Aug 6, 2025
b52f534
Fix test_meta_dataset_read
dmitriyrepin Aug 6, 2025
7c6a38f
Merge branch 'v1' into segy_to_mdio_v1
tasansal Aug 7, 2025
ba3307f
remove whitespace
tasansal Aug 7, 2025
5e8a1c5
clean up comments
tasansal Aug 7, 2025
d03e460
update deps in lockfile
tasansal Aug 7, 2025
c8f7cff
simplify dim and non-dim coordinate handling after scan
tasansal Aug 7, 2025
047ea45
remove compatibility tests
tasansal Aug 8, 2025
08c1e70
add filtering capability to header worker
tasansal Aug 8, 2025
81af582
accept subset filter to pass to workers
tasansal Aug 8, 2025
f2d59a9
make v1 grid planner awesome
tasansal Aug 8, 2025
18726ed
double to single underscores in test names
tasansal Aug 8, 2025
75a0915
fix broken test harnesses due to incorrect Sequence import
tasansal Aug 8, 2025
174c8fd
clean up dev comment
tasansal Aug 8, 2025
63737a6
clean up whitespace
tasansal Aug 8, 2025
c55c080
use new module name
tasansal Aug 8, 2025
406a6b3
clean up segy_to_mdio_v1
tasansal Aug 8, 2025
73073e7
fix whitespace and remove unnecessary list call
tasansal Aug 8, 2025
29bbb70
these are defined as float64 in template
tasansal Aug 8, 2025
b13a57c
fix missing dimension coordinate for vertical axis
tasansal Aug 8, 2025
4d1dc8f
fix incorrect dtype comparison for time variable
tasansal Aug 8, 2025
0d410d3
simplify and fix critical bugs
tasansal Aug 8, 2025
e7ceced
rename v1 out of things and get rid of old code
tasansal Aug 8, 2025
fafe8ab
remove fixed todo
tasansal Aug 8, 2025
2b98486
remove more v1 from names
tasansal Aug 8, 2025
0517a57
rename chunk iterator
tasansal Aug 8, 2025
eb8bac7
fix dimensionality in tests due to new (missing) vertical dimension c…
tasansal Aug 8, 2025
22c4613
add todo for numpy ingestion
tasansal Aug 8, 2025
19812e9
fix references to non-v1 naming
tasansal Aug 8, 2025
00ef757
extract grid operations to its own function
tasansal Aug 8, 2025
528acb1
fix typo
tasansal Aug 8, 2025
d7b9013
add todo for simplifying storage location
tasansal Aug 8, 2025
792286c
Remove no_fill_var_names, add domain var to Seismic3DPreStackShotTemp…
dmitriyrepin Aug 8, 2025
c31bc45
Part 2 of the previous commit
dmitriyrepin Aug 8, 2025
bdde865
pre-commit formatting
dmitriyrepin Aug 8, 2025
e1405ec
remove dev mount
tasansal Aug 8, 2025
Merge-in latest 'upstream v1'
dmitriyrepin committed Aug 6, 2025
commit 1f820a4b9fafceb2bf9b0fdbfb1348bd50e11583
16 changes: 10 additions & 6 deletions src/mdio/segy/_workers.py
@@ -4,13 +4,16 @@

import os
from typing import TYPE_CHECKING
from typing import TypedDict
from typing import cast

import numpy as np
from segy import SegyFile

if TYPE_CHECKING:
    from segy.arrays import HeaderArray
    from segy.config import SegySettings
    from segy.schema import SegySpec
    from xarray import Dataset as xr_Dataset
    from zarr import Array as zarr_Array

@@ -29,10 +32,7 @@ class SegyFileArguments(TypedDict):
     settings: SegySettings | None


-def header_scan_worker(
-    segy_kw: SegyFileArguments,
-    trace_range: tuple[int, int],
-) -> HeaderArray:
+def header_scan_worker(segy_kw: SegyFileArguments, trace_range: tuple[int, int]) -> HeaderArray:
     """Header scan worker.

     If SegyFile is not open, it can either accept a path string or a handle that was opened in
@@ -69,7 +69,7 @@ def header_scan_worker(


 def trace_worker_v1(  # noqa: PLR0913
-    segy_file: SegyFile,
+    segy_kw: SegyFileArguments,
     output_location: StorageLocation,
     data_variable_name: str,
     region: dict[str, slice],
@@ -79,7 +79,7 @@ def trace_worker_v1(  # noqa: PLR0913
     """Writes a subset of traces from a region of the dataset of Zarr file.

     Args:
-        segy_file: SegyFile instance.
+        segy_kw: Arguments to open SegyFile instance.
         output_location: StorageLocation for the output Zarr dataset
             (e.g. local file path or cloud storage URI) the location
             also includes storage options for cloud storage.
@@ -94,6 +94,10 @@ def trace_worker_v1(  # noqa: PLR0913
     if not dataset.trace_mask.any():
         return None

+    # Open the SEG-Y file in every new process / spawned worker since the
+    # open file handles cannot be shared across processes.
+    segy_file = SegyFile(**segy_kw)
+
     not_null = grid_map != UINT32_MAX

     live_trace_indexes = grid_map[not_null].tolist()
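
Note on the `trace_worker_v1` change above: the worker now receives `segy_kw` and opens `SegyFile` itself instead of being handed an open instance. A minimal, stdlib-only sketch of the same pattern is below. `FileArguments`, `read_range_worker`, and `is_picklable` are hypothetical names used only for illustration and are not part of mdio or the segy library. The sketch assumes the problem looks like it does for plain Python file objects, where an open handle fails to pickle while a dict of plain values does not.

```python
import pickle
from typing import TypedDict


class FileArguments(TypedDict):
    """Hypothetical stand-in for SegyFileArguments: plain, picklable values only."""

    url: str


def read_range_worker(file_kw: FileArguments, byte_range: tuple[int, int]) -> bytes:
    """Open the file inside the worker and read the bytes in [start, stop)."""
    start, stop = byte_range
    # The handle is created here, in whichever process runs the worker,
    # so nothing process-bound has to be sent across the process boundary.
    with open(file_kw["url"], "rb") as handle:
        handle.seek(start)
        return handle.read(stop - start)


def is_picklable(obj: object) -> bool:
    """Return True if obj can be serialized for another process."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True
```

With this shape, the parent process only builds the argument dict once and passes it to each submitted task; every worker pays the cost of opening the file, which is the trade-off the diff comment accepts.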
10 changes: 9 additions & 1 deletion src/mdio/segy/blocked_io.py
@@ -2,6 +2,7 @@

from __future__ import annotations

import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import as_completed
@@ -91,15 +92,22 @@ def to_zarr_v1( # noqa: PLR0913, PLR0915
# (e.g. for unit testing or for debugging with non-parallelized processing)
# def _create_executor(num_chunks: int)-> ProcessPoolExecutor:

# For Unix async writes with s3fs/fsspec & multiprocessing, use 'spawn' instead of default
# 'fork' to avoid deadlocks on cloud stores. Slower but necessary. Default on Windows.
num_cpus = int(os.getenv("MDIO__IMPORT__CPU_COUNT", default_cpus))
num_workers = min(num_chunks, num_cpus)
context = mp.get_context("spawn")
executor = ProcessPoolExecutor(max_workers=num_workers, mp_context=context)
# return executor

segy_kw = {
"url": segy_file.fs.unstrip_protocol(segy_file.url),
"spec": segy_file.spec,
"settings": segy_file.settings,
}
     with executor:
         futures = []
-        common_args = (segy_file, output_location, data_variable_name)
+        common_args = (segy_kw, output_location, data_variable_name)
         for region in chunk_iter:
             index_slices = tuple(region[key] for key in data.dims[:-1])
             subset_args = (
You are viewing a condensed version of this merge commit. You can view the full changes here.
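
Note on the executor setup in `blocked_io.py`: the commented-out `_create_executor` hint suggests pulling this logic into a helper. A rough sketch of what that could look like is below, assuming only what the diff shows (the `MDIO__IMPORT__CPU_COUNT` environment variable, the `spawn` start method, and `min(num_chunks, num_cpus)`). The function names and the `max(1, ...)` guard are assumptions added here, not code from the PR; the guard is included because `ProcessPoolExecutor` rejects `max_workers <= 0`.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_worker_count(num_chunks: int, default_cpus: int) -> int:
    """Pick the worker count: never more workers than chunks or configured CPUs."""
    num_cpus = int(os.getenv("MDIO__IMPORT__CPU_COUNT", default_cpus))
    # Assumption: clamp to at least one worker so the executor can be created
    # even when there are no chunks to process.
    return max(1, min(num_chunks, num_cpus))


def create_executor(num_chunks: int, default_cpus: int) -> ProcessPoolExecutor:
    """Build a process pool whose workers start via 'spawn' rather than 'fork'."""
    # 'spawn' starts each worker as a fresh interpreter, so no threads, locks,
    # or event loops are inherited from the parent process.
    context = mp.get_context("spawn")
    num_workers = resolve_worker_count(num_chunks, default_cpus)
    return ProcessPoolExecutor(max_workers=num_workers, mp_context=context)
```

Because spawned workers import the main module from scratch, any script that calls `create_executor` and submits work should do so under an `if __name__ == "__main__":` guard, and everything passed to the workers must be picklable.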