Merged
Changes from 1 commit
Commits (38)
6b02edd  initial commit for geocode SLC unit test (LiangJYu, Aug 8, 2022)
fd880ed  download test data from zenodo (LiangJYu, Aug 15, 2022)
9f0b607  added docstrings, more descriptive variable names, internet check (LiangJYu, Sep 12, 2022)
97c4461  Merge branch 'main' into geocode_slc_unit_test (LiangJYu, Oct 20, 2022)
2984d49  run integration test in circleci (LiangJYu, Oct 20, 2022)
1abbfe4  adding pytest as requirement (LiangJYu, Oct 24, 2022)
92daa0f  adding pytest to docker specfile (LiangJYu, Oct 28, 2022)
0164652  another attempt to run pytest in ci (LiangJYu, Oct 28, 2022)
ebfea34  yet another attempt to run pytest in ci (LiangJYu, Oct 28, 2022)
13cb5b2  applying suggestion from rtburns (LiangJYu, Nov 2, 2022)
7a5655c  docker command tweak (LiangJYu, Nov 2, 2022)
6442402  fix entry point (LiangJYu, Nov 3, 2022)
cd70f7c  remove entry point (LiangJYu, Nov 4, 2022)
6e3069c  fix path (LiangJYu, Nov 8, 2022)
8eff671  activate environment (LiangJYu, Nov 8, 2022)
175ab31  init bash (LiangJYu, Nov 8, 2022)
c65ace4  bash path not needed (LiangJYu, Nov 9, 2022)
17a70ff  docker command fix (LiangJYu, Nov 9, 2022)
dbf457b  adding attr to specifile (LiangJYu, Nov 16, 2022)
7e34b3f  fix typo in url (LiangJYu, Nov 16, 2022)
35cb9e8  Merge remote-tracking branch 'upstream/main' into geocode_slc_unit_test (LiangJYu, Dec 3, 2022)
5b4dc91  updated specfile (LiangJYu, Dec 5, 2022)
07c83e4  account for multiburst, new burst ID, geocode entire burst (LiangJYu, Dec 5, 2022)
b30f823  use main instead of tag because of missing features (LiangJYu, Dec 6, 2022)
18c9cb4  clone hhtps not ssh (LiangJYu, Dec 6, 2022)
4ddf5f8  template yaml (LiangJYu, Dec 6, 2022)
aa1540c  Merge branch 'main' into geocode_slc_unit_test (LiangJYu, Feb 17, 2023)
8e10100  update unit tests for smaller dataset (LiangJYu, Feb 21, 2023)
88f0487  parallel test data download (LiangJYu, Feb 23, 2023)
d438b58  update specifile (LiangJYu, Feb 23, 2023)
dd2e841  remove unused imports (LiangJYu, Feb 23, 2023)
e693f51  more comments/docstrings (LiangJYu, Feb 23, 2023)
aeb64ac  cleaner mp pool (LiangJYu, Feb 25, 2023)
3f72ad3  fix docstring (LiangJYu, Feb 25, 2023)
b95f45c  load just the slice not EVERYTHING (LiangJYu, Feb 25, 2023)
8953342  Merge branch 'geocode_slc_unit_test' of github.com:LiangJYu/COMPASS i… (LiangJYu, Feb 25, 2023)
dd7b5e1  clarity in file related variable (LiangJYu, Mar 2, 2023)
fa33f64  fix typo in path (LiangJYu, Mar 2, 2023)
more comments/docstrings
add environment yaml to help create clean/minimal environment
add bounds check to margin application
LiangJYu committed Feb 23, 2023
commit e693f5129acf3c675a968b817aa422ff03f8ed9c
10 changes: 9 additions & 1 deletion .gitignore
@@ -151,4 +151,12 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/


# unit test files/directories
tests/data/geo_cslc_s1.yaml
tests/data/2022-10-16_0000_Rosamond-corner-reflectors.csv
tests/data/S1A_IW_SLC__1SDV_20221016T015043_20221016T015111_045461_056FC0_6681.zip
tests/data/orbits
tests/data/test_dem.tiff
tests/data/test_burst_map.sqlite3
tests/product
tests/scratch
Comment on lines +161 to +162
Contributor

If these are getting created in the test directory and we want to ignore them, what if we use a temporary pytest directory that'll get removed?

Contributor

E.g. in conftest.py, using https://docs.pytest.org/en/7.1.x/how-to/tmp_path.html#the-tmp-path-factory-fixture

@pytest.fixture(scope="session")
def geocode_slc_params(tmp_path_factory):

...
    # get test working directory
    test_path = pathlib.Path(__file__).parent.resolve()
    output_path = tmp_path_factory.mktemp("output")

...
      cfg = f_template.read().replace('@TEST_PATH@', str(output_path)).\

But now that I've written this out, I'm thinking you may have stored it in the tests/ directory so that you could inspect the output, so feel free to ignore this suggestion if that's the case.

Contributor Author

Your reasoning for the scratch directory is spot on.

tempfile.TemporaryDirectory could be used just for the unit tests, but it would remove the ability to inspect intermediate results.
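
For reference, a minimal sketch of that trade-off (scratch_dir is a hypothetical fixture name, not from this PR):

import tempfile
import pytest

@pytest.fixture(scope='session')
def scratch_dir():
    # hypothetical fixture: the directory and everything written to it
    # are removed when the test session ends, so intermediate results
    # cannot be inspected afterwards
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield tmp_dir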

19 changes: 19 additions & 0 deletions compass_env.yml
@@ -0,0 +1,19 @@
name: compass
Contributor

Do we need to make a separate conda environment file?

I did this in dolphin here https://github.com/opera-adt/dolphin/blob/main/conda-env.yml but my reasoning was that I had a python version in there, and that fails if you do pip install -r requirements.txt.

We have a requirements file that also has python, so right now you could only use it with conda.

Also, if we were to add this conda env for easier installation, should we add isce3 to it? It seems like we don't want to tell every person to install it from source, as the comment in requirements.txt says.

Contributor Author

I added this because I forgot to add -c conda-forge to my conda create command and only figured that out just now. 🤣 Having two files is redundant, and I think the yaml should replace requirements.txt, because the contents of the yaml are more informative and avoid the mistake I made.

I agree with you on the isce3 item. We could have two environment files with a lot of identical content: one for devs (no isce3) and one for just running COMPASS (with isce3). Or should there be just one environment file, with separate instructions for ISCE3+COMPASS devs on how to overwrite isce3 when they build/install?

@scottstanie What do you think?

Contributor
@scottstanie Mar 2, 2023

I'm still figuring out if there's any benefit to having the pip install option (where it's usual to have a requirements.txt file that you can link to in your package metadata file) in addition to the conda env file...

But I agree that just one file is probably better.
Also,

  • yes, I think you should have isce3 on there
  • for developers, who we assume are more sophisticated than average if they're trying to get dev isce3 working: they can uninstall the conda one and go through the build with separate instructions. No need for a separate environment.yaml file with one fewer thing in it.

As a nitpick, I might go for environment.yaml as the file name, or else conda-env.yaml to be even more explicit that it's for conda. That lets someone know from looking at the file that it's associated with conda, since the filename compass_env.yaml could be anything.

Contributor Author

I will remove the environment yaml file, as it can only be used to create a new environment. While slightly more verbose to use, requirements.txt can be used both to create a new environment and to install packages into an existing one.

channels:
- conda-forge
- nodefaults
dependencies:
# isce3 can be added if forge version is sufficient
- gdal>=3.0
- h5py
- lxml
- numpy
- pandas
- pyproj
- pytest
- python>=3.9
Contributor

Have we used any 3.9+-specific features that would make 3.7/3.8 break?

Contributor Author

So far no 🤞

Contributor

Do we have a dependency on lxml? I thought the s1-reader had this dependency.

Contributor

I actually found that s1reader has just 2-3 cases where it type-hinted using the list[thing] syntax introduced in 3.9. Personally, I think we could just replace those with from typing import List; List[...] so we don't lose 3.8 compatibility unnecessarily, but that's a s1reader PR.
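
For illustration, the swap is mechanical (burst_ids is a hypothetical function, not taken from s1reader):

from typing import List

# Python 3.9+ only: def burst_ids(bursts: list[str]) -> list[str]:
def burst_ids(bursts: List[str]) -> List[str]:
    # typing.List keeps 3.7/3.8 compatibility; behavior is identical
    return sorted(set(bursts))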

- requests
- ruamel.yaml
- scipy
- shapely
- yamale
5 changes: 1 addition & 4 deletions src/compass/utils/runconfig.py
@@ -206,16 +206,14 @@ def runconfig_to_bursts(cfg: SimpleNamespace) -> list[Sentinel1BurstSlc]:
for pol, i_subswath in pol_subswath_index_pairs:

# loop over burst objs extracted from SAFE zip
loaded_bursts = load_bursts(safe_file, orbit_path, i_subswath, pol)
for burst in loaded_bursts:
for burst in load_bursts(safe_file, orbit_path, i_subswath, pol):
# get burst ID
burst_id = str(burst.burst_id)

# include ALL bursts if no burst IDs given
# is burst_id wanted? skip if not given in config
if (cfg.input_file_group.burst_id is not None and
burst_id not in cfg.input_file_group.burst_id):
print(burst_id, cfg.input_file_group.burst_id, burst_id in cfg.input_file_group.burst_id)
continue

# get polarization and save as tuple with burst ID
@@ -239,7 +237,6 @@ def runconfig_to_bursts(cfg: SimpleNamespace) -> list[Sentinel1BurstSlc]:
# if reference burst, ok to add if id+pol combo does not exist
# no duplicate id+pol combos for reference bursts
if not_ref or not burst_id_pol_exist:
print(f'{burst_id} found')
burst_ids_found.append(burst_id)
bursts.append(burst)

58 changes: 43 additions & 15 deletions tests/conftest.py
@@ -9,25 +9,48 @@

from compass.utils import iono


def download_if_needed(local_path):
# check if test inputs and reference files exists; download if not found.
'''
Check if given path to file exists; download it from zenodo if it does not.

Parameters
----------
local_path: str
Path to file
'''
# return if file is found
if os.path.isfile(local_path):
return

check_internet_connection()

dataset_url = 'https://zenodo.org/record/7668411/files/'
dst_dir, file_name = os.path.split(local_path)

# create destination directory if it does not exist
if dst_dir:
os.makedirs(dst_dir, exist_ok=True)

# download data
dataset_url = 'https://zenodo.org/record/7668411/files/'

Contributor

Wondering if we should have a file inside the repository containing all hard-coded paths. That way, we would not need to hunt for them when they change.

Contributor Author

So combine this with this? Perhaps into a yaml?

dataset_url: https://zenodo.org/record/7668411/files/
test_files:
    orbit_path: orbits/S1A_OPER_AUX_POEORB_OPOD_20221105T083813_V20221015T225942_20221017T005942.EOF
    safe_path: S1A_IW_SLC__1SDV_20221016T015043_20221016T015111_045461_056FC0_6681.zip
    ...
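
If we go that route, a minimal sketch of consuming such a file (test_data.yaml is a hypothetical name; using PyYAML here for brevity, though ruamel.yaml from the environment would work the same):

import yaml

# hypothetical: a single in-repo file holding all test-data locations
with open('test_data.yaml') as f:
    data_cfg = yaml.safe_load(f)

dataset_url = data_cfg['dataset_url']
orbit_url = dataset_url + data_cfg['test_files']['orbit_path']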

target_url = f'{dataset_url}/{file_name}'
with open(local_path, 'wb') as f:
f.write(requests.get(target_url).content)


@pytest.fixture(scope="session")
def unit_test_paths():
test_paths = types.SimpleNamespace()
def geocode_slc_params():
'''
Parameters to be used by geocode SLC unit test

Returns
-------
test_params: SimpleNamespace
SimpleNamespace containing geocode SLC unit test parameters
'''
test_params = types.SimpleNamespace()

# burst ID and date of burst
burst_id = 't064_135523_iw2'
b_date = '20221016'

@@ -39,34 +62,40 @@ def unit_test_paths():

# paths for template and actual runconfig
gslc_template_path = f'{test_data_path}/geo_cslc_s1_template.yaml'
test_paths.gslc_cfg_path = f'{test_data_path}/geo_cslc_s1.yaml'
test_params.gslc_cfg_path = f'{test_data_path}/geo_cslc_s1.yaml'

# read runconfig template, replace pieces, write to runconfig
with open(gslc_template_path, 'r') as f_template, \
open(test_paths.gslc_cfg_path, 'w') as f_cfg:
open(test_params.gslc_cfg_path, 'w') as f_cfg:
cfg = f_template.read().replace('@TEST_PATH@', str(test_path)).\
replace('@DATA_PATH@', test_data_path).\
replace('@BURST_ID@', burst_id)
f_cfg.write(cfg)

# check for files and download as needed
# files needed for geocode SLC unit test
test_files = ['S1A_IW_SLC__1SDV_20221016T015043_20221016T015111_045461_056FC0_6681.zip',
'orbits/S1A_OPER_AUX_POEORB_OPOD_20221105T083813_V20221015T225942_20221017T005942.EOF',
'test_dem.tiff', 'test_burst_map.sqlite3',
'2022-10-16_0000_Rosamond-corner-reflectors.csv']
test_files = [f'{test_data_path}/{test_file}' for test_file in test_files]

# parallel download of test files
# parallel download of test files (if necessary)
pool = mp.Pool(len(test_files))
_ = pool.map(download_if_needed, test_files)
pool.close()
pool.join()

test_paths.corner_coord_csv_path = test_files[-1]
test_paths.output_hdf5 = f'{test_path}/product/{burst_id}/{b_date}/{burst_id}_{b_date}.h5'
test_paths.grid_group_path = '/science/SENTINEL1/CSLC/grids'
# path to file containing corner reflectors
test_params.corner_coord_csv_path = test_files[-1]

# path to the output HDF5
test_params.output_hdf5 = f'{test_path}/product/{burst_id}/{b_date}/{burst_id}_{b_date}.h5'

return test_paths
# path to groups and datasets in output HDF5
test_params.grid_group_path = '/science/SENTINEL1/CSLC/grids'
test_params.raster_path = f'{test_params.grid_group_path}/VV'

return test_params

@pytest.fixture(scope='session')
def ionex_params(download_data=True):
@@ -81,9 +110,8 @@ def ionex_params(download_data=True):

Returns
-------
tec_file: str
Path to local or downloaded TEC file to
use in the unit test
test_params: SimpleNamespace
SimpleNamespace containing parameters needed for the ionex unit test
'''
test_params = types.SimpleNamespace()

89 changes: 67 additions & 22 deletions tests/test_s1_geocode_slc.py
@@ -8,27 +8,55 @@
from compass.utils.geo_runconfig import GeoRunConfig


def test_geocode_slc_run(unit_test_paths):
def test_geocode_slc_run(geocode_slc_params):
'''
run s1_geocode_slc to ensure it does not crash
Run s1_geocode_slc to ensure it does not crash

Parameters
----------
geocode_slc_params: SimpleNamespace
SimpleNamespace containing geocode SLC unit test parameters
'''
# load yaml to cfg
cfg = GeoRunConfig.load_from_yaml(unit_test_paths.gslc_cfg_path,
cfg = GeoRunConfig.load_from_yaml(geocode_slc_params.gslc_cfg_path,
workflow_name='s1_cslc_geo')

# pass cfg to s1_geocode_slc
s1_geocode_slc.run(cfg)

def get_nearest_index(arr, val):

def _get_nearest_index(arr, val):
'''
Find index of element in given array closest to given value

Parameters
----------
arr: np.ndarray
1D array to be searched
val: float
Number to be searched for

Returns
-------
_: int
Index of the element in arr closest to val
'''
return np.abs(arr - val).argmin()

def get_reflectors_extents_slice(unit_test_paths, margin=50):

def _get_reflectors_bounding_slice(geocode_slc_params):
'''
get max and min lat, lon
Get the latitude/longitude slice that contains all the corner reflectors in the
CSV list of corner reflectors

Parameters
----------
geocode_slc_params: SimpleNamespace
SimpleNamespace containing geocode SLC unit test parameters
'''
# extract from HDF5
with h5py.File(unit_test_paths.output_hdf5, 'r') as h5_obj:
grid_group = h5_obj[unit_test_paths.grid_group_path]
with h5py.File(geocode_slc_params.output_hdf5, 'r') as h5_obj:
grid_group = h5_obj[geocode_slc_params.grid_group_path]

# create projection to convert from UTM to LLH
epsg = int(grid_group['projection'][()])
@@ -43,33 +71,50 @@ def get_reflectors_extents_slice(unit_test_paths, margin=50):
lats = np.array([np.degrees(proj.inverse([x_coords_utm[0], y, 0])[1])
for y in y_coords_utm])

# get array shape for later check of slice with margins applied
height, width = h5_obj[geocode_slc_params.raster_path].shape

# extract all lat/lon corner reflector coordinates
corner_lats = []
corner_lons = []
with open(unit_test_paths.corner_coord_csv_path, 'r') as csvfile:
with open(geocode_slc_params.corner_coord_csv_path, 'r') as csvfile:
corner_reader = csv.DictReader(csvfile)
for row in corner_reader:
corner_lats.append(float(row['Latitude (deg)']))
corner_lons.append(float(row['Longitude (deg)']))

i_max_lat = get_nearest_index(lats, np.max(corner_lats))
i_min_lat = get_nearest_index(lats, np.min(corner_lats))
i_max_lon = get_nearest_index(lons, np.max(corner_lons))
i_min_lon = get_nearest_index(lons, np.min(corner_lons))
# find nearest index for min/max of lats/lons and apply margin
# apply margin to bounding box and ensure raster bounds are not exceeded
# application of margin to y indices reversed due to descending order of lats vector
margin = 50
i_max_y = max(_get_nearest_index(lats, np.max(corner_lats)) - margin, 0)
i_min_y = min(_get_nearest_index(lats, np.min(corner_lats)) + margin,
height - 1)
i_max_x = min(_get_nearest_index(lons, np.max(corner_lons)) + margin,
width - 1)
i_min_x = max(_get_nearest_index(lons, np.min(corner_lons)) - margin, 0)

return np.s_[i_max_lat - margin:i_min_lat + margin,
i_min_lon - margin:i_max_lon + margin]
# return as slice
# y indices reversed to account for descending order lats vector
return np.s_[i_max_y:i_min_y, i_min_x:i_max_x]

def test_geocode_slc_validate(unit_test_paths):

def test_geocode_slc_validate(geocode_slc_params):
Contributor

Does this test implicitly rely on test_geocode_slc_run having already run?

If so, maybe we should make test_geocode_slc_run into a fixture that returns something (like the output path, maybe), and have test_geocode_slc_validate use that fixture.
Otherwise this would fail if we decide to add a testing speed-up like https://github.com/pytest-dev/pytest-xdist
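
A minimal sketch of that fixture idea (geocoded_slc is a hypothetical name; GeoRunConfig and s1_geocode_slc are assumed imported as at the top of this test module):

import pytest

@pytest.fixture(scope='session')
def geocoded_slc(geocode_slc_params):
    # run the workflow once per session and hand its output path
    # to any test that needs the geocoded product
    cfg = GeoRunConfig.load_from_yaml(geocode_slc_params.gslc_cfg_path,
                                      workflow_name='s1_cslc_geo')
    s1_geocode_slc.run(cfg)
    return geocode_slc_params.output_hdf5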

Contributor Author

By default it looks like tests are run in the order they are found (sorry, I wasn't able to find anything definitive in the pytest docs). Anecdotally, swapping the order of the tests in the code supports this.

Test order can be set if it becomes an issue.

Making test_geocode_slc_run a fixture would guarantee order, but I think it would remove test_geocode_slc_run as a test.

pytest-xdist issues would be tricky. I say we address those issues when we encounter them. I speculate that the runtime of tests calling isce3.geocode.geocode_slc will be significantly longer than any of the follow-up (likely single-threaded) validation tests. The multi-threaded nature of isce3.geocode.geocode_slc precludes them from being distributed like the single-threaded validation tests, and that would diminish the returns from parallelization.

Contributor

I would prefer to have 2 separate tests: 1) testing that the actual workflow runs, and 2) validating the geolocation after running. Can we separate those two tests into separate functions?

'''
check for reflectors in geocoded output
Check for reflectors in geocoded output

Parameters
----------
geocode_slc_params: SimpleNamespace
SimpleNamespace containing geocode SLC unit test parameters
'''
s_ = get_reflectors_extents_slice(unit_test_paths)
# get slice where corner reflectors should be
s_ = _get_reflectors_bounding_slice(geocode_slc_params)

Contributor

The validation test I had in mind was a bit different and more in line with the validation activities that @seongsujeong is doing. We have geocoded an SLC, and we have the CR locations from file, so we can check that our geolocation is not messed up.

To do so:

  • Based on the CR file, we can identify the CR lat/lons and extract the CR peak from the geocoded SLC
  • We can oversample the peak and get the CR location
  • We can check the difference between the processed CR location and the one we get from the file in the X- and Y-directions and see whether we meet our requirement

Was this the plan?
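
For reference, a minimal sketch of the first two steps (estimate_cr_peak is a hypothetical helper; a production version would likely use FFT-based oversampling rather than spline interpolation):

import numpy as np
from scipy.ndimage import zoom

def estimate_cr_peak(cslc_chip, oversample=32):
    # oversample the amplitude of a small chip around a CR and
    # locate its peak at sub-pixel resolution
    amp = np.abs(cslc_chip)
    amp_os = zoom(amp, oversample, order=3)
    i_pk, j_pk = np.unravel_index(amp_os.argmax(), amp_os.shape)
    # peak location in original (row, col) pixel coordinates
    return i_pk / oversample, j_pk / oversample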

Contributor Author

What you described was the plan, but I wasn't able to find all of them. This is just a quick check that any CRs were found, so the unit tests would have basic validation.


with h5py.File(unit_test_paths.output_hdf5, 'r') as h5_obj:
src_path = f'{unit_test_paths.grid_group_path}/VV'
arr = h5_obj[src_path][()][s_]
print(arr.shape, s_)
# slice raster array
with h5py.File(geocode_slc_params.output_hdf5, 'r') as h5_obj:
arr = h5_obj[geocode_slc_params.raster_path][()][s_]

# check for bright spots in sliced array
corner_reflector_threshold = 3e3

Contributor

How did you determine this threshold? Also, I do not fully understand the test. Are we just testing for the presence of bright spots inside the processed CSLC?

Contributor Author

Correct - this test only checks for the presence of bright spots in the processed CSLC within the area of the CRs. I was thinking of improving this to check for individual CRs once we get all our corrections implemented.

I determined the threshold by plotting where the CRs are and picking a seemingly appropriate value based on the colorbar.

assert np.any(arr > corner_reflector_threshold)
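
If the hard-coded value ever proves brittle, one possible alternative (not part of this PR) is a statistics-based check:

import numpy as np

def has_bright_spots(arr, n_sigma=5.0):
    # flag pixels well above the background distribution instead of
    # using an absolute threshold read off a colorbar
    amp = np.abs(arr)
    return bool(np.any(amp > amp.mean() + n_sigma * amp.std()))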