
Commit 8c1eef1

committed
update crop-classification notebooks so they run successfully and use
cache to speed up run time
1 parent ffc9385 commit 8c1eef1

15 files changed: +2104 −709 lines

jupyter-notebooks/crop-classification/classify-cart-l8-ps.ipynb

Lines changed: 480 additions & 158 deletions
Large diffs are not rendered by default.

jupyter-notebooks/crop-classification/classify-cart.ipynb

Lines changed: 301 additions & 366 deletions
Large diffs are not rendered by default.
Lines changed: 380 additions & 0 deletions
@@ -0,0 +1,380 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prepare CDL Data\n",
"\n",
"In this notebook, we prepare the 2016 [USDA Cropland Data Layer](https://www.nass.usda.gov/Research_and_Science/Cropland/Release/index.php) (CDL) for Iowa for use in the crop classification notebooks. This involves projecting, cropping, and resampling the Iowa CDL data to match a Planet scene. We perform these steps for both the train and test Planet scenes used in crop classification.\n",
"\n",
"The end results will be saved in `pre-data` for use in those notebooks. The initial data is too large to be managed gracefully in git.\n",
"\n",
"## Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pathlib\n",
"from subprocess import check_output, STDOUT, CalledProcessError\n",
"import tempfile\n",
"\n",
"import rasterio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Obtain CDL Data File\n",
"\n",
"In this section, we download the 2016 Iowa CDL.\n",
"\n",
"The 2016 CDL for Iowa is obtained through the [CropScape](https://nassgeodata.gmu.edu/CropScape/) site. On that site, ensure the layer is 'CropLand Data Layers -> 2016', then click the icon for 'Define Area of Interest by State...' (it looks like an outline of the US with the US flag design). Under 'Select a State', select Iowa and click 'Submit.' Next, click the icon for 'Download Defined Area of Interest Data' (it looks like a postal letter with an arrow). In the popup, ensure the 'CDL' tab is open and '2016' is selected, then click 'Select.' Another popup will appear with 'Please Wait...'; after a while it is replaced by one that says 'Download files from server'. Click 'Download' and the download should begin shortly.\n",
"\n",
"Unzip the downloaded folder and move the tif, `CDL_2016_19.tif`, to the `data` folder."
]
},
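{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional, hypothetical alternative to the manual CropScape download above; it is left\n",
"# commented out and is not required by the rest of this notebook. It assumes CropScape's\n",
"# 'GetCDLFile' web service accepts 'year' and 'fips' query parameters (19 is the Iowa\n",
"# state FIPS code) and returns XML containing a download URL for the GeoTIFF; verify the\n",
"# service before relying on it. It also assumes the 'requests' package is installed.\n",
"# import requests\n",
"# resp = requests.get('https://nassgeodata.gmu.edu/axis2/services/CDLService/GetCDLFile',\n",
"#                     params={'year': 2016, 'fips': 19})\n",
"# print(resp.text)  # inspect the returned XML for the GeoTIFF URL, then download and unzip it"
]
},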
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data/CDL_2016_19.tif\n"
]
}
],
"source": [
"# ensure the tif file is present\n",
"cdl_full = os.path.join('data', 'CDL_2016_19.tif')\n",
"print(cdl_full)\n",
"assert os.path.isfile(cdl_full)"
]
},
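{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check (an addition for clarity, not part of the original workflow):\n",
"# peek at the CDL raster's CRS, size, and resolution. The CDL is distributed in an\n",
"# Albers equal-area projection at 30m resolution, which is why it must be reprojected,\n",
"# cropped, and resampled below to match the Planet Orthotile grid.\n",
"with rasterio.open(cdl_full, 'r') as src:\n",
"    print(src.crs)\n",
"    print(src.shape)\n",
"    print(src.res)"
]
},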
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Train CDL Dataset\n",
"\n",
"The first step in preparing the train CDL dataset is downloading the train scene. We also download the metadata and udm because they are used in another notebook.\n",
"\n",
"The train scene is [210879_1558814_2016-07-25_0e16](https://api.planet.com/data/v1/item-types/PSOrthoTile/items/210879_1558814_2016-07-25_0e16/thumb).\n",
"\n",
"### Download Scene and Supporting Files"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# define and, if necessary, create data directory\n",
"train_folder = os.path.join('data', 'cart', '210879_1558814_2016-07-25_0e16')\n",
"pathlib.Path(train_folder).mkdir(parents=True, exist_ok=True)\n",
"\n",
"train_scene = os.path.join(train_folder, '210879_1558814_2016-07-25_0e16_BGRN_Analytic.tif')"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# First test if the scene file exists; if not, use the Planet command-line tool to download the image, metadata, and udm.\n",
"# This command assumes a bash shell, available in Unix-based operating systems.\n",
"!test -f $train_scene || \\\n",
"    planet data download \\\n",
"        --item-type PSOrthoTile \\\n",
"        --dest $train_folder \\\n",
"        --asset-type analytic,analytic_xml,udm \\\n",
"        --string-in id 210879_1558814_2016-07-25_0e16"
]
},
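{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check (an addition for clarity, not part of the original workflow):\n",
"# confirm the download and inspect the Orthotile's CRS, bounds, and pixel grid. These are\n",
"# the properties the CDL will be warped to match in the next section.\n",
"if os.path.isfile(train_scene):\n",
"    with rasterio.open(train_scene, 'r') as dst:\n",
"        print(dst.crs)\n",
"        print(dst.bounds)\n",
"        print(dst.width, dst.height)"
]
},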
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Crop and Project CDL\n",
"\n",
"We project, resample, and crop the CDL to match the Orthotile and save the result as the train CDL dataset."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Utility functions: crop, resample, and project an image\n",
"\n",
"# These use gdalwarp. For a description of gdalwarp command-line options, see:\n",
"# http://www.gdal.org/gdalwarp.html\n",
"\n",
"def gdalwarp_project_options(src_crs, dst_crs):\n",
"    return ['-s_srs', src_crs.to_string(), '-t_srs', dst_crs.to_string()]\n",
"\n",
"def gdalwarp_crop_options(bounds, crs):\n",
"    xmin, ymin, xmax, ymax = [str(b) for b in bounds]\n",
"    # -te xmin ymin xmax ymax\n",
"    return ['-te', xmin, ymin, xmax, ymax]\n",
"\n",
"def gdalwarp_resample_options(width, height, technique='near'):\n",
"    # for technique options, see: http://www.gdal.org/gdalwarp.html\n",
"    return ['-ts', width, height, '-r', technique]\n",
"\n",
"def gdalwarp(input_filename, output_filename, options):\n",
"    commands = _gdalwarp_commands(input_filename, output_filename, options)\n",
"\n",
"    # print error if one is encountered\n",
"    # https://stackoverflow.com/questions/29580663/save-error-message-of-subprocess-command\n",
"    try:\n",
"        output = check_output(commands, stderr=STDOUT)\n",
"    except CalledProcessError as exc:\n",
"        print(exc.output)\n",
"\n",
"def _gdalwarp_commands(input_filename, output_filename, options):\n",
"    commands = ['gdalwarp'] + options + \\\n",
"        ['-overwrite',\n",
"         input_filename,\n",
"         output_filename]\n",
"    print(' '.join(commands))\n",
"    return commands\n",
"\n",
"def _test():\n",
"    TEST_DST_SCENE = train_scene\n",
"    TEST_SRC_SCENE = cdl_full\n",
"\n",
"    with rasterio.open(TEST_DST_SCENE, 'r') as dst:\n",
"        with rasterio.open(TEST_SRC_SCENE, 'r') as src:\n",
"            print(gdalwarp_project_options(src.crs, dst.crs))\n",
"            print(gdalwarp_crop_options(dst.bounds, dst.crs))\n",
"            print(gdalwarp_resample_options(dst.width, dst.height))\n",
"# _test()"
]
},
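{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustration only (an addition for clarity, not part of the original workflow): assemble\n",
"# and print the full gdalwarp command the helper functions above would run for the train\n",
"# scene, without executing it. 'illustration.tif' is just a placeholder output name.\n",
"def show_gdalwarp_command(src_filename, dst_filename):\n",
"    with rasterio.open(src_filename, 'r') as src:\n",
"        with rasterio.open(dst_filename, 'r') as dst:\n",
"            options = gdalwarp_project_options(src.crs, dst.crs) + \\\n",
"                gdalwarp_crop_options(dst.bounds, dst.crs) + \\\n",
"                gdalwarp_resample_options(str(dst.width), str(dst.height))\n",
"            _gdalwarp_commands(src_filename, 'illustration.tif', options)\n",
"\n",
"# show_gdalwarp_command(cdl_full, train_scene)"
]
},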
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"# lossless compression of an image\n",
"def _compress(input_filename, output_filename):\n",
"    commands = ['gdal_translate',\n",
"                '-co', 'compress=LZW',\n",
"                '-co', 'predictor=2',\n",
"                input_filename,\n",
"                output_filename]\n",
"    print(' '.join(commands))\n",
"    # subprocess.check_call(commands)\n",
"\n",
"    # print error if one is encountered\n",
"    # https://stackoverflow.com/questions/29580663/save-error-message-of-subprocess-command\n",
"    try:\n",
"        output = check_output(commands, stderr=STDOUT)\n",
"    except CalledProcessError as exc:\n",
"        print(exc.output)"
]
},
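{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical alternative (an addition for illustration, not used below): the same LZW\n",
"# compression with predictor=2 (horizontal differencing, well suited to integer rasters\n",
"# like the CDL) can be applied by copying a GeoTIFF with rasterio and updated creation\n",
"# options, avoiding the gdal_translate subprocess call.\n",
"def _compress_with_rasterio(input_filename, output_filename):\n",
"    with rasterio.open(input_filename, 'r') as src:\n",
"        profile = src.profile.copy()\n",
"        profile.update(compress='lzw', predictor=2)\n",
"        with rasterio.open(output_filename, 'w', **profile) as dst:\n",
"            dst.write(src.read())"
]
},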
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"def prepare_cdl_image(cdl_filename, dst_filename, out_filename, compress=False, overwrite=True):\n",
"    '''Project, crop, and resample cdl image to match dst_filename image.'''\n",
"\n",
"    with rasterio.open(cdl_filename, 'r') as src:\n",
"        with rasterio.open(dst_filename, 'r') as dst:\n",
"            # project\n",
"            src_crs = _add_nad_datum(src.crs)  # Manually add NAD83 datum\n",
"            proj_options = gdalwarp_project_options(src_crs, dst.crs)\n",
"\n",
"            # crop\n",
"            crop_options = gdalwarp_crop_options(dst.bounds, dst.crs)\n",
"\n",
"            # resample\n",
"            # rasterio shape is (height, width)\n",
"            height, width = dst.shape\n",
"            resample_options = gdalwarp_resample_options(str(width), str(height), 'near')\n",
"\n",
"            options = proj_options + crop_options + resample_options\n",
"\n",
"            # check to see if the output file exists; if it does, do not warp\n",
"            if os.path.isfile(out_filename) and not overwrite:\n",
"                print('{} already exists. Aborting warp of {}'.format(out_filename, cdl_filename))\n",
"            elif compress:\n",
"                with tempfile.NamedTemporaryFile(suffix='.vrt') as vrt_file:\n",
"                    options += ['-of', 'vrt']\n",
"                    gdalwarp(cdl_filename, vrt_file.name, options)\n",
"                    _compress(vrt_file.name, out_filename)\n",
"            else:\n",
"                print(options)\n",
"                gdalwarp(cdl_filename, out_filename, options)\n",
"\n",
"def _add_nad_datum(crs):\n",
"    '''Rasterio is not reading the datum for the CDL image so add it manually.'''\n",
"    crs.update({'datum': 'NAD83'})\n",
"    return crs\n",
"\n",
"def _test(delete=True):\n",
"    TEST_DST_SCENE = train_scene\n",
"    TEST_SRC_SCENE = cdl_full\n",
"\n",
"    with tempfile.NamedTemporaryFile(suffix='.tif', delete=delete, dir='.') as out_file:\n",
"        # create output\n",
"        prepare_cdl_image(TEST_SRC_SCENE, TEST_DST_SCENE, out_file.name)\n",
"\n",
"        # check output\n",
"        with rasterio.open(TEST_DST_SCENE, 'r') as dst:\n",
"            with rasterio.open(out_file.name, 'r') as src:\n",
"                assert dst.crs == src.crs, '{} != {}'.format(src.crs, dst.crs)\n",
"                assert dst.bounds == src.bounds\n",
"                assert dst.shape == src.shape\n",
"# _test()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"# define and, if necessary, create pre-data directory\n",
"predata_folder = 'pre-data'\n",
"pathlib.Path(predata_folder).mkdir(parents=True, exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data/cart/210879_1558814_2016-07-25_0e16/210879_1558814_2016-07-25_0e16_BGRN_Analytic.tif already exists. Aborting warp of data/CDL_2016_19.tif\n"
]
}
],
"source": [
"# create train dataset gold image from CDL image\n",
"train_CDL_filename = os.path.join(predata_folder, 'CDL_2016_19_train.tif')\n",
"\n",
"prepare_cdl_image(cdl_full, train_scene, train_CDL_filename, overwrite=False, compress=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Test CDL Dataset\n",
"\n",
"Now we prepare the test CDL dataset.\n",
"\n",
"The test scene is [210863_1559015_2016-07-25_0e0f](https://api.planet.com/data/v1/item-types/PSOrthoTile/items/210863_1559015_2016-07-25_0e0f/thumb).\n",
"\n",
"### Download Scene and Supporting Files"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"# define and, if necessary, create data directory\n",
"test_folder = os.path.join('data', 'cart', '210863_1559015_2016-07-25_0e0f')\n",
"pathlib.Path(test_folder).mkdir(parents=True, exist_ok=True)\n",
"\n",
"test_scene = os.path.join(test_folder, '210863_1559015_2016-07-25_0e0f_BGRN_Analytic.tif')"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"# First test if the scene file exists; if not, use the Planet command-line tool to download the image, metadata, and udm.\n",
"# This command assumes a bash shell, available in Unix-based operating systems.\n",
"!test -f $test_scene || \\\n",
"    planet data download \\\n",
"        --item-type PSOrthoTile \\\n",
"        --dest $test_folder \\\n",
"        --asset-type analytic,analytic_xml,udm \\\n",
"        --string-in id 210863_1559015_2016-07-25_0e0f"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Crop and Project CDL\n",
"\n",
"We project, resample, and crop the CDL to match the Orthotile and save the result as the test CDL dataset."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data/cart/210863_1559015_2016-07-25_0e0f/210863_1559015_2016-07-25_0e0f_BGRN_Analytic.tif already exists. Aborting warp of data/CDL_2016_19.tif\n"
]
}
],
"source": [
"# create test dataset gold image from CDL image\n",
"test_CDL_filename = os.path.join(predata_folder, 'CDL_2016_19_test.tif')\n",
"\n",
"prepare_cdl_image(cdl_full, test_scene, test_CDL_filename, overwrite=False, compress=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
