patrickcgray
diff --git a/jupyter-notebooks/temporal-crop-classification/README.txt b/jupyter-notebooks/temporal-crop-classification/README.txt
Lines changed: 22 additions & 0 deletions
diff --git a/jupyter-notebooks/temporal-crop-classification/RE_DATA_USED.txt b/jupyter-notebooks/temporal-crop-classification/RE_DATA_USED.txt
Lines changed: 50 additions & 0 deletions
diff --git a/jupyter-notebooks/temporal-crop-classification/Rustowicz_final_report.pdf b/jupyter-notebooks/temporal-crop-classification/Rustowicz_final_report.pdf
-1.68 MB
diff --git a/jupyter-notebooks/temporal-crop-classification/Rustowicz_poster.pdf b/jupyter-notebooks/temporal-crop-classification/Rustowicz_poster.pdf
868 KB
diff --git a/jupyter-notebooks/temporal-crop-classification/Rustowicz_report.pdf b/jupyter-notebooks/temporal-crop-classification/Rustowicz_report.pdf
1.82 MB
diff --git a/jupyter-notebooks/temporal-crop-classification/classifiers/CNN_keras.py b/jupyter-notebooks/temporal-crop-classification/classifiers/CNN_keras.py
Lines changed: 20 additions & 19 deletions
diff --git a/jupyter-notebooks/temporal-crop-classification/classifiers/NN_keras.py b/jupyter-notebooks/temporal-crop-classification/classifiers/NN_keras.py
Lines changed: 6 additions & 18 deletions
diff --git a/jupyter-notebooks/temporal-crop-classification/classifiers/softmax_sklearn.py b/jupyter-notebooks/temporal-crop-classification/classifiers/softmax_sklearn.py
Lines changed: 24 additions & 12 deletions
diff --git a/jupyter-notebooks/temporal-crop-classification/classifiers/svms_sklearn_prints.py renamed to jupyter-notebooks/temporal-crop-classification/classifiers/svms_sklearn.py
Lines changed: 19 additions & 20 deletions
@@ -0,0 +1,22 @@
+Documentation for Crop Classification with Multi-Temporal Satellite Imagery
+Author: Rose Rustowicz
+21 December 2017
+
+Directories:
+classifiers -- contains code for classifiers used (softmax regression, SVMs, Neural Network, Convolutional Neural Network)
+dataset_construction -- used to download and construct dataset 
+evaluation -- used to evaluate further results (most evaluation is done within classifiers)
+
+1.) To get the data, use 'dataset_construction/get_crop_data.ipynb'. Given an area of interest (AOI) (specified in a .geojson file), this notebook will take you through querying the Planet API, activating and downloading scenes, and clipping downloaded scenes to an AOI.
+
+2.) To download the Crop Data Layer labels, go to 'https://nassgeodata.gmu.edu/CropScape/'. In the top bar, select the US map icon (filled with the US flag), or the icon to the right of that one, which you can use to manually select an area of interest. Select a region on the map, making sure that your AOI is within the selected region. Click on the right-most icon (the folder with a green arrow) on the top bar. Download the selected AOI.
+
+3.) Now that you have downloaded data from CropScape, you need to clip it to the same AOI as the imagery (from step 1). Open either 'dataset_construction/Crop_CDL_AOIs.ipynb' or 'dataset_construction/clip_CDL.py'. Specify the path to the downloaded CDL labels (from step 2) as 'CDL_fname' in the first line of the notebook. Specify the AOI filename in the second line of the notebook. This should be the same AOI used in step 1. The clipped CDL labels will be saved to the current directory.
+
+4.) You will now create data from the clipped imagery. Open 'dataset_construction/clips_to_datacube.py'. Make sure that all of the clipped imagery from step 1 is saved to a folder, and input that folder as 'imgs_dir' under the first line of the 'main()' function. Specify the filename of the clipped data labels from step 3 as 'labels_fname'. The output will be five files, one datacube for each spectral band: 'b_time.npy', 'g_time.npy', 'r_time.npy', 're_time.npy', and 'nir_time.npy'.
+
+5.) From the large selected scene, we will select the dataset. This step requires a little investigation in order to select the classes for your classification problem. You will be using 'dataset_construction/mk_dataset.py'. Open this file. Starting in the first line of the 'main' function, specify the filenames (and necessary relative paths) of the files created in step 4. The 'sort_crops' function will print out the top 20 crops in the dataset. Which crops to keep is your decision. The crop types corresponding to the 'sorted_crops' values can be found in the CDL database, for example here: https://www.nass.usda.gov/Research_and_Science/Cropland/metadata/metadata_ca16.htm, where the values correspond to the 'Attribute Code' column. Select which indices of the top 20 crops you want to keep, and specify those indices within the 'get_masks' function when defining the 'final_mask_xl' variable. Within the 'concat_features' function, you will also need to change the indices of the masks for each crop type. And within 'get_labels', you will need to change the label numbers from Attribute Codes into integer values starting from 0. 'crop_dataset' takes 100000 examples from each class and adds them to the dataset. Feel free to change this number. You now have data and labels for training, validation, and testing.
+
+6.) Now that your data is ready, you can run the classifiers! Go to the 'classifiers/' directory. You will find code for multiclass logistic regression ('softmax_sklearn.py'), support vector machines ('svms_sklearn.py'), a simple neural network ('NN_keras.py'), and a simple convolutional neural network ('CNN_keras.py'). Make sure that both scikit-learn and Keras are installed on the computer you are using. You may need to tune the parameters in each of the classification methods.
+
+
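Step 5 of the README asks for two manual edits: mapping the kept CDL Attribute Codes to integer labels starting from 0, and drawing a fixed number of examples per class. The following is a minimal sketch of both, assuming the labels are available as a flat NumPy array of Attribute Codes. The helper names `remap_labels` and `sample_per_class` and the example codes are hypothetical and are not taken from 'mk_dataset.py'.

```python
import numpy as np

def remap_labels(codes, keep_codes):
    """Map the kept CDL Attribute Codes to 0..K-1; all other pixels get -1."""
    labels = np.full(codes.shape, -1, dtype=np.int64)
    for new_label, code in enumerate(keep_codes):
        labels[codes == code] = new_label
    return labels

def sample_per_class(labels, n_per_class, seed=0):
    """Return indices of up to n_per_class randomly chosen pixels per class."""
    rng = np.random.default_rng(seed)
    chosen = []
    for label in np.unique(labels[labels >= 0]):
        idx = np.flatnonzero(labels == label)
        take = min(n_per_class, idx.size)
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(chosen)

# Illustrative Attribute Codes only; pick yours from the 'sort_crops' output.
codes = np.array([2, 2, 24, 61, 24, 2, 99, 61, 61, 24])
labels = remap_labels(codes, keep_codes=[2, 24, 61])
subset = sample_per_class(labels, n_per_class=2)
```

Pixels whose code is not in `keep_codes` are marked -1 and are never sampled, which mirrors the idea of masking out the crops you decided not to keep.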
@@ -0,0 +1,50 @@
+For Scene #1, 15 timestamps were used from RapidEye.
+
+The defined polygon is ... 
+POLYGON((-119.8103+35.9859,-119.6456+35.9859,-119.6456+36.1363,-119.8103+36.1363,-119.8103+35.9859))
+
+Dates used for Scene #1: 
+1155704_2016-02-07_RE4_3A
+1155704_2016-02-24_RE1_3A
+1155704_2016-03-01_RE3_3A
+1155704_2016-04-04_RE4_3A
+1155704_2016-05-23_RE5_3A
+1155704_2016-06-17_RE1_3A
+1155704_2016-07-18_RE4_3A   #used for mono-temporal case
+1155704_2016-08-02_RE5_3A
+1155704_2016-08-05_RE3_3A
+1155704_2016-08-23_RE2_3A
+1155704_2016-09-06_RE1_3A
+1155704_2016-09-25_RE1_3A
+1155704_2016-10-25_RE3_3A
+1155704_2016-11-17_RE2_3A
+1155704_2016-12-27_RE4_3A
+
+
+For Scene #2, 21 timestamps were used from RapidEye.
+
+The defined polygon is ... 
+POLYGON((-119.6147+36.2116,-119.5298+36.2116,-119.5298+36.2981,-119.6147+36.2981,-119.6147+36.2116))
+
+Dates used for Scene #2: 
+1155805_2016-02-07_RE4_3A
+1155805_2016-02-24_RE1_3A
+1155805_2016-03-01_RE3_3A
+1155805_2016-04-04_RE4_3A
+1155805_2016-04-27_RE3_3A
+1155805_2016-05-15_RE2_3A
+1155805_2016-05-23_RE5_3A
+1155805_2016-06-17_RE1_3A
+1155805_2016-07-18_RE4_3A   #used for mono-temporal case
+1155805_2016-08-23_RE2_3A
+1155805_2016-08-24_RE3_3A
+1155805_2016-09-06_RE1_3A
+1155805_2016-09-08_RE4_3A
+1155805_2016-09-09_RE5_3A
+1155805_2016-09-25_RE1_3A
+1155805_2016-10-01_RE3_3A
+1155805_2016-10-06_RE3_3A
+1155805_2016-10-25_RE3_3A
+1155805_2016-11-13_RE3_3A
+1155805_2016-11-17_RE2_3A
+1155805_2016-12-27_RE4_3A
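The scene identifiers listed in RE_DATA_USED.txt follow a regular underscore-separated pattern. The sketch below splits one into its parts and locates the mono-temporal acquisition in a time-ordered list. The field meanings (tile ID, acquisition date, satellite, product level) are an assumption read off the pattern, and `parse_scene_id` is a hypothetical helper, not part of the repository.

```python
from datetime import date

def parse_scene_id(scene_id):
    """Split an ID such as '1155704_2016-02-07_RE4_3A' into its four fields."""
    tile_id, acquired, satellite, level = scene_id.split("_")
    return {
        "tile_id": tile_id,
        "date": date.fromisoformat(acquired),
        "satellite": satellite,
        "level": level,
    }

# A three-entry excerpt of the Scene #1 list, for illustration
scene1_ids = [
    "1155704_2016-02-07_RE4_3A",
    "1155704_2016-07-18_RE4_3A",
    "1155704_2016-12-27_RE4_3A",
]
parsed = [parse_scene_id(s) for s in scene1_ids]

# Position of the mono-temporal acquisition within this excerpt
mono_index = [p["date"] for p in parsed].index(date(2016, 7, 18))

# Each timestamp contributes one value per spectral band (5 bands)
n_features_scene1 = 15 * 5
n_features_scene2 = 21 * 5
```

The two feature counts agree with the `input_dim=75` and `input_dim=105` options in NN_keras.py.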
@@ -1,4 +1,5 @@
-# Implementation of a CNN using Keras
+# Implementation of a CNN using Keras. Tuning of hyperparameters may be needed for 
+# best performance
 
 import numpy as np
 import pdb
@@ -57,19 +58,19 @@ def evaluate_loaded_model(loaded_model, X_test, y_test):
     print('%s: %.2f%%' % (loaded_model.metrics_names[1], score[1]*100))
 
 def create_model(num_classes):
-    # Define the model with the Keras Sequential modeling framework
+    # Define the model with the Keras Sequential modeling framework. Number of input features and output classes
+    # must be changed depending on what dataset you are working with
     model = Sequential()
-   
-    #model.add(Conv1D(32, kernel_size=5, strides=1, activation='relu', input_shape=(105,1)))
-    model.add(Conv1D(8, kernel_size=3, strides=1, activation='relu', input_shape=(5,1)))
-    #model.add(MaxPooling1D(pool_size=2, strides=2))
-    #model.add(Conv1D(64, 5, activation='relu'))
-    model.add(Conv1D(16, 3, activation='relu'))
-    #model.add(MaxPooling1D(pool_size=2))
-    model.add(Flatten())
-    #model.add(Dense(1000, activation='relu'))
-    model.add(Dense(200, activation='relu'))
-    model.add(Dense(num_classes, activation='softmax'))
+    #model.add(Conv1D(32, kernel_size=5, strides=1, activation='relu', input_shape=(105,1)))  # multi-temporal
+    model.add(Conv1D(8, kernel_size=3, strides=1, activation='relu', input_shape=(5,1))) # mono-temporal
+    #model.add(MaxPooling1D(pool_size=2, strides=2)) # multi-temporal
+    #model.add(Conv1D(64, 5, activation='relu')) # multi-temporal
+    model.add(Conv1D(16, 3, activation='relu')) # mono-temporal
+    #model.add(MaxPooling1D(pool_size=2)) # multi-temporal
+    model.add(Flatten()) # both
+    #model.add(Dense(1000, activation='relu')) # multi-temporal
+    model.add(Dense(200, activation='relu')) # mono-temporal
+    model.add(Dense(num_classes, activation='softmax')) # both
 
     model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
     return model
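To see why the mono-temporal and multi-temporal layer stacks cannot simply be mixed, it helps to trace the sequence length through each layer. The sketch below does this in plain Python, assuming the Keras defaults of `padding='valid'` for `Conv1D` and `MaxPooling1D` and a pooling stride equal to `pool_size` when none is given. The helper `conv1d_out` is hypothetical.

```python
def conv1d_out(length, kernel_size, stride=1):
    """Output length of a 1-D convolution or pooling window with no padding."""
    return (length - kernel_size) // stride + 1

# Mono-temporal stack: input_shape=(5, 1)
n = conv1d_out(5, kernel_size=3)   # Conv1D(8, 3)  -> 3 steps
n = conv1d_out(n, kernel_size=3)   # Conv1D(16, 3) -> 1 step
mono_flat = n * 16                 # Flatten: steps x filters of the last conv

# Multi-temporal stack (the commented-out layers): input_shape=(105, 1)
m = conv1d_out(105, kernel_size=5)          # Conv1D(32, 5)      -> 101
m = conv1d_out(m, kernel_size=2, stride=2)  # MaxPooling1D(2, 2) -> 50
m = conv1d_out(m, kernel_size=5)            # Conv1D(64, 5)      -> 46
m = conv1d_out(m, kernel_size=2, stride=2)  # MaxPooling1D(2)    -> 23
multi_flat = m * 64
```

With only 5 input steps, two kernel-size-3 convolutions already reduce the sequence to a single step, so the pooling layers and size-5 kernels of the multi-temporal variant would not fit the mono-temporal input.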
@@ -130,6 +131,7 @@ def main():
     print("Saved model to disk")
 
 def main_eval():
+    # Used to evaluate the CNN if the model is already saved (only inference, no training)
     json_fname = 'model_10epoch_kings04.json'
     h5_fname = 'model_10epoch_kings04.h5'
 
@@ -162,13 +164,12 @@ def main_eval():
 
     class_names = ['cotton', 'safflower', 'tomatoes', 'wintwheat', 'durwheat', 'idle']
     #class_names = ['wintwht/corn', 'alfalfa', 'almonds', 'pistachios', 'idle', 'corn', 'walnuts', 'cotton', 'wintwheat']
-    plt.figure() #figsize=(8.5,7))
-    #plt.figure()
+    plt.figure()
     plot_confusion_matrix(cnf_matrix, classes=class_names, normalize=True, title='Scene #1 Multi-Temporal Confusion Matrix')
-    #plt.show()
+    plt.show()
     matplotlib.rcParams.update({'font.size': 40})
-    plt.savefig('Scene1_confmat_TESTBLOCK.png')
+    #plt.savefig('Scene1_confmat_TESTBLOCK.png')
 
 if __name__ == '__main__':
-    #main()
-    main_eval()
+    main()
+    #main_eval()
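The body of `plot_confusion_matrix` is not shown in this diff, so what `normalize=True` does there is an assumption. A common choice is to divide each row by its total so that each entry is the fraction of a true class assigned to each predicted class. A minimal sketch of that row normalization, with a hypothetical helper `normalize_rows` and made-up counts:

```python
import numpy as np

def normalize_rows(cnf_matrix):
    """Divide each row by its total so entries are per-true-class fractions."""
    cnf_matrix = np.asarray(cnf_matrix, dtype=float)
    row_sums = cnf_matrix.sum(axis=1, keepdims=True)
    # Leave all-zero rows at zero instead of dividing by zero
    return np.divide(cnf_matrix, row_sums,
                     out=np.zeros_like(cnf_matrix), where=row_sums > 0)

# Illustrative counts only (rows: true class, columns: predicted class)
counts = [[8, 2, 0],
          [1, 9, 0],
          [0, 0, 0]]
normalized = normalize_rows(counts)
```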
@@ -1,5 +1,5 @@
 # This script uses the Sequential model within Keras to implement 
-# a simple one-hidden-layeer neural network 
+# a simple one-hidden-layer neural network. Tune parameters for best performance. 
 
 from keras.models import Sequential
 from keras.layers import Dense
@@ -14,11 +14,6 @@
 
 np.random.seed(10) # for reproducibility
 
-def one_hot(y, num_classes):
-    onehot = np.zeros((y.shape[0], num_classes))
-    onehot[np.arange(y.shape[0]), y] = 1
-    return onehot
-
 def load_data():
     # Load train, validation, and test data
     X = np.load('../datasets/kings_05_rapideye/train_data_single.npy')
@@ -40,11 +35,16 @@ def load_data():
 def create_model(units, activation, loss, optimizer, metrics, reg, dropout_rate, weight_constraint):
     # Define model as a sequence of layers; Dense models are fully-connected models
     model = Sequential()
+    # Mono-temporal has 5 features (1 timestamp x 5 spectral bands)
     model.add(Dense(units=units, input_dim=5, activation=activation, kernel_regularizer=regularizers.l2(reg), kernel_constraint=maxnorm(weight_constraint)))
+    # Scene 1 has 75 features (15 timestamps x 5 spectral bands)
     #model.add(Dense(units=units, input_dim=75, activation=activation, kernel_regularizer=regularizers.l2(reg), kernel_constraint=maxnorm(weight_constraint)))
+    # Scene 2 has 105 features (21 timestamps x 5 spectral bands)
     #model.add(Dense(units=units, input_dim=105, activation=activation, kernel_regularizer=regularizers.l2(reg), kernel_constraint=maxnorm(weight_constraint)))
     model.add(Dropout(dropout_rate))
+    # Scene 1 has 6 output classes
     #model.add(Dense(6, activation='softmax'))
+    # Scene 2 has 9 output classes
     model.add(Dense(9, activation='softmax'))
     model.compile(optimizer, loss, metrics)
     return model
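The feature counts in the comments above (5, 75, 105) come from multiplying timestamps by the five spectral bands. The sketch below shows one way such feature vectors could be assembled, assuming each band is available as an array shaped (n_pixels, n_timestamps). That layout, the band-major feature ordering, and the helper `stack_bands` are assumptions for illustration, not the layout produced by 'clips_to_datacube.py'.

```python
import numpy as np

def stack_bands(band_series):
    """Concatenate per-band time series into one feature vector per pixel.

    band_series: list of arrays, each shaped (n_pixels, n_timestamps).
    Returns an array shaped (n_pixels, n_timestamps * n_bands).
    """
    return np.concatenate(band_series, axis=1)

n_pixels, n_timestamps = 4, 15  # Scene 1 has 15 timestamps
# Random stand-ins for the b, g, r, re, and nir bands
bands = [np.random.rand(n_pixels, n_timestamps) for _ in range(5)]
X = stack_bands(bands)

# A mono-temporal input keeps a single timestamp from each band.
# Index 6 is the position of 2016-07-18 in the Scene #1 date list.
t = 6
X_mono = np.stack([band[:, t] for band in bands], axis=1)
```

`X` then matches `input_dim=75` and `X_mono` matches `input_dim=5`.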
@@ -68,8 +68,6 @@ def predict(model, X, y, X_dev, y_dev, X_test, y_test):
     pred_y = model.predict(X)
     pred_y_dev = model.predict(X_dev)
     pred_y_test = model.predict(X_test)
-    
-    #print(predictions)
     return pred_y, pred_y_dev, pred_y_test
 
 def main():
@@ -85,16 +83,6 @@ def main():
                 print('epoch: %s' % (epoch))
                 model.fit(X, y, batch, epoch)
                 scores, val_scores, test_score = evaluate_model(model, X, y, X_dev, y_dev, X_test, y_test)
-    
-    #pred_y, pred_y_dev, pred_y_test = predict(model, X, y, X_dev, y_dev, X_test, y_test)
-
-    # Summarize results
-    #print('Best: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
-    #means = grid_result.cv_results_['mean_test_score']
-    #stds = grid_result.cv_results_['std_test_score']
-    #params = grid_result.cv_results_['params']
-    #for mean, stdev, param in zip(means, stds, params):
-    #    print('%f (%f) with %r' % (mean, stdev, param))
 
 if __name__ == '__main__':
     main()
@@ -6,21 +6,33 @@
 from sklearn.metrics import classification_report
 import pdb
 
-X = np.load('../../datasets/kings_04_rapideye/train_data.npy')
-X = X*1.0/np.max(X)
-y = np.load('../../datasets/kings_04_rapideye/train_lbl_sc.npy')
+X_tr = np.load('../../datasets/kings_04_rapideye/train_data.npy')
+X_tr = X_tr*1.0/np.max(X_tr)
+y_tr = np.load('../../datasets/kings_04_rapideye/train_lbl_sc.npy')
 
-all_X = np.load('../../datasets/kings_04_rapideye/all_data.npy')
-all_X = all_X*1.0/np.max(all_X)
-all_y = np.load('../../datasets/kings_04_rapideye/all_lbls.npy')
+X_val = np.load('../../datasets/kings_04_rapideye/val_data.npy')
+X_val = X_val*1.0/np.max(X_val)
+y_val = np.load('../../datasets/kings_04_rapideye/val_lbl_sc.npy')
 
+X_test = np.load('../../datasets/kings_04_rapideye/test_data.npy')
+X_test = X_test*1.0/np.max(X_test)
+y_test = np.load('../../datasets/kings_04_rapideye/test_lbl_sc.npy')
+
+# Define the model. You may need to explore many settings for the best results.
 logreg = linear_model.LogisticRegression(penalty='l2', C=10, solver='saga', max_iter=10000,  multi_class='multinomial')
-lr_fit = logreg.fit(X, y)
-print(logreg.score(X, y))
+lr_fit = logreg.fit(X_tr, y_tr)
+print(logreg.score(X_tr, y_tr))
 
-predictions = logreg.predict(all_X)
-print(logreg.score(all_X, all_y))
-print(classification_report(all_y, predictions))
+predictions = logreg.predict(X_val)
+print(logreg.score(X_val, y_val))
+print(classification_report(y_val, predictions))
 
-np.save('logreg_kings04_predictions.npy', predictions)
+#np.save('logreg_kings04_predictions.npy', predictions)
 
+# Once the model is tuned for best performance, you can also evaluate it on the test set.
+# If the model performs well on training, but poorly on validation, it has overfit to the training set.
+# If the model performs well on training and validation, but poorly on test, it has overfit to the training and validation sets.
+# If the model performs similarly on training / val / test but all are poor, the model is underfitting (it is not complex enough).
+predictions_test = logreg.predict(X_test)
+print(logreg.score(X_test, y_test))
+print(classification_report(y_test, predictions_test))
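The script above scales each split by its own maximum, so the same raw value can map to different scaled values in training, validation, and test. One possible alternative, which is not what the script does, is to compute the scale on the training split only and reuse it everywhere. A minimal sketch with a hypothetical helper `scale_with_train_max` and toy arrays:

```python
import numpy as np

def scale_with_train_max(X_tr, *others):
    """Scale every split by the maximum of the training split."""
    scale = float(np.max(X_tr))
    return [X / scale for X in (X_tr, *others)]

# Toy values for illustration only
X_tr = np.array([[100.0, 200.0], [400.0, 50.0]])
X_val = np.array([[200.0, 100.0]])
X_test = np.array([[800.0, 40.0]])
X_tr_s, X_val_s, X_test_s = scale_with_train_max(X_tr, X_val, X_test)
```

With this choice, values in the validation or test split can exceed 1.0 when they are larger than anything seen in training, which is expected.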
@@ -1,4 +1,6 @@
-# Implementation of SVMs with sklearn
+# Implementation of SVMs with sklearn. You may need to tweak parameters
+# to achieve successful classification results. Depending on the size of 
+# the training set, the SVM code may take a long time to run.
 
 import numpy as np
 import pickle
@@ -8,31 +10,25 @@
 from sklearn.metrics import classification_report
 import matplotlib.pyplot as plt
 
+# Load training
 X_tr = np.load('../datasets/kings_04_rapideye/train_data.npy')
-
 X_tr = X_tr*1.0/np.max(X_tr)
 y_tr = np.load('../datasets/kings_04_rapideye/train_lbl_sc.npy')
 X = X_tr[0:60000,:]
 y = y_tr[0:60000]
-print(X.shape)
-print(y)
 
+# Load validation
 X_val = np.load('../datasets/kings_04_rapideye/val_data.npy')
 X_val = X_val*1.0/np.max(X_val)
 y_val = np.load('../datasets/kings_04_rapideye/val_lbl_sc.npy')
-print(X_val.shape)
-print(y_val)
-print(y_val.shape)
 
+# Load test
 X_test = np.load('../datasets/kings_04_rapideye/test_data.npy')
 X_test = X_test*1.0/np.max(X_test)
 y_test = np.load('../datasets/kings_04_rapideye/test_lbl_sc.npy')
-print(X_test.shape)
-print(y_test)
-print(y_test.shape)
 
-for C in [1000000]: #[100000, 1000000]:
-    for gamma in [0.1]: #[0.001, 0.01]:
+for C in [1, 10, 100, 1000, 10000, 100000, 1000000]: #1000000
+    for gamma in [0.001, 0.01, 0.1]: #0.1
         print('---------')
         print('C: , gamma: ')
         print(C, gamma)
@@ -46,14 +42,17 @@
         print('Acc:')
         print(accuracy_score(y, y_pred))
         y_val_pred = rbf_svc.predict(X_val)
-        print('Result on validation set')
+        print('Result on validation set') 
         print(classification_report(y_val, y_val_pred))
         print('Acc:')
         print(accuracy_score(y_val, y_val_pred))
-        y_test_pred = rbf_svc.predict(X_test)
-        print('Result on test set')
-        print(classification_report(y_test, y_test_pred))
-        print('Acc:')
-        print(accuracy_score(y_test, y_test_pred))
-        fname = 'svm_kings04_temporal.sav'
-        pickle.dump(rbf_svc, open(fname, 'wb'))
+        # The model should be chosen based on its performance on the validation set. Do not view test results until the model is tweaked for best performance.
+        #y_test_pred = rbf_svc.predict(X_test)
+        #print('Result on test set')
+        #print(classification_report(y_test, y_test_pred))
+        #print('Acc:')
+        #print(accuracy_score(y_test, y_test_pred))
+        
+        # Save the model
+        #fname = 'svm_kings04_temporal.sav'
+        #pickle.dump(rbf_svc, open(fname, 'wb'))
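The nested loops above print results for every (C, gamma) pair and leave the choice to the reader. The sketch below shows one way to pick the pair with the highest validation score automatically. `select_best` and `toy_score` are hypothetical; `toy_score` is a stand-in so the example runs without data, and in practice it would be replaced by a function that fits the RBF SVM on the training subset and returns `accuracy_score(y_val, y_val_pred)`.

```python
from itertools import product

def select_best(param_grid, fit_and_score):
    """Return (best_params, best_score) over the Cartesian product of the grid."""
    best_params, best_score = None, float("-inf")
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(**params)  # validation score for this setting
        if score > best_score:           # ties keep the first setting found
            best_params, best_score = params, score
    return best_params, best_score

grid = {"C": [1, 10, 100, 1000, 10000, 100000, 1000000],
        "gamma": [0.001, 0.01, 0.1]}

# Stand-in scorer: peaks at C=100, gamma=0.01 by construction
def toy_score(C, gamma):
    return -abs(C - 100) / 1e6 - abs(gamma - 0.01)

best_params, best_score = select_best(grid, toy_score)
```

Only the validation score drives the selection; the test set is evaluated once, after the best setting is fixed.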