Create readme.md

tirthajyoti · web-flow · commit 675c3bd5df5b · 2021-03-21T02:33:50.000-07:00
diff --git a/Pytest/readme.md b/Pytest/readme.md
@@ -0,0 +1,50 @@
+## A sample Pytest module for a Scikit-learn model training function
+
+### How to run Pytest
+
+- Install pytest: `pip install pytest`
+
+- Copy/clone the two Python scripts from this directory
+- The `linear_model.py` file has a single function that trains a simple linear regression model using scikit-learn. Note that it has basic assertion tests and a `try-except` construct to handle potential input errors.
+- The `test_linear_model.py` file is the test module which acts as the input to the Pytest program.
+- Run `pytest test_linear_model.py -v` in your terminal to run the tests. You should see something like the following:
+
+```
+======================================================================================================= test session starts ======================================================================================================== 
+platform win32 -- Python 3.9.1, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- c:\program files\python39\python.exe
+cachedir: .pytest_cache
+rootdir: C:\Users\TirthajyotiSarkar\Documents\Python Notebooks\Pytest
+plugins: anyio-2.0.2
+collected 7 items                                                                                                                                                                                                                    
+
+test_linear_model.py::test_model_return_object PASSED                                                                                                                                                                         [ 14%] 
+test_linear_model.py::test_model_return_vals PASSED                                                                                                                                                                           [ 28%] 
+test_linear_model.py::test_model_save_load PASSED                                                                                                                                                                             [ 42%] 
+test_linear_model.py::test_loaded_model_works PASSED                                                                                                                                                                          [ 57%] 
+test_linear_model.py::test_model_works_data_range PASSED                                                                                                                                                                      [ 71%] 
+test_linear_model.py::test_noise_impact PASSED                                                                                                                                                                                [ 85%] 
+test_linear_model.py::test_wrong_input_raises_assertion PASSED                                                                                                                                                                [100%] 
+
+========================================================================================================= warnings summary ========================================================================================================= 
+..\..\..\..\..\..\program files\python39\lib\site-packages\win32\lib\pywintypes.py:2
+  c:\program files\python39\lib\site-packages\win32\lib\pywintypes.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
+    import imp, sys, os
+
+-- Docs: https://docs.pytest.org/en/stable/warnings.html
+=================================================================================================== 7 passed, 1 warning in 1.03s ===================================================================================================
+```
+
+### What does it mean?
+
+- The terminal output above indicates that 7 tests were run (corresponding to the 7 test functions in the `test_linear_model.py` module) and that all of them passed.
+
+- It also lists each test in the order it was run (this is because you included the `-v` argument on the command line when running the `pytest` command). The test order can also be randomized (for example, with a plugin such as `pytest-randomly`), but that discussion is for another day.
+
+### Notes on the test module
+
+- Note how `test_linear_model.py` contains 7 functions with names starting with `test...`. Those contain the actual test code. It also has a couple of data constructor functions whose names do not start with `test...`, so Pytest does not collect them as tests.
+
+- Note that we need to import several libraries to test all kinds of things, e.g. `joblib`, `os`, `sklearn`, `numpy`, and, of course, the `train_linear_model` function from the `linear_model` module.
+- Note the clear and distinctive names of the test functions, e.g. `test_model_return_object()`, which only checks the object returned by the `train_linear_model` function, or `test_model_save_load()`, which checks whether the saved model can be loaded properly (but does not try to make predictions).
+- To check the predictions, i.e. whether the trained model really works, we have the `test_loaded_model_works()` function, which uses a fixed data generator with no noise (as opposed to other cases, where we can use a random data generator with random noise). It passes in the fixed `X` and `y` data, loads the trained model, checks that the $R^2$ scores are equal to 1.0 (true for a fixed dataset with no noise), and then compares the model predictions with the original ground-truth `y` vector. Note how it uses the NumPy testing function `np.testing.assert_allclose` instead of a regular `assert` statement. This avoids potential numerical precision issues that come with floating-point NumPy arrays and the linear algebra operations in the prediction algorithm.
+- Take a look at the `random_data_constructor` and `fixed_data_constructor` functions as well, to see how they are designed and used in the test code. The `random_data_constructor` function takes a `noise_mag` argument that controls the magnitude of the noise, which is used to test the expected behavior of a linear regression algorithm as noise increases. Refer to the `test_noise_impact` function for this.