Supervised and unsupervised learning utilizing feature engineering and using an exhaustive search of ideal pre-processing (if any), algorithms, and hyper-parameters.
This program is currently tuned towards binary classification such as those seen in medicine (eg. disease or no disease).
Below are the steps to setup a new environment for running MILO-ML on a Debian based machine. The below steps are based on a fresh Debian 10.1 install. If you already have a working Python environment, you can skip this section.
# Install dependencies
apt install build-essential nodejs libpython3.7-dev python-virtualenv rabbitmq-server libomp-dev
# Setup virtual environment
virtualenv -p python3 milo-env
# Change to this environment
source milo-env/bin/activate
# Always source this environment (optional)
printf "\nsource /home/<username>/milo-env/bin/activate" >> .bashrc# Install brew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
# Install dependencies
brew install rabbitmq gcc freetype node
pip3 install virtualenv
# Setup virtual environment
virtualenv -p python3 milo-env
# Change to this environment
source milo-env/bin/activate
# Always source this environment (optional)
printf "\nsource /home/<username>/milo-env/bin/activate" >> .bash_profilePerform the following steps to install the application:
git clone [email protected]:v3/milo-ml/MILO-ML/AutoML
cd AutoML
npm install
# For future updates, simply run:
npm run updateCurrently data is expected to be in CSV format in two files:
train.csv: The complete training set which will be split into a train and test set.
This data set should be balanced 50/50. If the data set is not balanced a warning
will be shown (when run in via the CLI).
test.csv: A secondary data set which will be used independent of model generation
and will be used to determine the generalizability of the model. This data set
reflects the prevalence of disease being classified (eg. not balanced).
To run the program simply execute the following command:
# Replace TARGET with the name of the column you are targeting
python cli.py train.csv test.csv TARGET | tee report.txtThis will execute the program and send the output to both the terminal and
the file report.txt. It will also output report.csv which contains the summary
of all models generated.
If you do not pass a train and test spreadsheet, sample data contained within
sample-data will be used.
Running the application as a service with an HTTP API and Angular SPA front end can be done by using the following command:
npm run serve# Run all unit tests
npm testVuePress is used to generate the documentation which can be run as follows:
cd docs
npm run devThis repository uses a git submodule.
If you cloned the repository without using --recursive, then you can initialize and clone the submodule with the following steps.
-
Init the submodule
git submodule init
-
Update the submodule
git submodule update --remote
For more advanced usage, please refer to the git documentation: https://git-scm.com/book/en/v2/Git-Tools-Submodules