@weixuanfu

What does this PR do?

  1. Add a subsample option, which sets the ratio of training instances to use. For example, setting it to 0.5 means that TPOT randomly selects half of the training samples for the pipeline optimization process.

  2. This option is designed for very large datasets, so a warning is raised when the subsample size is too small. A subsample that is too small can cause unpredictable outcomes, as mentioned in issue #388 (Add subsample option for speeding up TPOT at large dataset).

  3. Add unit tests and docs for this option.
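
The behavior described in the list above can be sketched roughly as follows. This is not the code from base.py: the helper name subsample_rows, the min_rows threshold, and its default of 50 are all hypothetical, chosen only to illustrate drawing a random fraction of the training rows and warning when the result is small.

```python
import random
import warnings


def subsample_rows(features, target, ratio, min_rows=50, seed=None):
    """Return a random subset of (features, target) rows.

    ratio is the fraction of rows to keep, in (0.0, 1.0].
    min_rows is a hypothetical warning threshold, not taken from the PR.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the range (0.0, 1.0]")
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")

    n_total = len(features)
    n_keep = max(1, int(n_total * ratio))

    # Warn rather than fail: a tiny subsample still runs, but results
    # may be unreliable.
    if n_keep < min_rows:
        warnings.warn(
            "Subsample of %d rows may be too small for reliable optimization."
            % n_keep
        )

    # Sample row indices without replacement and keep features and
    # target aligned by using the same indices for both.
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(n_total), n_keep))
    return [features[i] for i in keep], [target[i] for i in keep]


features = [[i, i * 2] for i in range(200)]
target = [i % 2 for i in range(200)]
sub_X, sub_y = subsample_rows(features, target, ratio=0.5, seed=42)
print(len(sub_X), len(sub_y))
```

With ratio=0.5 on 200 rows, the sketch keeps 100 rows and raises no warning; passing only 20 rows with the same ratio would trigger the warning under the assumed threshold.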

Where should the reviewer start?

base.py

How should this PR be tested?

Run any TPOT example with the subsample parameter added.

Any background context you want to provide?

#388

What are the relevant issues?

#388

Questions:

  • Do the docs need to be updated? Yes, updated in the PR
  • Does this PR add new (Python) dependencies? No

@coveralls

Coverage Status

Coverage increased (+0.1%) to 77.406% when pulling c667c2a on weixuanfu2016:subsample_param into a48add9 on rhiever:development.

@coveralls

Coverage Status

Coverage increased (+0.1%) to 77.406% when pulling a6d25a7 on weixuanfu2016:subsample_param into a48add9 on rhiever:development.

@rhiever

rhiever commented May 12, 2017

Will merge this PR once the merge conflicts in tests.py are resolved.

@weixuanfu
Author

OK, I will resolve these conflicts in tests.py.

rhiever merged commit 54b3df6 into EpistasisLab:development on May 12, 2017

@rhiever

rhiever commented May 12, 2017

Merged.

@weixuanfu
Author

weixuanfu commented May 12, 2017

...I missed a commit; this was merged too quickly.

@rhiever

rhiever commented May 12, 2017

Merged that one too.

@weixuanfu
Author

Thanks!
