-
Notifications
You must be signed in to change notification settings - Fork 13
v0.6.0 - better modules #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ndharasz
commented
Sep 30, 2025
- add new data module with balance_rank_transform and quantile_bin functions for building crypto data
- add unit tests for new data functions
- refactor existing functions into more appropriately named modules and avoid circular imports
…tions for building crypto data, add unit tests for new functions, refactor existing functions into more appropriately named modules and avoid circular imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Refactors existing functions into more appropriately named modules to avoid circular imports and better organize the codebase, while adding new crypto-specific data transformation functions.
- Moves mathematical functions from
scoring.pyto newmath.pymodule - Creates new
indexing.pymodule for index manipulation functions - Adds new
data.pymodule withbalanced_rank_transformandquantile_binfunctions for crypto data processing
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_scoring.py | Removes tests for moved functions while keeping scoring-specific tests |
| tests/test_math.py | New test file for mathematical functions moved from scoring module |
| tests/test_indexing.py | New test file for index manipulation functions |
| tests/test_data.py | New test file for data transformation functions |
| pyproject.toml | Updates version to 0.6.0.dev0 |
| numerai_tools/typing.py | New module defining type variables for DataFrame/Series unions |
| numerai_tools/signals.py | Updates imports to use new module structure |
| numerai_tools/scoring.py | Refactored to import from new modules and focus on scoring functions |
| numerai_tools/math.py | New module containing mathematical transformation functions |
| numerai_tools/indexing.py | New module for index filtering and sorting functions |
| numerai_tools/data.py | New module with crypto data transformation functions |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| # Ensure denominator is at least 1 to avoid division by zero | ||
| denom = max(int(s.count()), 1) |
Copilot
AI
Sep 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The division by zero protection using max(int(s.count()), 1) may produce incorrect results when all values are NaN. Consider using s.count() directly and handling the edge case explicitly, as pandas ranking already handles empty series appropriately.
| # Ensure denominator is at least 1 to avoid division by zero | |
| denom = max(int(s.count()), 1) | |
| denom = s.count() | |
| if denom == 0: | |
| # All values are NaN; return a series of NaNs with the same index | |
| return pd.Series([np.nan] * len(s), index=s.index) |
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| # assign bins according to quantiles | ||
| s.loc[s <= q[quantiles[0]]] = bins[0] | ||
| for i in range(1, len(bins) - 1): | ||
| s.loc[(s > q[quantiles[i - 1]]) & (s <= q[quantiles[i]])] = bins[i] | ||
| s.loc[s >= q[quantiles[-1]]] = bins[-1] |
Copilot
AI
Oct 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binning logic has overlapping conditions that could cause values to be assigned to multiple bins. Values exactly equal to quantile thresholds will be overwritten by subsequent assignments. The conditions should be mutually exclusive.
| # assign bins according to quantiles | |
| s.loc[s <= q[quantiles[0]]] = bins[0] | |
| for i in range(1, len(bins) - 1): | |
| s.loc[(s > q[quantiles[i - 1]]) & (s <= q[quantiles[i]])] = bins[i] | |
| s.loc[s >= q[quantiles[-1]]] = bins[-1] | |
| # assign bins according to quantiles using pd.cut for mutually exclusive bins | |
| bin_edges = [-np.inf] + [q[q_idx] for q_idx in quantiles] + [np.inf] | |
| s = pd.cut(s, bins=bin_edges, labels=bins, include_lowest=True).astype(float) |