Skip to content

Conversation

@ndharasz
Copy link
Contributor

  • add new data module with balance_rank_transform and quantile_bin functions for building crypto data
  • add unit tests for new data functions
  • refactor existing functions into more appropriately named modules and avoid circular imports

…tions for building crypto data, add unit tests for new functions, refactor existing functions into more appropriately named modules and avoid circular imports
Copilot AI review requested due to automatic review settings September 30, 2025 23:53
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Refactors existing functions into more appropriately named modules to avoid circular imports and better organize the codebase, while adding new crypto-specific data transformation functions.

  • Moves mathematical functions from scoring.py to new math.py module
  • Creates new indexing.py module for index manipulation functions
  • Adds new data.py module with balanced_rank_transform and quantile_bin functions for crypto data processing

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_scoring.py Removes tests for moved functions while keeping scoring-specific tests
tests/test_math.py New test file for mathematical functions moved from scoring module
tests/test_indexing.py New test file for index manipulation functions
tests/test_data.py New test file for data transformation functions
pyproject.toml Updates version to 0.6.0.dev0
numerai_tools/typing.py New module defining type variables for DataFrame/Series unions
numerai_tools/signals.py Updates imports to use new module structure
numerai_tools/scoring.py Refactored to import from new modules and focus on scoring functions
numerai_tools/math.py New module containing mathematical transformation functions
numerai_tools/indexing.py New module for index filtering and sorting functions
numerai_tools/data.py New module with crypto data transformation functions

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +32 to +33
# Ensure denominator is at least 1 to avoid division by zero
denom = max(int(s.count()), 1)
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The division by zero protection using max(int(s.count()), 1) may produce incorrect results when all values are NaN. Consider using s.count() directly and handling the edge case explicitly, as pandas ranking already handles empty series appropriately.

Suggested change
# Ensure denominator is at least 1 to avoid division by zero
denom = max(int(s.count()), 1)
denom = s.count()
if denom == 0:
# All values are NaN; return a series of NaNs with the same index
return pd.Series([np.nan] * len(s), index=s.index)

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings October 1, 2025 09:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +125 to +129
# assign bins according to quantiles
s.loc[s <= q[quantiles[0]]] = bins[0]
for i in range(1, len(bins) - 1):
s.loc[(s > q[quantiles[i - 1]]) & (s <= q[quantiles[i]])] = bins[i]
s.loc[s >= q[quantiles[-1]]] = bins[-1]
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binning logic has overlapping conditions that could cause values to be assigned to multiple bins. Values exactly equal to quantile thresholds will be overwritten by subsequent assignments. The conditions should be mutually exclusive.

Suggested change
# assign bins according to quantiles
s.loc[s <= q[quantiles[0]]] = bins[0]
for i in range(1, len(bins) - 1):
s.loc[(s > q[quantiles[i - 1]]) & (s <= q[quantiles[i]])] = bins[i]
s.loc[s >= q[quantiles[-1]]] = bins[-1]
# assign bins according to quantiles using pd.cut for mutually exclusive bins
bin_edges = [-np.inf] + [q[q_idx] for q_idx in quantiles] + [np.inf]
s = pd.cut(s, bins=bin_edges, labels=bins, include_lowest=True).astype(float)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants