feat(datafusion): Add LIMIT pushdown support #2006
Branch updated from c4a0132 to 5120001.
liurenjie1024 left a comment:
Thanks @viirya for this fix, just one minor issue.
```rust
    /// Filters to apply to the table scan
    predicates: Option<Predicate>,
    /// Optional limit on the number of rows to return
    pub(crate) limit: Option<usize>,
```
We should follow the other fields here: make this field private and expose it through a method.
Okay, let me update it.
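The reviewer's suggestion (keep `limit` private like the other fields and expose it through a method) can be sketched as follows. This is a minimal illustration, not the code from the PR: `TableScanSketch`, its constructor, and the accessor name `limit()` are hypothetical, and the real `IcebergTableScan` carries many more fields.

```rust
/// Hypothetical, simplified stand-in for `IcebergTableScan`; the real struct
/// also holds the table, schema, projection, predicates, and so on.
#[derive(Debug, Clone)]
pub struct TableScanSketch {
    /// Optional limit on the number of rows to return (private, like the other fields).
    limit: Option<usize>,
}

impl TableScanSketch {
    /// Hypothetical constructor; the real constructor signature may differ.
    pub fn new(limit: Option<usize>) -> Self {
        Self { limit }
    }

    /// Read-only accessor so callers never touch the field directly.
    pub fn limit(&self) -> Option<usize> {
        self.limit
    }
}
```

A caller such as a provider or a test reads the value through `scan.limit()`, which keeps the field's visibility consistent with `predicates` and the rest of the struct.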
Implement LIMIT pushdown to optimize queries with LIMIT clauses by stopping data processing once the limit is reached. This reduces unnecessary I/O and computation for queries that only need a subset of rows.

Changes:
- Add limit field to IcebergTableScan to track row limit
- Apply limit at stream level by filtering/slicing record batches
- Update IcebergTableProvider and IcebergStaticTableProvider to pass limit parameter to scan
- Add comprehensive tests for limit pushdown functionality

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
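Applying the limit "at stream level" amounts to forwarding batches until the row budget is spent, trimming the batch that crosses the limit, and then ending the stream without pulling any more input. The sketch below is a simplification under stated assumptions: it uses a synchronous `Iterator` and models a record batch as a `Vec<i64>` of rows, and `Batch`, `LimitBatches`, and `apply_limit` are hypothetical names. The PR itself works on an asynchronous stream of Arrow `RecordBatch`es, where trimming would use `RecordBatch::slice(offset, length)` rather than `Vec::truncate`.

```rust
/// Stand-in for an Arrow `RecordBatch`: here a batch is just a vector of rows.
pub type Batch = Vec<i64>;

/// Iterator adapter that forwards batches until `remaining` rows have been
/// emitted, truncating the batch that crosses the limit, then stopping.
pub struct LimitBatches<I> {
    inner: I,
    remaining: usize,
}

impl<I: Iterator<Item = Batch>> LimitBatches<I> {
    pub fn new(inner: I, limit: usize) -> Self {
        Self { inner, remaining: limit }
    }
}

impl<I: Iterator<Item = Batch>> Iterator for LimitBatches<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // Once the limit is satisfied, return without polling the upstream
        // source, so no further data is read.
        if self.remaining == 0 {
            return None;
        }
        let mut batch = self.inner.next()?;
        if batch.len() > self.remaining {
            // Keep only the first `remaining` rows of the crossing batch.
            batch.truncate(self.remaining);
        }
        self.remaining -= batch.len();
        Some(batch)
    }
}

/// Applies an optional limit; `None` passes every batch through unchanged.
pub fn apply_limit<I>(batches: I, limit: Option<usize>) -> Box<dyn Iterator<Item = Batch>>
where
    I: Iterator<Item = Batch> + 'static,
{
    match limit {
        Some(n) => Box::new(LimitBatches::new(batches, n)),
        None => Box::new(batches),
    }
}
```

For example, with three 3-row batches and `Some(5)`, the adapter yields the first batch whole and the first two rows of the second, and never requests the third batch; `None` leaves the stream untouched.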
Branch updated from 5120001 to 5b79254.
liurenjie1024 left a comment:
Thanks @viirya for this fix!
Thanks @liurenjie1024
Implement LIMIT pushdown to optimize queries with LIMIT clauses by stopping data processing once the limit is reached. This reduces unnecessary I/O and computation for queries that only need a subset of rows.

Changes:
- Add limit field to IcebergTableScan to track row limit
- Apply limit at stream level by filtering/slicing record batches
- Update IcebergTableProvider and IcebergStaticTableProvider to pass limit parameter to scan
- Add comprehensive tests for limit pushdown functionality

## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

## Are these changes tested?

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
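On the DataFusion side, `TableProvider::scan` already receives an optional `limit` argument; the provider change is to hand that value to the scan node instead of dropping it. The following is a synchronous, dependency-free sketch of that plumbing, not the PR's code: `ScanProvider`, `ScanPlanSketch`, `build_scan`, `TableProviderSketch`, and `StaticTableProviderSketch` are hypothetical stand-ins for DataFusion's async `TableProvider` trait, the Iceberg scan plan, `IcebergTableProvider`, and `IcebergStaticTableProvider`, and filters and session state are omitted.

```rust
/// Hypothetical, simplified stand-in for the Iceberg scan execution plan;
/// only the fields relevant to this example are modeled.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanPlanSketch {
    pub projection: Option<Vec<usize>>,
    pub limit: Option<usize>,
}

/// Hypothetical synchronous stand-in for DataFusion's async `TableProvider::scan`.
pub trait ScanProvider {
    fn scan(&self, projection: Option<&Vec<usize>>, limit: Option<usize>) -> ScanPlanSketch;
}

/// Shared helper: both providers forward the limit to the scan node unchanged.
fn build_scan(projection: Option<&Vec<usize>>, limit: Option<usize>) -> ScanPlanSketch {
    ScanPlanSketch { projection: projection.cloned(), limit }
}

/// Stand-in for `IcebergTableProvider`.
pub struct TableProviderSketch;
/// Stand-in for `IcebergStaticTableProvider`.
pub struct StaticTableProviderSketch;

impl ScanProvider for TableProviderSketch {
    fn scan(&self, projection: Option<&Vec<usize>>, limit: Option<usize>) -> ScanPlanSketch {
        // Forward the pushed-down limit rather than discarding it.
        build_scan(projection, limit)
    }
}

impl ScanProvider for StaticTableProviderSketch {
    fn scan(&self, projection: Option<&Vec<usize>>, limit: Option<usize>) -> ScanPlanSketch {
        build_scan(projection, limit)
    }
}
```

Routing both providers through one helper keeps their behavior identical, so a test on either provider exercises the same forwarding logic.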
#30)

* Merge remote-tracking branch 'upstream/main' into gb/merge-upstream-arrow-57.1
* Fix merge mistakes
* .
* .
* .
* .
* .
* .
* .
* .
* .
* .
* format
* Disable python bindings again
* Fix clippy errors
* Enable tests
* Fix merge mistake
* .
* .
* .
* Clippy fix
* fix: Reserved sort order ID cannot contain any fields (apache#1978). Which issue does this PR close? Closes apache#1963. What changes are included in this PR? This change validates that table metadata with reserved sort order ID (0) cannot contain fields associated with it. If this is found, we error out instead of silently parsing arbitrary field values. Are these changes tested? Added the unit test described in the issue and verified that the check is now enforced.
* feat(datafusion): Add LIMIT pushdown support (apache#2006). Implement LIMIT pushdown to optimize queries with LIMIT clauses by stopping data processing once the limit is reached. This reduces unnecessary I/O and computation for queries that only need a subset of rows. Changes: add limit field to IcebergTableScan to track row limit; apply limit at stream level by filtering/slicing record batches; update IcebergTableProvider and IcebergStaticTableProvider to pass limit parameter to scan; add comprehensive tests for limit pushdown functionality. Which issue does this PR close? Closes #. Co-authored-by: Claude Sonnet 4.5 <[email protected]>
* Redo comment
* Add case-sensitive attribute to incremental scan for consistency

---------

Co-authored-by: Aditya Subrahmanyan <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Claude Sonnet 4.5 <[email protected]>