
Conversation

@viirya
Member

@viirya viirya commented Jan 8, 2026

Implement LIMIT pushdown to optimize queries with LIMIT clauses by stopping data processing once the limit is reached. This reduces unnecessary I/O and computation for queries that only need a subset of rows.

Changes:

  • Add limit field to IcebergTableScan to track row limit
  • Apply limit at stream level by filtering/slicing record batches
  • Update IcebergTableProvider and IcebergStaticTableProvider to pass limit parameter to scan
  • Add comprehensive tests for limit pushdown functionality
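The stream-level limit described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `Batch` is a plain `Vec<i64>` standing in for an Arrow `RecordBatch`, a synchronous iterator stands in for the async record batch stream, and `LimitedBatches` and `apply_limit` are hypothetical names chosen for the example.

```rust
/// Stand-in for an Arrow `RecordBatch` (assumption): one vector of row values.
type Batch = Vec<i64>;

/// Hypothetical wrapper that stops yielding batches once `limit` rows were emitted.
struct LimitedBatches<I: Iterator<Item = Batch>> {
    inner: I,
    /// `None` means no limit; `Some(n)` means at most `n` more rows may be emitted.
    remaining: Option<usize>,
}

impl<I: Iterator<Item = Batch>> LimitedBatches<I> {
    fn new(inner: I, limit: Option<usize>) -> Self {
        Self { inner, remaining: limit }
    }
}

impl<I: Iterator<Item = Batch>> Iterator for LimitedBatches<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        match self.remaining {
            // No limit: pass batches through unchanged.
            None => self.inner.next(),
            // Limit reached: stop without pulling more input.
            Some(0) => None,
            Some(remaining) => {
                let mut batch = self.inner.next()?;
                // Slice the batch that crosses the limit boundary.
                batch.truncate(remaining);
                self.remaining = Some(remaining - batch.len());
                Some(batch)
            }
        }
    }
}

/// Convenience helper for the example: apply an optional limit to in-memory batches.
fn apply_limit(batches: Vec<Batch>, limit: Option<usize>) -> Vec<Batch> {
    LimitedBatches::new(batches.into_iter(), limit).collect()
}
```

With three batches of three rows each and a limit of 4, this sketch passes the first batch through whole, truncates the second to one row, and never pulls the third from the underlying iterator, which is where the saved I/O and computation would come from.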

Which issue does this PR close?

  • Closes #.

What changes are included in this PR?

Are these changes tested?

@viirya viirya force-pushed the feat/datafusion-limit-pushdown branch from c4a0132 to 5120001 Compare January 9, 2026 00:06
Contributor

@liurenjie1024 liurenjie1024 left a comment


Thanks @viirya for this fix, just one minor issue.

/// Filters to apply to the table scan
predicates: Option<Predicate>,
/// Optional limit on the number of rows to return
pub(crate) limit: Option<usize>,
Contributor


We should follow the other fields here: make this field private and expose it through an accessor function.
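A minimal sketch of what this suggestion could look like. The `TableScan` struct below is a hypothetical, pared-down stand-in (the real `IcebergTableScan` has more fields), and the accessor name `limit()` is an assumption for illustration:

```rust
/// Hypothetical, simplified scan type used only for this illustration.
pub struct TableScan {
    /// Optional limit on the number of rows to return (private field)
    limit: Option<usize>,
}

impl TableScan {
    /// Builds a scan with an optional row limit.
    pub fn new(limit: Option<usize>) -> Self {
        Self { limit }
    }

    /// Read-only accessor, so callers never touch the field directly.
    pub fn limit(&self) -> Option<usize> {
        self.limit
    }
}
```

Since `Option<usize>` is `Copy`, the accessor can return the value directly rather than a reference.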

Member Author


Okay, let me update it.

Implement LIMIT pushdown to optimize queries with LIMIT clauses by
stopping data processing once the limit is reached. This reduces
unnecessary I/O and computation for queries that only need a subset
of rows.

Changes:
- Add limit field to IcebergTableScan to track row limit
- Apply limit at stream level by filtering/slicing record batches
- Update IcebergTableProvider and IcebergStaticTableProvider to pass
  limit parameter to scan
- Add comprehensive tests for limit pushdown functionality

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@viirya viirya force-pushed the feat/datafusion-limit-pushdown branch from 5120001 to 5b79254 Compare January 9, 2026 02:51
Contributor

@liurenjie1024 liurenjie1024 left a comment


Thanks @viirya for this fix!

@liurenjie1024 liurenjie1024 merged commit 65e3682 into apache:main Jan 9, 2026
17 checks passed
@viirya viirya deleted the feat/datafusion-limit-pushdown branch January 9, 2026 03:41
@viirya
Member Author

viirya commented Jan 9, 2026

Thanks @liurenjie1024

gbrgr pushed a commit to RelationalAI/iceberg-rust that referenced this pull request Jan 9, 2026
Co-authored-by: Claude Sonnet 4.5 <[email protected]>
gbrgr added a commit to RelationalAI/iceberg-rust that referenced this pull request Jan 9, 2026
#30)

* Merge remote-tracking branch 'upstream/main' into gb/merge-upstream-arrow-57.1

* Fix merge mistakes

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* format

* Disable python bindings again

* Fix clippy errors

* Enable tests

* Fix merge mistake

* .

* .

* .

* Clippy fix

* fix: Reserved sort order ID cannot contain any fields (apache#1978)

## Which issue does this PR close?

- Closes apache#1963.

## What changes are included in this PR?

This change validates that table metadata with reserved sort order ID
(0) cannot contain fields associated with it. If this is found, we error
out instead of silently parsing arbitrary field values.

## Are these changes tested?

Added the unit test described in the issue and verified that the check
is now enforced.

* feat(datafusion): Add LIMIT pushdown support (apache#2006)

Co-authored-by: Claude Sonnet 4.5 <[email protected]>

* Redo comment

* Add case-sensitive attribute to incremental scan for consistency

---------

Co-authored-by: Aditya Subrahmanyan <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Claude Sonnet 4.5 <[email protected]>