Merged
50 commits
0d5866f
DF 45 blog post
Omega359 Feb 20, 2025
7417e4c
Update content/blog/2025-02-20-datafusion-45.0.0.md
Omega359 Feb 21, 2025
3799609
Update content/blog/2025-02-20-datafusion-45.0.0.md
Omega359 Feb 21, 2025
88d7b6b
Set author to PMC.
Omega359 Feb 22, 2025
5a13332
Set author to PMC, incorporated feedback.
Omega359 Feb 22, 2025
b8de014
Update content/blog/2025-02-20-datafusion-45.0.0.md
Omega359 Feb 22, 2025
8a46200
expanded GSOC as it may not be obvious what it is and linked it up.
Omega359 Feb 22, 2025
5460e1a
Grammar fix.
Omega359 Feb 22, 2025
a2e3503
Typo fix
Omega359 Feb 22, 2025
e8e6734
Typo fix
Omega359 Feb 22, 2025
7bb8713
Adding spark functions to looking ahead section
Omega359 Feb 22, 2025
b330523
minor change
Omega359 Feb 22, 2025
3b9b11d
Fixed Jonah Gao's handle.
Omega359 Feb 24, 2025
29af566
Update content/blog/2025-02-20-datafusion-45.0.0.md
alamb Feb 25, 2025
d49a65c
WIP for DF 49 blog post.
Omega359 Jul 1, 2025
0167406
WIP for DF 49 blog post.
Omega359 Jul 1, 2025
6102ade
Update topK dynamic filtering perf section, cleanup the upgrade and c…
Omega359 Jul 1, 2025
cbbad27
Merge remote-tracking branch 'upstream/main' into origin_main
Omega359 Jul 6, 2025
3ece66e
DF 47.0.0 blog post
Omega359 Jul 6, 2025
ef46a35
Remove incomplete and accidentally added DF 49 blog post
Omega359 Jul 6, 2025
9e0f4e1
Fix header.
Omega359 Jul 6, 2025
4f049f4
Grammar fix
Omega359 Jul 6, 2025
9780da0
Minor formatting
Omega359 Jul 6, 2025
cdf50f8
Adding disabling of re-validation of spill files to performance impro…
Omega359 Jul 6, 2025
1478e3d
Merge remote-tracking branch 'apache/main' into Omega359/main
alamb Jul 9, 2025
3859c80
Formatting and wordsmithing
alamb Jul 9, 2025
a149ac8
tweaks
alamb Jul 9, 2025
d433e7d
Update content/blog/2025-07-10-datafusion-47.0.0.md
alamb Jul 9, 2025
af1c645
Fixed link.
Omega359 Jul 9, 2025
88ad7c6
Add datafusion-tracing crate mention and logo, make text more concrete
alamb Jul 11, 2025
5f17467
Claude edits
alamb Jul 11, 2025
44aa48f
Update publishing date
alamb Jul 11, 2025
23a1512
Merge branch 'apache:main' into main
Omega359 Jul 12, 2025
7df9b65
Merge branch 'refs/heads/main' into origin_main
Omega359 Jul 18, 2025
3364f88
Skeleton of DF 49 blog post
Omega359 Jul 18, 2025
bfbcd1a
Fix frontmatter, remove breaking changes section, add link to new blog
alamb Jul 18, 2025
5ef1876
Write up dynamic filtering
alamb Jul 19, 2025
55d0e24
Add performance chart
alamb Jul 19, 2025
0165501
Reorder sections, add new diagram
alamb Jul 21, 2025
b3e72e5
Add note on async udfs
alamb Jul 21, 2025
4d08441
update async section
alamb Jul 21, 2025
b0bb055
Add note abotu WITHINK GROUP
alamb Jul 21, 2025
c6f7642
note about parquet encryption
alamb Jul 21, 2025
44e6e3b
Add spill to disk and regex_instr
alamb Jul 21, 2025
ebb60dc
add regexp_instr
alamb Jul 21, 2025
4b2a29b
Adjust date
alamb Jul 21, 2025
15919bc
Small updates and typo fixes.
Omega359 Jul 22, 2025
aaa202f
Wordsmith / OCD obeses
alamb Jul 23, 2025
be89378
Gemini AI wordsmith / spelling / style
alamb Jul 23, 2025
0ce255f
update performance
alamb Jul 25, 2025
Small updates and typo fixes.
Omega359 committed Jul 22, 2025
commit 15919bcf4ebb5cd085501dd033b288c3532af42b
39 changes: 23 additions & 16 deletions content/blog/2025-07-28-datafusion-49.0.0.md
@@ -61,25 +61,28 @@ With DataFusion, we can all build on top of a shared foundation, and focus on
what makes our projects unique.

<!--
- $ git log --pretty=oneline 48.0.0..49.0.0 . | wc -l
- 1532 (fixme)
+ # number of commits
+ $ git log --pretty=oneline 46.0.0..49.0.0 . | wc -l
+ $ git log --pretty=oneline --since=2025-03-07 --until=2025-07-25 . | wc -l
+ 843

- $ git shortlog -sn 48.0.0..49.0.0 . | wc -l
- 206 (fixme)
+ # Unique committers in this time 167
+ $ git shortlog -sn 46.0.0..49.0.0 . | wc -l
+ $ git shortlog -sn --since=2025-03-07 --until=2025-07-25 . |wc -l

https://crates.io/crates/datafusion/49.0.0
DataFusion 49 released July 25, 2025

https://crates.io/crates/datafusion/46.0.0
DataFusion 46 released March 7, 2025

- Issues created in this time: 271 open, 320 closed
+ Issues created in this time: 274 open, 377 closed
https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2025-03-07..2025-07-25

- Issues closed: 440
+ Issues closed: 504
https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2025-03-07..2025-07-25

- PRs merged in this time 751
+ PRs merged in this time 858
https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2025-03-07..2025-07-25

-->
@@ -95,7 +98,8 @@ DataFusion continues to focus on enhancing performance as can be seen in the Cli
alt="ClickBench performance results over time for DataFusion"
/>

- **Figure 1**: ClickBench performance improved XXX between DataFusion 48.0.0 and DataFusion 49.0.0. Chart source: [DataFusion Benchmarking Page](https://alamb.github.io/datafusion-benchmarking/). TODO DOUBLE CHECK THIS AND RERUN THE NUMBERS
+ **Figure 1**: ClickBench performance improved XXX between DataFusion 48.0.0 and DataFusion 49.0.0.
+ Chart source: [DataFusion Benchmarking Page](https://alamb.github.io/datafusion-benchmarking/). TODO DOUBLE CHECK THIS AND RERUN THE NUMBERS


<img
@@ -110,15 +114,17 @@ alt="Planning benchmark performance results over time for DataFusion"

Here are some noteworthy optimizations that contributed to this improvement since DataFusion 48 was released:

- **Equivalence system upgrade:** The lower levels of the equivalence system, which is part of implementing the optimizations described in the [Using Ordering for Better Plans Blog Post], was rewritten, leading to
- much faster planning times, especially those with a [large number of columns](https://github.com/apache/datafusion/pull/16217#pullrequestreview-2891941229). This change also prepares the way for more sophisticated sort based optimizations in the future.
- (PR [#16217](https://github.com/apache/datafusion/pull/16217) by [ozankabak](https://github.com/ozankabak)).
+ **Equivalence system upgrade:** The lower levels of the equivalence system, which is part of implementing the
+ optimizations described in the [Using Ordering for Better Plans Blog Post], was rewritten, leading to
+ much faster planning times, especially those with a [large number of columns](https://github.com/apache/datafusion/pull/16217#pullrequestreview-2891941229). This change also prepares
+ the way for more sophisticated sort based optimizations in the future. (PR [#16217](https://github.com/apache/datafusion/pull/16217) by [ozankabak](https://github.com/ozankabak)).

[Using Ordering for Better Plans Blog Post]: https://datafusion.apache.org/blog/2025/03/11/ordering-analysis

**Dynamic Filters and TopK pushdown:**

- DataFusion `49.0.0` includes support for dynamic filters and physical filter pushdown. that improves the performance of queries that use `LIMIT` and `ORDER BY` clauses such as the following
+ DataFusion `49.0.0` includes support for dynamic filters and physical filter pushdown that improves the performance of
+ queries that use `LIMIT` and `ORDER BY` clauses such as the following

```sql
SELECT *
@@ -150,7 +156,7 @@ We [plan to write a blog post] explaining the details of this optimization in th

## Community Growth 📈

- DataFusion is a community endeavor, and the last few months (between `46.0.0` and `49.0.0`) as seen our community grow:
+ DataFusion is a community endeavor, and the last few months (between `46.0.0` and `49.0.0`) has seen our community grow:

1. We added several new PMC members and committers: [berkay], [xudong963] and [timsaucer] joined the PMC,
[blaginin], [milenkovicm], [adriangb] and [kosiew] joined as committers. See the [mailing list] for more details.
@@ -244,7 +250,7 @@ impl AsyncScalarUDFImpl for AskLLM {

### Better cancellation support for long-running queries

- In rare cases, it was not possible to cancel certain log running queries, which
+ In rare cases, it was not possible to cancel certain long running queries which
could lead to unresponsiveness. In other projects this would likely have been
fixed as a small local change as the full solution requires a deep understanding
of the DataFusion execution engine and the tokio execution model. The [PR that
@@ -263,7 +269,8 @@ Thanks to [pepijnve] for this contribution.

### Laying the foundation `Variant`, `Geometry` and other user defined types

- User defined types have been a long requested feature in DataFusion, and we have made significant progress towards this goal in this release. The following features have been added:
+ User defined types have been a long requested feature in DataFusion and we have made
+ significant progress towards this goal in this release. The following features have been added:
* metadata handling
* pushdown of filters and expressions

@@ -341,7 +348,7 @@ by [ding-young](https://github.com/ding-young))

### Support for `REGEX_INSTR` function

- DataFusion now supports the [`REGEXP_INSTR` function] function, which returns the position of a
+ DataFusion now supports the [`REGEXP_INSTR` function] which returns the position of a
regular expression match within a string. This function is useful for extracting
the position of a match in a string.
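
Reviewer-style illustration, not part of the commit: the sketch below shows the kind of result "the position of a match" usually means, assuming a 1-based position for the first match and 0 when nothing matches. The helper name `regexp_instr_sketch` is hypothetical, and the code uses Python's standard `re` module, whose regex syntax may differ from DataFusion's. It does not use DataFusion's implementation or its full argument list.

```python
import re


def regexp_instr_sketch(text: str, pattern: str) -> int:
    """Return the assumed 1-based position of the first match, or 0 if none.

    Hypothetical helper for illustration only; it does not call DataFusion.
    """
    match = re.search(pattern, text)
    # match.start() is 0-based, so shift by one to get a 1-based position.
    return match.start() + 1 if match else 0


if __name__ == "__main__":
    # The first run of digits starts at the 12th character.
    print(regexp_instr_sketch("datafusion 49.0.0", r"[0-9]+"))
    # No digits at all, so the sketch reports 0.
    print(regexp_instr_sketch("datafusion", r"[0-9]+"))
```

A 1-based result with 0 for "no match" mirrors how SQL string functions commonly report positions. Check the linked function documentation for the exact behavior and optional arguments before relying on it.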
