Skip to content

Conversation

@stefankandic
Copy link
Contributor

@stefankandic stefankandic commented Jun 21, 2024

What changes were proposed in this pull request?

Adding a new classes of filters which are collation aware.

Why are the changes needed?

#46760 Added the logic of predicate widening for collated column references, but this would completely change the filters and if the original expression did not get evaluated by spark later we could end up with wrong results. Also, data sources would never be able to actually support these filters and they would just see them as AlwaysTrue.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jun 21, 2024
@stefankandic stefankandic changed the title [WIP] Fix collation predicate widening [SPARK-48697] Fix collation predicate widening Jun 24, 2024
@stefankandic stefankandic marked this pull request as ready for review June 24, 2024 11:54
@cloud-fan
Copy link
Contributor

I think there is some misunderstanding here. Filter pushdown has a few steps:

  1. Spark translates catalyst filters to data source filters, which can be a semantically subset as some catalyst filters do not have corresponding data source filters.
  2. Spark pushes down the data source filters to data source implementation.
  3. Data source implementation tells Spark which filters need to be evaluated again at Spark side. See DS v2 SupportsPushDownFilters.pushFilters, which returns to-be-evaluated-by-Spark filters.

I don't get why we need TranslatedFilter, as the problem is not from the translation layer.

@stefankandic stefankandic force-pushed the fixPredicateWidening branch from f6c6bd7 to a4458b6 Compare June 27, 2024 12:03
@stefankandic stefankandic changed the title [SPARK-48697] Fix collation predicate widening [SPARK-48697] Properly translate filters on collated columns Jun 27, 2024
@stefankandic stefankandic changed the title [SPARK-48697] Properly translate filters on collated columns [SPARK-48697] Add collation aware string filters Jun 27, 2024
@stefankandic
Copy link
Contributor Author

@cloud-fan I made some changes per our discussion, let me know what you think

@stefankandic stefankandic changed the title [SPARK-48697] Add collation aware string filters [SPARK-48697][SQL] Add collation aware string filters Jun 27, 2024
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 703b076 Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants