@zhengruifeng commented Jul 1, 2024

What changes were proposed in this pull request?

Document the behavior difference of extraction between element_at and try_element_at

Why are the changes needed?

When the function try_element_at was introduced in 3.5, its handling of the extraction argument was unintentionally inconsistent with element_at, which causes confusion.

This PR documents this behavior difference (I don't think we can fix it, since that would be a breaking change).

In [1]: from pyspark.sql import functions as sf

In [2]: df = spark.createDataFrame([({"a": 1.0, "b": 2.0}, "a")], ['data', 'b'])

In [3]: df.select(sf.try_element_at(df.data, 'b')).show()
+-----------------------+
|try_element_at(data, b)|
+-----------------------+
|                    1.0|
+-----------------------+


In [4]: df.select(sf.element_at(df.data, 'b')).show()
+-------------------+
|element_at(data, b)|
+-------------------+
|                2.0|
+-------------------+
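The difference above comes down to how a bare Python string is resolved before the map lookup. The following is a toy, pure-Python model of that rule only; it is not PySpark's implementation. The names `Lit`, `Col`, `element_at_model` and `try_element_at_model` are hypothetical, and a plain dict stands in for a row.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Lit:
    """Hypothetical marker for an explicit literal (loosely like sf.lit)."""
    value: Any


@dataclass(frozen=True)
class Col:
    """Hypothetical marker for an explicit column reference (loosely like sf.col)."""
    name: str


def _resolve_key(row: Mapping[str, Any], extraction: Any, bare_str_is_column: bool) -> Any:
    """Turn the extraction argument into the key used for the map lookup."""
    if isinstance(extraction, Lit):
        return extraction.value          # explicit literal: use the value itself
    if isinstance(extraction, Col):
        return row[extraction.name]      # explicit column: read the key from the row
    if isinstance(extraction, str) and bare_str_is_column:
        return row[extraction]           # bare string read as a column name
    return extraction                    # bare value read as a literal


def element_at_model(row: Mapping[str, Any], map_col: str, extraction: Any) -> Any:
    # Models element_at: a bare string is a literal key.
    key = _resolve_key(row, extraction, bare_str_is_column=False)
    return row[map_col].get(key)         # error handling is out of scope for this sketch


def try_element_at_model(row: Mapping[str, Any], map_col: str, extraction: Any) -> Any:
    # Models try_element_at: a bare string is a column name.
    key = _resolve_key(row, extraction, bare_str_is_column=True)
    return row[map_col].get(key)


# Same data as the session above: column `b` holds the string "a".
row = {"data": {"a": 1.0, "b": 2.0}, "b": "a"}

print(element_at_model(row, "data", "b"))           # 2.0 (key is the literal "b")
print(try_element_at_model(row, "data", "b"))       # 1.0 (key is row["b"] == "a")
print(try_element_at_model(row, "data", Lit("b")))  # 2.0 (explicit literal)
print(element_at_model(row, "data", Col("b")))      # 1.0 (explicit column)
```

In PySpark itself, passing an explicit `sf.lit('b')` or `df.b` instead of a bare string should make the intent unambiguous for either function.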

Does this PR introduce any user-facing change?

doc changes

How was this patch tested?

ci, added doctests

Was this patch authored or co-authored using generative AI tooling?

no

@zhengruifeng
Contributor Author

typo

nit
@zhengruifeng force-pushed the doc_element_at_extraction branch from 11618a2 to 8365824 Compare July 1, 2024 07:41
Comment on lines +14101 to +14102
If extraction is a string, :meth:`element_at` treats it as a literal string,
while :meth:`try_element_at` treats it as a column name.
Contributor

Is this behavior difference intentional? Is it consistent with the SQL functions element_at and try_element_at?

Contributor Author

I think the SQL side also treats it as a literal:

In [2]: spark.sql("SELECT ELEMENT_AT(MAP('a', 'b'), 'a')").show()
+------------------------+
|element_at(map(a, b), a)|
+------------------------+
|                       b|
+------------------------+

@zhengruifeng
Contributor Author

thanks all, merged to master

@zhengruifeng deleted the doc_element_at_extraction branch July 1, 2024 23:41
@panbingkun
Contributor

Late LGTM


>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0}, "a")], ['data', 'b'])
>>> df.select(sf.try_element_at(df.data, 'b')).show()
Contributor

Do we need to add another example below this?

df.select(sf.try_element_at(df.data, df.b)).show()

Given the syntax in the example above, it took me a long time to understand its intended meaning, which is quite obscure 😄
