State schema tws #12
…hIntervalType` in `df.collect`

### What changes were proposed in this pull request?

Refine the error message for `YearMonthIntervalType` in `df.collect`.

### Why are the changes needed?

For better understanding.

### Does this PR introduce _any_ user-facing change?

Yes.

Before:

```
In [1]: spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").first()
[********************************************************************************] 100.00% Complete (0 Tasks running, 0
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[1], line 1
----> 1 spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").first()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:522, in DataFrame.first(self)
    521 def first(self) -> Optional[Row]:
--> 522     return self.head()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:666, in DataFrame.head(self, n)
    664 def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
    665     if n is None:
--> 666         rs = self.head(1)
    667         return rs[0] if rs else None
    668     return self.take(n)

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:668, in DataFrame.head(self, n)
    666 rs = self.head(1)
    667 return rs[0] if rs else None
--> 668 return self.take(n)

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:671, in DataFrame.take(self, num)
    670 def take(self, num: int) -> List[Row]:
--> 671     return self.limit(num).collect()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1835, in DataFrame.collect(self)
   1831 schema = schema or from_arrow_schema(table.schema, prefer_timestamp_ntz=True)
   1833 assert schema is not None and isinstance(schema, StructType)
-> 1835 return ArrowTableToRowsConversion.convert(table, schema)

File ~/Dev/spark/python/pyspark/sql/connect/conversion.py:542, in ArrowTableToRowsConversion.convert(table, schema)
    536 assert schema is not None and isinstance(schema, StructType)
    538 field_converters = [
    539     ArrowTableToRowsConversion._create_converter(f.dataType) for f in schema.fields
    540 ]
--> 542 columnar_data = [column.to_pylist() for column in table.columns]
    544 rows: List[Row] = []
    545 for i in range(0, table.num_rows):

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/pyarrow/table.pxi:1327, in pyarrow.lib.ChunkedArray.to_pylist()

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/pyarrow/table.pxi:1256, in pyarrow.lib.ChunkedArray.chunk()

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/pyarrow/public-api.pxi:208, in pyarrow.lib.pyarrow_wrap_array()

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/pyarrow/array.pxi:3711, in pyarrow.lib.get_array_class_from_type()

KeyError: 21
```

After:

```
In [2]: spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").first()
[********************************************************************************] 100.00% Complete (0 Tasks running, 0
---------------------------------------------------------------------------
PySparkTypeError                          Traceback (most recent call last)
Cell In[2], line 1
----> 1 spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").first()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:522, in DataFrame.first(self)
    521 def first(self) -> Optional[Row]:
--> 522     return self.head()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:666, in DataFrame.head(self, n)
    664 def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
    665     if n is None:
--> 666         rs = self.head(1)
    667         return rs[0] if rs else None
    668     return self.take(n)

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:668, in DataFrame.head(self, n)
    666 rs = self.head(1)
    667 return rs[0] if rs else None
--> 668 return self.take(n)

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:671, in DataFrame.take(self, num)
    670 def take(self, num: int) -> List[Row]:
--> 671     return self.limit(num).collect()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1833, in DataFrame.collect(self)
   1829 table, schema = self._to_table()
   1831 # not all datatypes are supported in arrow based collect
   1832 # here always verify the schema by from_arrow_schema
-> 1833 schema2 = from_arrow_schema(table.schema, prefer_timestamp_ntz=True)
   1834 schema = schema or schema2
   1836 assert schema is not None and isinstance(schema, StructType)

File ~/Dev/spark/python/pyspark/sql/pandas/types.py:306, in from_arrow_schema(arrow_schema, prefer_timestamp_ntz)
    300 def from_arrow_schema(arrow_schema: "pa.Schema", prefer_timestamp_ntz: bool = False) -> StructType:
    301     """Convert schema from Arrow to Spark."""
    302     return StructType(
    303         [
    304             StructField(
    305                 field.name,
--> 306                 from_arrow_type(field.type, prefer_timestamp_ntz),
    307                 nullable=field.nullable,
    308             )
    309             for field in arrow_schema
    310         ]
    311     )

File ~/Dev/spark/python/pyspark/sql/pandas/types.py:293, in from_arrow_type(at, prefer_timestamp_ntz)
    291     spark_type = NullType()
    292 else:
--> 293     raise PySparkTypeError(
    294         error_class="UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION",
    295         message_parameters={"data_type": str(at)},
    296     )
    297 return spark_type

PySparkTypeError: [UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION] month_interval is not supported in conversion to Arrow.
```

### How was this patch tested?

Added test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47004 from zhengruifeng/collect_ym_error.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
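The change above follows a general fail-fast pattern: validate the schema before converting any data, so an unsupported type surfaces as a descriptive, typed error instead of a `KeyError` from deep inside pyarrow internals. A minimal sketch of that pattern, with hypothetical stand-ins (`SUPPORTED_ARROW_TYPES`, `UnsupportedDataTypeError`, and the toy `collect` are illustrative only, not the actual PySpark source):

```python
# Illustrative sketch of the fail-fast validation pattern this PR adopts:
# check every field's type up front so the user sees a clear, named error
# rather than a low-level failure mid-conversion.

SUPPORTED_ARROW_TYPES = {"int64", "double", "string"}  # hypothetical subset


class UnsupportedDataTypeError(TypeError):
    """Stand-in for PySparkTypeError carrying a named error class."""


def from_arrow_schema(fields):
    """Validate (name, arrow_type) pairs before any conversion happens."""
    for name, arrow_type in fields:
        if arrow_type not in SUPPORTED_ARROW_TYPES:
            raise UnsupportedDataTypeError(
                "[UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION] "
                f"{arrow_type} is not supported in conversion to Arrow."
            )
    return [name for name, _ in fields]


def collect(fields, rows):
    # Always verify the schema first (the key change in this PR), so any
    # failure is raised here with a clear message, not during row conversion.
    names = from_arrow_schema(fields)
    return [dict(zip(names, row)) for row in rows]
```

With this ordering, `collect([("interval", "month_interval")], [])` fails immediately with the offending type name in the message, mirroring how the PR moves the `from_arrow_schema` check ahead of `ArrowTableToRowsConversion.convert`.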