State schema tws #12
Changes from 1 commit
`PandasNotImplementedError` for unsupported plotting functions

### What changes were proposed in this pull request?

Throw `PandasNotImplementedError` for unsupported plotting functions:

- {Frame, Series}.plot.hist
- {Frame, Series}.plot.kde
- {Frame, Series}.plot.density
- {Frame, Series}.plot(kind="hist", ...)
- {Frame, Series}.plot(kind="kde", ...)
- {Frame, Series}.plot(kind="density", ...)

### Why are the changes needed?

The previous error message is confusing:

```
In [3]: psdf.plot.hist()
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1017: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
[*********************************************-----------------------------------] 57.14% Complete (0 Tasks running, 1s, Scanned
---------------------------------------------------------------------------
PySparkAttributeError                     Traceback (most recent call last)
Cell In[3], line 1
----> 1 psdf.plot.hist()

File ~/Dev/spark/python/pyspark/pandas/plot/core.py:951, in PandasOnSparkPlotAccessor.hist(self, bins, **kwds)
    903 def hist(self, bins=10, **kwds):
    904     """
    905     Draw one histogram of the DataFrame’s columns.
    906     A `histogram`_ is a representation of the distribution of data.
   (...)
    949     >>> df.plot.hist(bins=12, alpha=0.5)  # doctest: +SKIP
    950     """
--> 951     return self(kind="hist", bins=bins, **kwds)

File ~/Dev/spark/python/pyspark/pandas/plot/core.py:580, in PandasOnSparkPlotAccessor.__call__(self, kind, backend, **kwargs)
    577 kind = {"density": "kde"}.get(kind, kind)
    578 if hasattr(plot_backend, "plot_pandas_on_spark"):
    579     # use if there's pandas-on-Spark specific method.
--> 580     return plot_backend.plot_pandas_on_spark(plot_data, kind=kind, **kwargs)
    581 else:
    582     # fallback to use pandas'
    583     if not PandasOnSparkPlotAccessor.pandas_plot_data_map[kind]:

File ~/Dev/spark/python/pyspark/pandas/plot/plotly.py:41, in plot_pandas_on_spark(data, kind, **kwargs)
     39     return plot_pie(data, **kwargs)
     40 if kind == "hist":
---> 41     return plot_histogram(data, **kwargs)
     42 if kind == "box":
     43     return plot_box(data, **kwargs)

File ~/Dev/spark/python/pyspark/pandas/plot/plotly.py:87, in plot_histogram(data, **kwargs)
     85 psdf, bins = HistogramPlotBase.prepare_hist_data(data, bins)
     86 assert len(bins) > 2, "the number of buckets must be higher than 2."
---> 87 output_series = HistogramPlotBase.compute_hist(psdf, bins)
     88 prev = float("%.9f" % bins[0])  # to make it prettier, truncate.
     89 text_bins = []

File ~/Dev/spark/python/pyspark/pandas/plot/core.py:189, in HistogramPlotBase.compute_hist(psdf, bins)
    183 for group_id, (colname, bucket_name) in enumerate(zip(colnames, bucket_names)):
    184     # creates a Bucketizer to get corresponding bin of each value
    185     bucketizer = Bucketizer(
    186         splits=bins, inputCol=colname, outputCol=bucket_name, handleInvalid="skip"
    187     )
--> 189     bucket_df = bucketizer.transform(sdf)
    191     if output_df is None:
    192         output_df = bucket_df.select(
    193             F.lit(group_id).alias("__group_id"), F.col(bucket_name).alias("__bucket")
    194         )

File ~/Dev/spark/python/pyspark/ml/base.py:260, in Transformer.transform(self, dataset, params)
    258         return self.copy(params)._transform(dataset)
    259     else:
--> 260         return self._transform(dataset)
    261 else:
    262     raise TypeError("Params must be a param map but got %s." % type(params))

File ~/Dev/spark/python/pyspark/ml/wrapper.py:412, in JavaTransformer._transform(self, dataset)
    409 assert self._java_obj is not None
    411 self._transfer_params_to_java()
--> 412 return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sparkSession)

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1696, in DataFrame.__getattr__(self, name)
   1694 def __getattr__(self, name: str) -> "Column":
   1695     if name in ["_jseq", "_jdf", "_jmap", "_jcols", "rdd", "toJSON"]:
-> 1696         raise PySparkAttributeError(
   1697             error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
   1698         )
   1700     if name not in self.columns:
   1701         raise PySparkAttributeError(
   1702             error_class="ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
   1703         )

PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
```

After this PR:

```
In [3]: psdf.plot.hist()
---------------------------------------------------------------------------
PandasNotImplementedError                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 psdf.plot.hist()

File ~/Dev/spark/python/pyspark/pandas/plot/core.py:957, in PandasOnSparkPlotAccessor.hist(self, bins, **kwds)
    909     """
    910     Draw one histogram of the DataFrame’s columns.
    911     A `histogram`_ is a representation of the distribution of data.
   (...)
    954     >>> df.plot.hist(bins=12, alpha=0.5)  # doctest: +SKIP
    955     """
    956     if is_remote():
--> 957         return unsupported_function(class_name="pd.DataFrame", method_name="hist")()
    959     return self(kind="hist", bins=bins, **kwds)

File ~/Dev/spark/python/pyspark/pandas/missing/__init__.py:23, in unsupported_function.<locals>.unsupported_function(*args, **kwargs)
     22 def unsupported_function(*args, **kwargs):
---> 23     raise PandasNotImplementedError(
     24         class_name=class_name, method_name=method_name, reason=reason
     25     )

PandasNotImplementedError: The method `pd.DataFrame.hist()` is not implemented yet.
```

### Does this PR introduce _any_ user-facing change?

Yes, the error message is improved.

### How was this patch tested?

CI

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46911 from zhengruifeng/ps_plotting_unsupported.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
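
The guard visible in the "after" traceback can be sketched without a Spark installation. This is a minimal illustration, not the patch itself: `PlotAccessorSketch`, its `remote` flag (standing in for `is_remote()`), and the `UNSUPPORTED_REMOTE_KINDS` set are hypothetical names, and the `PandasNotImplementedError` and `unsupported_function` defined here are simplified stand-ins for the PySpark helpers of the same name. The check inside `__call__` for the `kind=...` path is an assumption about how the listed `plot(kind=...)` cases might be covered; the traceback only shows the check inside `hist`.

```python
class PandasNotImplementedError(NotImplementedError):
    """Simplified stand-in for the PySpark exception of the same name."""

    def __init__(self, class_name, method_name):
        super().__init__(
            f"The method `{class_name}.{method_name}()` is not implemented yet."
        )


def unsupported_function(class_name, method_name):
    """Return a callable that always raises, mirroring the traceback's helper."""

    def _raise(*args, **kwargs):
        raise PandasNotImplementedError(class_name=class_name, method_name=method_name)

    return _raise


# Hypothetical: plot kinds that need JVM access and so cannot run remotely.
UNSUPPORTED_REMOTE_KINDS = {"hist", "kde"}


class PlotAccessorSketch:
    """Hypothetical accessor; `remote` stands in for is_remote()."""

    def __init__(self, remote):
        self._remote = remote

    def __call__(self, kind="line", **kwargs):
        # Same alias mapping as shown in the traceback: "density" means "kde".
        kind = {"density": "kde"}.get(kind, kind)
        # Assumed guard for the plot(kind=...) path.
        if self._remote and kind in UNSUPPORTED_REMOTE_KINDS:
            return unsupported_function("pd.DataFrame", kind)()
        return f"plotted {kind}"

    def hist(self, bins=10, **kwargs):
        # Fail fast with a clear error before any JVM-only code is reached.
        if self._remote:
            return unsupported_function("pd.DataFrame", "hist")()
        return self(kind="hist", bins=bins, **kwargs)
```

Raising at the accessor entry point means the user sees the unsupported method named directly, rather than a `_jdf` attribute error from several frames deeper in the call stack.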