-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23699][PYTHON][SQL] Raise same type of error caught with Arrow enabled #20839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23699][PYTHON][SQL] Raise same type of error caught with Arrow enabled #20839
Conversation
|
Test build #88281 has finished for PR 20839 at commit
|
|
@HyukjinKwon, @ueshin please take a look when you can |
python/pyspark/sql/session.py
Outdated
| warnings.warn(msg) | ||
| else: | ||
| msg = ( | ||
| e.message = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler, I think message attribute is only in Python 2. Also, are you doubly sure that this wraps the exception message in console too .. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, you're right. The tests check the exception message so I thought all was good. Let me try something else.
Before with Python 2, missing traceback and doesn't show as
|
|
Test build #88487 has finished for PR 20839 at commit
|
|
Test build #88488 has finished for PR 20839 at commit
|
|
@HyukjinKwon and @ueshin I came close to a solution that would wrap the exception message like before but still keep the exception type and traceback. The problem was there are too many variations in python exceptions and it was getting too complicated to be foolproof. For instance, To keep things simple for this, I thought it would be fine when fallback is disabled, to still print a warning message to indicate the related conf, and then use |
|
Yea, I agree that the warn way is the min fix. I actually attempted a try to fix this locally before and failed to make a clean fix, and I understand the complexity here, in particular, because of compatibility between Python 2 and Python 3. I also took a look for Also, I took a look for the complexity you mentioned when I initially proposed the changes in this logic. Looked not quite easy to wrap it manually too. Only one main concern is, if it makes sense to call it a warn because the message contains, for example, the warning message would contain: ...
"'spark.sql.execution.arrow.enabled' is set to true; however, "
"failed unexpectedly:\n %s\n"
... |
| have_pandas = True | ||
| except ImportError: | ||
| have_pandas = False | ||
| if not have_pandas: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is for making the error message clean when it's reraised in Python 3? I think it's fine to leave it as was. I believe the downside of it is to lose the information where exactly the error was thrown in Python 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the traceback to the raise ImportError below is all the information needed. If that happens, then the only possible cause is that the import failed from here. The problem with how it was before is that for Python 3, it will print out During handling of the above exception, another exception occurred: which makes it seem like it is not being handled correctly, since it's really just a failed import.
|
For the best and the min fix, can we rephrase the warning messages to make them look more like a warn not an error? |
|
Thanks @HyukjinKwon for reviewing! I agree with what you said about the rephrasing the warning message, I'll try to make that sound better. |
This reverts commit 17dd605.
After Reworded Warnings |
|
Test build #88552 has finished for PR 20839 at commit
|
|
Test build #88554 has finished for PR 20839 at commit
|
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm I don't have a better idea for now
|
Thanks @HyukjinKwon ! I'll merge later today if no more comments. |
|
merged to master |
… enabled ## What changes were proposed in this pull request? When using Arrow for createDataFrame or toPandas and an error is encountered with fallback disabled, this will raise the same type of error instead of a RuntimeError. This change also allows for the traceback of the error to be retained and prevents the accidental chaining of exceptions with Python 3. ## How was this patch tested? Updated existing tests to verify error type. Author: Bryan Cutler <[email protected]> Closes apache#20839 from BryanCutler/arrow-raise-same-error-SPARK-23699.
What changes were proposed in this pull request?
When using Arrow for createDataFrame or toPandas and an error is encountered with fallback disabled, this will raise the same type of error instead of a RuntimeError. This change also allows for the traceback of the error to be retained and prevents the accidental chaining of exceptions with Python 3.
How was this patch tested?
Updated existing tests to verify error type.