[SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton #6532
|
cc @JoshRosen @rxin |
|
Does this need to block 1.4? |
|
@rxin I think so, it's a regression. |
|
Test build #33836 has finished for PR 6532 at commit
|
python/pyspark/sql/tests.py
I think you meant SPARK-7978
|
I guess the root cause of this bug was that the DecimalType class's singleton-ness was inherited due to our use of metaclasses. In Scala, we didn't have this problem because the singleton-ness was implemented at the leaves of the class hierarchy by using

To prevent this sort of issue in the future, I wonder whether it would be clearer to apply a decorator on the leaf class declarations themselves (as in http://elbenshira.com/blog/singleton-pattern-in-python/) rather than using metaclasses. It would be good to understand whether the difference between these approaches has any performance / correctness impacts.

In the meantime, the fix here makes sense to me: we're just pushing the singleton-ness a bit deeper into the class hierarchy so that it isn't inappropriately inherited by DecimalType.

From a test coverage perspective, what tests would have caught this? It seems that something as simple as just manually constructing an instance of each type with all of its parameters specified would have caught this. Even though it seems like an obvious / dumb test, maybe we should add a test that just checks that we're able to at least instantiate each of the types that take parameters. |
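To make the inheritance problem described above concrete, here is a minimal sketch. It is not the PySpark source: the `Singleton` metaclass, the `singleton` decorator, and the `Plain*` classes are hypothetical stand-ins, and only the general shape (a metaclass on a base class versus a decorator on a leaf class) is taken from the comment.

```python
class Singleton(type):
    """Illustrative metaclass: caches a single instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # After the first call, constructor arguments are silently ignored.
        if cls not in Singleton._instances:
            Singleton._instances[cls] = super().__call__(*args, **kwargs)
        return Singleton._instances[cls]


class DataType(metaclass=Singleton):
    """Hypothetical base class; every subclass inherits the metaclass."""


class DecimalType(DataType):
    # Takes parameters, yet still becomes a singleton through inheritance.
    def __init__(self, precision=None, scale=None):
        self.precision = precision
        self.scale = scale


first = DecimalType(10, 2)
second = DecimalType(5, 1)  # returns the cached DecimalType(10, 2)


def singleton(cls):
    """Illustrative decorator: applies only to the class it decorates."""
    instance = cls()

    def get_instance():
        return instance

    return get_instance


class PlainDataType:
    """Hypothetical base class without a metaclass."""


@singleton
class StringType(PlainDataType):
    pass


class PlainDecimalType(PlainDataType):
    # Not decorated, so every call builds a fresh instance.
    def __init__(self, precision=None, scale=None):
        self.precision = precision
        self.scale = scale
```

With the metaclass, `second is first` holds and `second.precision` is still 10, which is the kind of regression discussed here. With the decorator, only `StringType` is shared. One trade-off of this particular decorator is that the name `StringType` now refers to a factory function, so `isinstance(x, StringType)` no longer works; other decorator designs may avoid that.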
|
Also, I wonder whether |
|
@JoshRosen I think the newly added test covers the fact that DecimalType is not a singleton, by It may make sense to switch the other singleton patterns, but I'd like to fix this first without introducing too many other changes. |
|
Test build #33848 has finished for PR 6532 at commit
|
|
Maybe another systematic fix is to not allow the singleton-ing of any Type that takes constructor args? At subclass time can you check that? |
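One way to read the suggestion above is a metaclass that inspects each class when it is defined and only caches instances for classes whose constructor takes no arguments. The sketch below assumes that "no-op" interpretation; `CheckedSingleton`, `takes_constructor_args`, and the sample classes are hypothetical names, not part of PySpark.

```python
import inspect


def takes_constructor_args(cls):
    """Return True if cls.__init__ accepts anything besides self."""
    init = cls.__init__
    if init is object.__init__:
        # No __init__ defined anywhere in the hierarchy.
        return False
    params = list(inspect.signature(init).parameters.values())[1:]  # drop self
    return len(params) > 0


class CheckedSingleton(type):
    """Illustrative metaclass: singleton only for argument-free classes."""

    _instances = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Decided once, at class-definition ("subclass") time.
        cls._is_singleton = not takes_constructor_args(cls)

    def __call__(cls, *args, **kwargs):
        if not cls._is_singleton:
            return super().__call__(*args, **kwargs)
        if cls not in CheckedSingleton._instances:
            CheckedSingleton._instances[cls] = super().__call__(*args, **kwargs)
        return CheckedSingleton._instances[cls]


class DataType(metaclass=CheckedSingleton):
    pass


class StringType(DataType):
    pass


class DecimalType(DataType):
    def __init__(self, precision=None, scale=None):
        self.precision = precision
        self.scale = scale
```

Here `StringType()` always returns the same object, while each `DecimalType(...)` call builds a new one. Raising an exception instead of falling back to normal construction would be the stricter alternative.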
|
@davies, I agree that the test you added here acts as a proper regression test. My comment was more to suggest that we could have prevented this regression in the first place with a relatively simple test that just tries to instantiate each data type with all of its constructor arguments. The fact that this bug evaded unit tests implies that our existing unit tests didn't create DecimalTypes with any constructor arguments, implying that our test coverage of decimal-related code might be insufficient. I think that this patch is fine, but for 1.5 we should make a dedicated effort to improve Python's test coverage.

@airhorns, do you mean that the singleton metaclass would act as a no-op when applied to Types that take constructor arguments, or that it would throw an exception if applied to those types? This is purely academic at this point, but I can imagine some contrived scenarios where the no-op behavior might be confusing: what if I had a class which accepted constructor parameters, then created a subclass which called its superclass constructor with constant values for those parameters? In this case, the subclass can be a singleton but the superclass can't.

To avoid having to reason about these corner cases, maybe it's better to just accept a bit of verbosity and use decorators instead. We shouldn't do that for this patch, though; we can leave it as a followup for 1.5. |
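The "instantiate each data type with all of its constructor arguments" test proposed above could look roughly like the following. This is only a sketch: the two classes are simplified stand-ins defined inline so the example is self-contained, and `PARAMETRIZED_CASES` and `check_constructor_args` are hypothetical helpers rather than anything in the PySpark test suite.

```python
import unittest


class DecimalType:
    """Stand-in for a parametrized type (not the PySpark class)."""

    def __init__(self, precision=None, scale=None):
        self.precision = precision
        self.scale = scale


class ArrayType:
    """Stand-in for another parametrized type (not the PySpark class)."""

    def __init__(self, elementType, containsNull=True):
        self.elementType = elementType
        self.containsNull = containsNull


# Each type appears with explicit values for every constructor parameter.
# DecimalType is listed twice so that a cached first instance is detected.
PARAMETRIZED_CASES = [
    (DecimalType, {"precision": 10, "scale": 2}),
    (DecimalType, {"precision": 5, "scale": 1}),
    (ArrayType, {"elementType": "string", "containsNull": False}),
]


def check_constructor_args(cases):
    """Return a list of (class name, attribute) pairs that lost their value."""
    failures = []
    for cls, kwargs in cases:
        instance = cls(**kwargs)
        for name, expected in kwargs.items():
            if getattr(instance, name) != expected:
                failures.append((cls.__name__, name))
    return failures


class ConstructorArgsTest(unittest.TestCase):
    def test_parametrized_types_keep_their_arguments(self):
        self.assertEqual(check_constructor_args(PARAMETRIZED_CASES), [])
```

Run it with `python -m unittest`. If one of the types were accidentally a singleton, its second case would get back the first instance and show up in the failure list.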
|
@JoshRosen sounds good to me! |
|
As an experiment, I put together some code to run the PySpark test suite through |
Author: Davies Liu <[email protected]>
Closes #6532 from davies/decimal and squashes the following commits:
c7fcbce [Davies Liu] Update tests.py
1425359 [Davies Liu] DecimalType should not be singleton
(cherry picked from commit 91777a1)
Signed-off-by: Reynold Xin <[email protected]>