[SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage #26120
cc: @cloud-fan / @rdblue
I think this should also make the default catalog and the session catalog private in CatalogManager, to ensure that the only catalog accessed from rules is the current catalog. If the default catalog or session catalog is the current catalog, they can (and should) be accessed by getting the current catalog. The default and session catalogs should be internal to CatalogManager.
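The suggested encapsulation can be sketched with hypothetical, simplified types (`Catalog`, `SimpleCatalogManager`, `setCurrentCatalog`). This is an illustration under those assumptions, not Spark's actual `CatalogManager`: the default and session catalogs stay private, and callers only ever see `currentCatalog`.

```scala
// Hypothetical, simplified stand-ins; not Spark's CatalogPlugin or CatalogManager API.
final case class Catalog(name: String)

// Constructor parameters without `val` are private to the class, so the
// session catalog and the default-catalog setting are not visible to callers.
class SimpleCatalogManager(
    sessionCatalog: Catalog,
    defaultCatalogName: Option[String],
    registered: Map[String, Catalog]) {

  // Default catalog: the configured catalog if one is set, else the session catalog.
  private def defaultCatalog: Catalog =
    defaultCatalogName.flatMap(registered.get).getOrElse(sessionCatalog)

  // Set only after an explicit switch; until then the default catalog is current.
  private var switchedTo: Option[Catalog] = None

  // The only catalog that callers (e.g. analyzer rules) can reach.
  def currentCatalog: Catalog = switchedTo.getOrElse(defaultCatalog)

  // Switch the current catalog by name; unknown names are rejected.
  def setCurrentCatalog(name: String): Unit = {
    val target =
      if (name == sessionCatalog.name) sessionCatalog
      else registered.getOrElse(name, throw new NoSuchElementException(s"Catalog not found: $name"))
    switchedTo = Some(target)
  }
}
```

When the default or session catalog happens to be the current one, a rule still reaches it through `currentCatalog`, so there is a single access path to reason about.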
OK. There are a few places where the default catalog / session catalog is accessed directly, for example,
Test build #112078 has finished for PR 26120 at commit
sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
Test build #112102 has finished for PR 26120 at commit
     */
    class Analyzer(
        override val catalogManager: CatalogManager,
        v1SessionCatalog: SessionCatalog,
I renamed catalog to v1SessionCatalog in Analyzer to be explicit. Please let me know if this is not desired.
      saveAsTable(sessionCatalog.asTableCatalog, ident)
    case CatalogObjectIdentifier(catalog, ident)
        if isSessionCatalog(catalog) && canUseV2 && ident.namespace().length <= 1 =>
      saveAsTable(catalog.asTableCatalog, ident)
Could this be incorrect if the current catalog is a v2 session catalog that doesn't delegate to the v1 session catalog? The previous behavior always used the v1 session catalog.
It's a known problem that if the v2 session catalog doesn't delegate to the v1 session catalog, many things can break.
I think the previous version was wrong: it always used the default v2 session catalog, even if users set a custom v2 session catalog.
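A minimal sketch of the routing difference discussed in this thread, using a hypothetical model (`ResolvedCatalog`, `WritePath`, `routeBefore`, `routeAfter`) rather than Spark's `DataFrameWriter` code. It assumes non-session catalogs always take the v2 path and that anything else falls back to a v1 write; only the session-catalog condition mirrors the snippet above.

```scala
// Hypothetical, simplified model; not Spark's DataFrameWriter.saveAsTable.
final case class ResolvedCatalog(name: String, isSession: Boolean)

sealed trait WritePath
final case class V2Write(catalogName: String) extends WritePath
case object V1Write extends WritePath

object SaveAsTableRouting {
  // Assumed name for the built-in v2 session catalog the old code always used.
  val BuiltInSessionCatalog = "builtin_session"

  // Old behavior as described in the review: session-catalog writes went through
  // the built-in v2 session catalog even when a custom one was configured.
  def routeBefore(c: ResolvedCatalog, namespace: Seq[String], canUseV2: Boolean): WritePath =
    if (!c.isSession) V2Write(c.name)
    else if (canUseV2 && namespace.length <= 1) V2Write(BuiltInSessionCatalog)
    else V1Write

  // Fixed behavior: write through the catalog the identifier resolved to,
  // so a user-configured v2 session catalog is respected.
  def routeAfter(c: ResolvedCatalog, namespace: Seq[String], canUseV2: Boolean): WritePath =
    if (!c.isSession) V2Write(c.name)
    else if (canUseV2 && namespace.length <= 1) V2Write(c.name)
    else V1Write
}
```

The only change between the two functions is which catalog the session-catalog branch writes to; as noted in the thread, a custom session catalog that does not delegate to the v1 catalog can still break other things.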
Test build #112184 has finished for PR 26120 at commit
     */
    class Analyzer(
        override val catalogManager: CatalogManager,
        v1SessionCatalog: SessionCatalog,
One downside of this approach (passing SessionCatalog as a separate parameter) is that this SessionCatalog instance can differ from the one stored in CatalogManager. Since CatalogManager updates the current database of the session catalog, the two can get out of sync. Please let me know if this approach is fine.
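The out-of-sync risk can be illustrated with hypothetical stubs (`V1SessionCatalogStub`, `ManagerStub`, `AnalyzerStub`, `setCurrentNamespace`), assuming only that the manager updates the current database on the session catalog instance it holds. These are not Spark classes.

```scala
// Hypothetical stand-in for the v1 SessionCatalog: just tracks a current database.
class V1SessionCatalogStub {
  private var currentDb: String = "default"
  def getCurrentDatabase: String = currentDb
  def setCurrentDatabase(db: String): Unit = currentDb = db
}

// Hypothetical stand-in for CatalogManager: updates the instance it holds.
class ManagerStub(val v1SessionCatalog: V1SessionCatalogStub) {
  def setCurrentNamespace(db: String): Unit = v1SessionCatalog.setCurrentDatabase(db)
}

// Analyzer that also receives a session catalog as a separate parameter.
class AnalyzerStub(val manager: ManagerStub, val v1SessionCatalog: V1SessionCatalogStub) {
  // Reads the separately passed instance, which may not be the manager's.
  def dbSeenViaParameter: String = v1SessionCatalog.getCurrentDatabase
  // Reads through the manager, so it always sees the manager's instance.
  def dbSeenViaManager: String = manager.v1SessionCatalog.getCurrentDatabase
}
```

If the caller passes a different instance than the one held by the manager, `dbSeenViaParameter` goes stale after a namespace change, while `dbSeenViaManager` cannot diverge. Keeping `v1SessionCatalog` accessible on `CatalogManager` allows the second pattern.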
hmm, what's the upside of doing this?
v1SessionCatalog is now private in CatalogManager. Since Analyzer uses v1SessionCatalog, we need to pass it separately.
We don't need to make v1SessionCatalog private in CatalogManager. We only need to make sessionCatalog and defaultCatalog private. cc @rdblue
@rdblue / @cloud-fan I applied @rdblue's suggestion to make the default and session catalogs private in
Test build #112188 has finished for PR 26120 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala
Test build #112227 has finished for PR 26120 at commit
+1 Thanks for fixing this! @cloud-fan, any other issues?
Test build #112236 has finished for PR 26120 at commit
Test build #112243 has finished for PR 26120 at commit
LGTM if Jenkins passes.
Test build #112269 has finished for PR 26120 at commit
Thanks, merging to master!
What changes were proposed in this pull request?
The handling of the catalog across plans should be as follows (SPARK-29014):
This PR fixes the places where the current catalog is not used as described above.
Why are the changes needed?
It is a bug as described in the previous section.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests added.