[SPARK-28178][SQL] DataSourceV2: DataFrameWriter.insertInto #24980
Test build #106958 has finished for PR 24980 at commit
Test build #106971 has finished for PR 24980 at commit
Test build #106972 has finished for PR 24980 at commit
Test build #107140 has finished for PR 24980 at commit
Test build #107143 has finished for PR 24980 at commit
Retest this please.
Test build #107313 has finished for PR 24980 at commit
Test build #108235 has finished for PR 24980 at commit
@dongjoon-hyun @brkyvz @cloud-fan @rdblue This PR is ready for review. It is a follow-up to DSv2 INSERT INTO.
assertNotBucketed("insertInto")
if (partitioningColumns.isDefined) {
Shall we move these two checks to the public insertInto method, instead of duplicating them in the two private methods?
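A minimal sketch of the refactoring suggested above: validate once in the public entry point, then dispatch to the private implementations. This is not the actual DataFrameWriter code. The class SketchWriter, the helper assertNotPartitioned, the useV2 flag, the string return values, and the error messages are all hypothetical; only the names insertInto, assertNotBucketed, and partitioningColumns come from the diff.

```scala
// Hypothetical stand-in for a writer; it only models where the checks live.
final class SketchWriter(
    partitioningColumns: Option[Seq[String]],
    numBuckets: Option[Int]) {

  // Reject bucketed writes (illustrative message, not Spark's).
  private def assertNotBucketed(operation: String): Unit =
    if (numBuckets.isDefined) {
      throw new IllegalArgumentException(s"'$operation' does not support bucketing")
    }

  // Reject writes configured with partitioning columns (hypothetical helper).
  private def assertNotPartitioned(operation: String): Unit =
    if (partitioningColumns.isDefined) {
      throw new IllegalArgumentException(s"'$operation' does not support partitioning columns")
    }

  // Public entry point: both checks run exactly once, before dispatching.
  def insertInto(tableName: String, useV2: Boolean): String = {
    assertNotBucketed("insertInto")
    assertNotPartitioned("insertInto")
    if (useV2) insertIntoV2(tableName) else insertIntoV1(tableName)
  }

  // The private paths no longer repeat the validation.
  private def insertIntoV1(tableName: String): String = s"v1:$tableName"
  private def insertIntoV2(tableName: String): String = s"v2:$tableName"
}
```

With this shape, a new check added later only needs to go in one place, and neither private path can be reached without validation.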
  }
}
test("insertInto: append partitioned table - dynamic clause") {
What do you mean by "dynamic clause"?
This is a copy-paste issue. I will remove " - dynamic clause" from the title. insertInto does not have anything similar to INSERT INTO's PARTITION clause.
Test build #108338 has finished for PR 24980 at commit
Thanks, merging to master!
Thanks @cloud-fan!
val command = modeForDSV2 match {
  case SaveMode.Append =>
    AppendData.byName(table, df.logicalPlan)
I missed it. If you look at the doc of insertInto, it says:
* Inserts the content of the `DataFrame` to the specified table. It requires that
* the schema of the `DataFrame` is the same as the schema of the table.
*
* @note Unlike `saveAsTable`, `insertInto` ignores the column names and just uses position-based
* resolution. For example:
We should use byPosition here.
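To make the difference concrete, here is a small Spark-free sketch of the two resolution strategies. It is only a model: Column, Resolution, and both helper methods are hypothetical and do not reflect how AppendData resolves columns internally.

```scala
// Hypothetical model of an incoming DataFrame column: a name plus one value.
final case class Column(name: String, value: Any)

object Resolution {
  // By position: incoming names are ignored; the i-th incoming value
  // is written to the i-th table column.
  def byPosition(tableCols: Seq[String], row: Seq[Column]): Map[String, Any] = {
    require(tableCols.length == row.length, "column count mismatch")
    tableCols.zip(row.map(_.value)).toMap
  }

  // By name: each table column is looked up among the incoming names.
  def byName(tableCols: Seq[String], row: Seq[Column]): Map[String, Any] = {
    val incoming = row.map(c => c.name -> c.value).toMap
    tableCols.map { t =>
      t -> incoming.getOrElse(t, throw new IllegalArgumentException(s"missing column: $t"))
    }.toMap
  }
}
```

For a table with columns (id, data) and an incoming row whose columns are ordered (data = "a", id = 1), byPosition writes "a" into id and 1 into data, while byName writes 1 into id and "a" into data. The documented insertInto contract corresponds to the first behavior.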
Agreed. This is an oversight and should be by position.
Thanks @cloud-fan @rdblue. I will submit a hotfix.
I can create a follow-up PR that introduces an option, matchByName, defaulting to false. If it is true, insertInto resolves columns by name; otherwise, by position.
Could it even be included in this PR?
If we are going to create a new dataframe writer API in the future, I'd like to keep it as it is, and always do by-position in this insertInto.
> If we are going to create a new dataframe writer API in the future, I'd like to keep it as it is, and always do by-position in this insertInto.
Sounds good to me. I'll submit a PR for the new API.
@rdblue @cloud-fan @dongjoon-hyun #25353 is ready for review.
What changes were proposed in this pull request?
Support multiple catalogs in the following InsertInto use cases:
Support matrix:
How was this patch tested?
New tests.
All existing catalyst and sql/core tests.