[SPARK-18856][SQL] non-empty partitioned table should not report zero size #16280
  className = table.provider.get,
- options = table.storage.properties ++ pathOption)
+ options = table.storage.properties ++ pathOption,
+ catalogTable = Some(simpleCatalogRelation.metadata))
Previously we always used InMemoryFileIndex because we didn't pass the catalog table to DataSource...
Any way to catch this in a test? Presumably this would have caused all the files to be scanned.
Oh, I know why. The InMemoryExternalCatalog hasn't implemented all the interfaces (e.g. some partition-related ones), so we did it intentionally.
Test build #70130 has finished for PR 16280 at commit
Test build #70167 has finished for PR 16280 at commit
Merging in master/branch-2.1.
… size

## What changes were proposed in this pull request?

In `DataSource`, if the table is not analyzed, we will use 0 as the default value for table size. This is dangerous, we may broadcast a large table and cause OOM. We should use `defaultSizeInBytes` instead.

## How was this patch tested?

new regression test

Author: Wenchen Fan <[email protected]>

Closes #16280 from cloud-fan/bug.

(cherry picked from commit d6f11a1)
Signed-off-by: Reynold Xin <[email protected]>
## What changes were proposed in this pull request?

In `DataSource`, if the table is not analyzed, we will use 0 as the default value for table size. This is dangerous, we may broadcast a large table and cause OOM. We should use `defaultSizeInBytes` instead.

## How was this patch tested?

new regression test
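
To illustrate why a default of 0 is risky, here is a minimal, Spark-free sketch of the fallback described above. It is not the code from this patch: `TableStats`, `sizeBeforeFix`, `sizeAfterFix`, and `wouldBroadcast` are hypothetical names, the broadcast check is a simplified stand-in for the planner's rule, and the value used for `defaultSizeInBytes` is an assumption (any value above the broadcast threshold shows the same effect). The 10 MB threshold is the documented default of `spark.sql.autoBroadcastJoinThreshold`.

```scala
// Hypothetical model of the size fallback; none of these names are Spark APIs.
object SizeEstimateSketch {
  // Documented default of spark.sql.autoBroadcastJoinThreshold (10 MB).
  val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

  // Assumption: a conservative fallback that is larger than the threshold.
  val defaultSizeInBytes: Long = Long.MaxValue

  // Statistics that exist only once the table has been analyzed (hypothetical type).
  final case class TableStats(sizeInBytes: Long)

  // Behavior described as the bug: an unanalyzed table reports size 0.
  def sizeBeforeFix(stats: Option[TableStats]): Long =
    stats.map(_.sizeInBytes).getOrElse(0L)

  // Behavior described as the fix: fall back to defaultSizeInBytes.
  def sizeAfterFix(stats: Option[TableStats]): Long =
    stats.map(_.sizeInBytes).getOrElse(defaultSizeInBytes)

  // Simplified broadcast decision: broadcast when the estimate fits the threshold.
  def wouldBroadcast(sizeInBytes: Long): Boolean =
    sizeInBytes <= autoBroadcastJoinThreshold

  def main(args: Array[String]): Unit = {
    val unanalyzed: Option[TableStats] = None
    println(s"before fix, broadcast? ${wouldBroadcast(sizeBeforeFix(unanalyzed))}")
    println(s"after fix, broadcast? ${wouldBroadcast(sizeAfterFix(unanalyzed))}")
  }
}
```

With a fallback of 0, an unanalyzed table of any real size passes the broadcast check in this model; with a fallback above the threshold, it does not, which is the conservative choice when the true size is unknown.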