[SPARK-18856][SQL] non-empty partitioned table should not report zero size #16280

cloud-fan · 2016-12-14T12:42:50Z

What changes were proposed in this pull request?

In `DataSource`, if the table has not been analyzed, we use 0 as the default value for the table size. This is dangerous: we may broadcast a large table and cause an OOM. We should use `defaultSizeInBytes` instead.
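To illustrate why a zero size estimate is risky, here is a minimal, self-contained sketch of a size-based broadcast decision. It does not use Spark's classes. `BroadcastSketch`, `estimatedSize`, and `canBroadcast` are hypothetical names, the 10 MB threshold is assumed to mirror the default of `spark.sql.autoBroadcastJoinThreshold`, and the fallback is assumed to be far larger than that threshold.

```scala
object BroadcastSketch {
  // Assumption: mirrors the default broadcast threshold of 10 MB.
  val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

  // Assumption: a conservative fallback that is far above the threshold.
  val defaultSizeInBytes: Long = Long.MaxValue

  // Use the analyzed size when statistics exist, otherwise the given fallback.
  def estimatedSize(stats: Option[Long], fallback: Long): Long =
    stats.getOrElse(fallback)

  // Simplified rule: a relation is broadcast when its estimate fits the threshold.
  def canBroadcast(sizeInBytes: Long): Boolean =
    sizeInBytes >= 0 && sizeInBytes <= autoBroadcastJoinThreshold
}
```

With a fallback of `0L`, an unanalyzed table always passes `canBroadcast`, however large it really is. With `defaultSizeInBytes` as the fallback, it never does unless real statistics say it is small.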

How was this patch tested?

Added a new regression test.

cloud-fan · 2016-12-14T12:48:54Z

cc @rxin @ericl

cloud-fan · 2016-12-14T12:50:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala

        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
+        options = table.storage.properties ++ pathOption,
+        catalogTable = Some(simpleCatalogRelation.metadata))


Previously we always used `InMemoryFileIndex` because we didn't pass the catalog table to `DataSource`...

Any way to catch this in a test? Presumably this would have caused all the files to be scanned.

Oh, I know why. `InMemoryExternalCatalog` hasn't implemented all the interfaces (e.g. some partition-related ones), so we did it intentionally.
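The point of this thread can be sketched as follows, assuming the choice of file index depends on whether catalog metadata is supplied. All names here (`FileIndexSketch`, `TableMeta`, `chooseFileIndex`, `managePartitions`) are hypothetical and do not reproduce Spark's actual code.

```scala
object FileIndexSketch {
  sealed trait FileIndexKind
  case object CatalogBacked extends FileIndexKind   // partitions resolved through the catalog
  case object InMemoryListing extends FileIndexKind // all files listed up front

  // Hypothetical stand-in for the catalog metadata passed to the data source.
  final case class TableMeta(name: String, tracksPartitionsInCatalog: Boolean)

  // Without catalog metadata, only the in-memory listing is possible.
  def chooseFileIndex(catalogTable: Option[TableMeta], managePartitions: Boolean): FileIndexKind =
    catalogTable match {
      case Some(t) if managePartitions && t.tracksPartitionsInCatalog => CatalogBacked
      case _ => InMemoryListing
    }
}
```

Under these assumptions, passing `None` for the catalog table always yields the in-memory listing, which matches the behavior described in the first comment of the thread.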

SparkQA · 2016-12-14T14:24:35Z

Test build #70130 has finished for PR 16280 at commit 4939d6d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-12-15T03:56:36Z

Test build #70167 has finished for PR 16280 at commit 1628b29.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-12-15T05:02:59Z

Merging in master/branch-2.1.

… size

## What changes were proposed in this pull request?

In `DataSource`, if the table is not analyzed, we will use 0 as the default value for table size. This is dangerous, we may broadcast a large table and cause OOM. We should use `defaultSizeInBytes` instead.

## How was this patch tested?

new regression test

Author: Wenchen Fan <[email protected]>

Closes #16280 from cloud-fan/bug.

(cherry picked from commit d6f11a1)
Signed-off-by: Reynold Xin <[email protected]>

non-empty partitioned table should not report zero size

4939d6d

cloud-fan commented Dec 14, 2016

ericl approved these changes Dec 14, 2016

revert something

1628b29

asfgit closed this in d6f11a1 Dec 15, 2016
