@navis
Contributor

navis commented Jun 8, 2015

This does the same thing as "hive.map.aggr.hash.min.reduction" in Hive, which disables hash aggregation when it does not reduce the output size sufficiently.

Added two configuration options:

  • spark.sql.partial.aggregation.checkInterval
  • spark.sql.partial.aggregation.minReduction
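
For readers unfamiliar with the Hive setting, here is a minimal Scala sketch of the idea, not the code in this patch. It assumes that checkInterval is the number of input rows consumed before the check and that minReduction is the largest acceptable ratio of distinct groups to input rows, which follows how Hive describes its own settings; the thread itself does not define them. The object and method names are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical helper, not taken from the patch: sums values per key and
// stops hashing when the hash map is not shrinking the data enough.
object PartialAggSketch {
  def partialSum(
      rows: Iterator[(String, Long)],
      checkInterval: Int,   // assumed meaning: rows to consume before the check
      minReduction: Double  // assumed meaning: max allowed groups/rows ratio
  ): Iterator[(String, Long)] = {
    val buffer = mutable.LinkedHashMap.empty[String, Long]
    var seen = 0

    // Phase 1: hash-aggregate the first `checkInterval` rows.
    while (seen < checkInterval && rows.hasNext) {
      val (k, v) = rows.next()
      buffer(k) = buffer.getOrElse(k, 0L) + v
      seen += 1
    }

    // The check runs once, after `checkInterval` rows have been seen.
    val ineffective =
      seen >= checkInterval && buffer.size.toDouble / seen > minReduction

    if (ineffective) {
      // Emit what was buffered, then pass the remaining rows through as-is.
      buffer.iterator ++ rows
    } else {
      // Hashing is paying off (or the input was short): aggregate everything.
      rows.foreach { case (k, v) => buffer(k) = buffer.getOrElse(k, 0L) + v }
      buffer.iterator
    }
  }
}
```

With mostly distinct keys the sketch falls back to pass-through after the first check; with heavily repeated keys it keeps aggregating. A final aggregation step downstream is still needed in both cases, since the pass-through rows are not merged.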

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Jun 12, 2015

Test build #34736 has finished for PR 6696 at commit 388ea7a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@navis
Contributor Author

navis commented Jun 13, 2015

The test failure was caused only by the order in which rows appeared. Added an order-by so the result is deterministic.

@SparkQA

SparkQA commented Jun 13, 2015

Test build #34819 has finished for PR 6696 at commit 527c7b5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Use Scala classes in Scala code

Contributor Author

I'm new to Scala. Could you suggest one?

Member

Hm, sorry, I take that back. I can't find a Scala equivalent that sets the values of an existing array. Can I suggest, though, that importing just java.util might look confusing? Either import the class itself (there is no Scala Arrays to mix it up with) or, if this is the only usage, write java.util.Arrays.fill(...)
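
As a small illustration of the fully qualified form, here is a hedged sketch; the object, method, and parameter names are made up for the example and are not from the patch.

```scala
// Hypothetical wrapper, only to show the call in a compilable form.
object FillExample {
  // Overwrites every slot of an existing array in place, without allocating
  // a new array. Resolves to the Java overload fill(long[], long).
  def reset(buffer: Array[Long], value: Long): Unit =
    java.util.Arrays.fill(buffer, value)
}
```

Scala's own Array.fill(n)(elem) builds a new array rather than updating an existing one, which is why the Java call is used here.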

Contributor Author

Addressed comments. Thanks!

@SparkQA

SparkQA commented Jun 13, 2015

Test build #34822 has finished for PR 6696 at commit 4cf1c99.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

It looks like this is failing its own test.

@navis
Contributor Author

navis commented Jun 14, 2015

Strange, I cannot reproduce the failure in my local environment. I'll check it again.

@navis
Contributor Author

navis commented Jun 14, 2015

The memory leak that caused the test failure was an existing bug in the master branch. Currently, the unsafe-based hash is released on a 'next' call, but if the input is empty, 'next' is never called.
The patch now returns an empty iterator if the input is empty.
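
The shape of that fix can be sketched roughly as follows. This is an assumption-laden illustration, not the actual Spark code: Resource stands in for the unsafe-based hash, and all names are hypothetical.

```scala
object EmptyInputSketch {
  // Hypothetical stand-in for a structure that must be freed explicitly.
  final class Resource {
    var freed = false
    def free(): Unit = { freed = true }
  }

  def aggregate(input: Iterator[Int], allocate: () => Resource): Iterator[Int] = {
    if (!input.hasNext) {
      // Empty input: next() would never run, so a resource freed only inside
      // next() would leak. Return early without allocating anything.
      Iterator.empty
    } else {
      val res = allocate()
      new Iterator[Int] {
        def hasNext: Boolean = input.hasNext
        def next(): Int = {
          val v = input.next()
          if (!input.hasNext) res.free() // released on the final next() call
          v
        }
      }
    }
  }
}
```

The early return is one way to guarantee that nothing is left unreleased when no row ever arrives; the real patch may order allocation and release differently.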

@JoshRosen
Contributor

@navis, good catch on finding the memory leak in the unsafe aggregation path. I think that maybe we should extract that bugfix into its own PR so that it's easier to backport to 1.4.x.

@navis
Contributor Author

navis commented Jun 14, 2015

@JoshRosen ok, sure.

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34864 has finished for PR 6696 at commit f9616b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@navis
Contributor Author

navis commented Jun 14, 2015

done in SPARK-8357

@SparkQA

SparkQA commented Aug 21, 2015

Test build #41344 timed out for PR 6696 at commit 2c73bbd after a configured wait of 175m.

@SparkQA

SparkQA commented Aug 26, 2015

Test build #41593 has finished for PR 6696 at commit 0b2da51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LogisticRegressionModel @Since("1.3.0") (
    • class SVMModel @Since("1.1.0") (
    • class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (
    • class FreqItemset[Item] @Since("1.3.0") (
    • class FreqSequence[Item] @Since("1.5.0") (
    • class PrefixSpanModel[Item] @Since("1.5.0") (
    • abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode
    • case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
    • case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
    • case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)

@andrewor14
Contributor

A note about the naming: I think you meant spark.sql.partialAggregation.* instead? It doesn't make sense to have a spark.sql.partial.* namespace.

@andrewor14
Contributor

@JoshRosen @marmbrus any updates on the details of this patch?

@marmbrus
Contributor

marmbrus commented Sep 1, 2015

Unfortunately, the Aggregate1 code path is deprecated and going to be removed shortly. We should probably close this issue and design the feature on JIRA for the new aggregation code path.

asfgit closed this in 804a012 on Sep 4, 2015