[SNAP-1194] Optimization for single dictionary column group by and join #437

sumwale · 2016-12-01T20:46:03Z

Changes proposed in this pull request

Single-column dictionary optimization that uses the dictionary indexes of a column batch instead of its strings. This is the simpler variant, in which an array of the dictionary's size is created. The array is populated on demand with the corresponding MapEntry object of the main HashMap (look up and fill in if found, else store an EMPTY marker), so after the first miss the other columns can be updated (GROUP BY) or fetched (JOIN) directly from the array, skipping the map completely. More details can be found in the class comments of DictionaryOptimizedMapAccessor.
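The array-over-map idea can be sketched as follows. This is a minimal illustration, not the code generated by DictionaryOptimizedMapAccessor: the class `DictionaryLookupSketch`, its `Entry` type, and both method names are hypothetical, and a plain `java.util.HashMap` stands in for the real hash map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-dictionary entry caching for GROUP BY and JOIN. */
final class DictionaryLookupSketch {
    /** Stand-in for the map entry that carries the other columns (here just a count). */
    static final class Entry {
        final String key;
        long count;
        Entry(String key) { this.key = key; }
    }

    /** Marker stored for dictionary indexes whose string is absent from the map. */
    static final Entry EMPTY = new Entry(null);

    /** GROUP BY: dictionary holds the distinct strings, codes[i] is the index for row i. */
    static void aggregateBatch(String[] dictionary, int[] codes, Map<String, Entry> map) {
        Entry[] cache = new Entry[dictionary.length]; // one slot per dictionary index
        for (int code : codes) {
            Entry e = cache[code];
            if (e == null) {
                // First miss for this index: hash the string once, then remember the entry.
                e = map.computeIfAbsent(dictionary[code], Entry::new);
                cache[code] = e;
            }
            e.count++; // later rows with the same index skip the map entirely
        }
    }

    /** JOIN probe: returns how many rows find a match in the build-side map. */
    static long probeBatch(String[] dictionary, int[] codes, Map<String, Entry> map) {
        Entry[] cache = new Entry[dictionary.length];
        long matched = 0;
        for (int code : codes) {
            Entry e = cache[code];
            if (e == null) {
                e = map.get(dictionary[code]);
                if (e == null) e = EMPTY; // remember the miss as well
                cache[code] = e;
            }
            if (e != EMPTY) matched++;
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, Entry> map = new HashMap<>();
        aggregateBatch(new String[] {"NY", "CA"}, new int[] {0, 1, 0, 0}, map);
        aggregateBatch(new String[] {"CA", "TX"}, new int[] {0, 1, 1}, map);
        System.out.println(map.get("NY").count + " " + map.get("CA").count);
        System.out.println(probeBatch(new String[] {"NY", "ZZ"}, new int[] {0, 1, 1, 0}, map));
    }
}
```

In this sketch each distinct string is hashed at most once per batch, no matter how many rows refer to it.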

New DictionaryOptimizedMapAccessor that checks for the single-column dictionary key case and generates code for it (array creation, fetch from the map on a miss, and return of the other columns).
Refactored the map lookup code, both in the code generator ObjectMapAccessor and in the actual generated code, so that it can be invoked for the dictionary case as well as the other cases. This introduces no overhead; rather, it is slightly more efficient in some cases, since the JVM can decide dynamically whether or not to inline the method call as per CPU instruction cache size.
Changed the pattern in join consume. Earlier the generated code invoked "moveNext" at the start, assuming the iterator was placed before the first row. Now the generated code assumes instead that the iterator is already placed at the first row (with the above changes it is difficult to fit in the "before first row" pattern). Consume code may call "continue" expecting to move to the next row, which would now go into an infinite loop. To circumvent this, the consume code is surrounded with an otherwise useless "do { consume } while(false);" so that a continue breaks out of the inner loop and then goes on to moveNext. The result looks like "while (true) { do { consume } while(false); moveNext }".
Use a generic map in SnappySession to keep track of any additional "context" objects during code generation. It is used to pass around dictionary variable names and a new "finallyCode" block, which combines multiple "try {} finally {}" blocks in generated code into a single block.
Added a HashedObjectCache for the LocalJoin map that is shared by multiple partitions on the same node. This reduces both the effort to create the map and the memory overhead, and hence gives better CPU cache behaviour. The map is created on first get and removed when the last reference is removed (so it could be created multiple times in a single query, once for each set of scheduled partitions on a node). This behaviour avoids the invalidation complexity while adding minimal overhead.
Handle the StartsWith predicate for MAX/MIN by treating it like a range: 'ABC%' is treated as ">= 'ABC' and < 'ABD'".
Skip creation of SnappyHashAggregateExec completely if code generation is not possible (due to an ImperativeAggregate). This allows the doExecute of SnappyHashAggregateExec to simply fall back to code generation, assuming it will never fail.
Added Utils.metricsMethods and call it from all Snappy optimized plans to allow invoking optimized primitive methods, avoiding boxing/unboxing overhead for SQLMetrics (see the snappy-spark PR linked below).
Removed the opt=F case that skipped the optimized implementation for LocalJoin and HashAggregate. It is no longer useful for comparison and does not work for LocalJoin with the changes in this PR. It has been removed from TPCETrade and disabled in LocalJoin.
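The "do { consume } while(false);" wrapper described for join consume can be illustrated with a small standalone loop. This is only a sketch of the control flow: the class `ContinueSafeLoopSketch`, the `evens` method, and the array-based "iterator" are hypothetical stand-ins for the generated code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the continue-safe loop shape used around consume code. */
final class ContinueSafeLoopSketch {
    /** Collects the even rows; odd rows are skipped with "continue". */
    static List<Integer> evens(int[] rows) {
        List<Integer> out = new ArrayList<>();
        if (rows.length == 0) return out;
        int pos = 0; // the "iterator" starts ON the first row, not before it
        while (true) {
            do {
                // consume code, which may call "continue"
                int row = rows[pos];
                if (row % 2 != 0) continue; // jumps to the while(false) check, leaving the inner loop
                out.add(row);
            } while (false);
            // moveNext: always reached, even after a "continue" above
            if (++pos >= rows.length) break;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(evens(new int[] {1, 2, 3, 4, 5}));
    }
}
```

Without the inner do/while, the "continue" would jump back to the top of the outer loop without advancing `pos`, and the loop would never terminate on the first odd row.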

Patch testing

precheckin

ReleaseNotes.txt changes

NA

Other PRs

TIBCOSoftware/snappy-spark#33

The cache is created when the first partition asks for it and maintained only till the last partition references it. This means that the map can potentially get re-created multiple times if all partitions did not get scheduled to an executor in one shot. This is acceptable given that this avoids the complications of invalidating the cache.
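The create-on-first-get, drop-on-last-release lifecycle can be sketched with simple reference counting. This is an assumption-laden illustration rather than the HashedObjectCache implementation: the class `RefCountedCacheSketch` and its `acquire`, `release`, and `size` methods are hypothetical names, and building the value under a single lock is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a cache whose entries live only while they are referenced. */
final class RefCountedCacheSketch<K, V> {
    private static final class Holder<V> {
        final V value;
        int refs;
        Holder(V value) { this.value = value; }
    }

    private final Map<K, Holder<V>> entries = new HashMap<>();

    /** The first caller for a key builds the value; later callers share it. */
    synchronized V acquire(K key, Supplier<V> builder) {
        Holder<V> h = entries.get(key);
        if (h == null) {
            h = new Holder<>(builder.get());
            entries.put(key, h);
        }
        h.refs++;
        return h.value;
    }

    /** Dropping the last reference removes the entry, so no invalidation logic is needed. */
    synchronized void release(K key) {
        Holder<V> h = entries.get(key);
        if (h != null && --h.refs == 0) entries.remove(key);
    }

    synchronized int size() { return entries.size(); }

    public static void main(String[] args) {
        RefCountedCacheSketch<String, Object> cache = new RefCountedCacheSketch<>();
        Object first = cache.acquire("join-1", Object::new);
        Object second = cache.acquire("join-1", Object::new);
        System.out.println(first == second); // both partitions share one object
        cache.release("join-1");
        cache.release("join-1");
        System.out.println(cache.size());
    }
}
```

If a later wave of partitions asks for the same key after the last release, the value is simply built again, which matches the re-creation trade-off described above.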

- groupBy/join operations are much faster on single column dictionary strings: base groupBy/join or a combination is 2-3X faster
- many other optimizations in groupBy/join generated code; overall, single column integer groupBy/join is also 1.5X faster

The generated code that skipped shouldStop() under certain conditions (aggregations) was causing trouble in some queries, so it has been removed for now. Increased the default buckets in local mode now that partition overhead is smaller (and will become more so with SNAP-1190).

Check for column batch skipping for LIKE 'XYZ%' kind of queries. Updated UnifiedPartitionerTest as per the new default buckets. Corrected compiler warnings in CatalogConsistencyDUnitTest and added an assertion after the drop, at the proper place where drop table is expected to fail.
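Treating a prefix predicate as a range for batch skipping can be sketched as below. The class `PrefixRangeSketch` and its methods are hypothetical, and the sketch compares Java strings by UTF-16 code unit via `compareTo`, which is an assumption and may differ from the ordering the real column statistics use.

```java
/** Hypothetical sketch: turn LIKE 'ABC%' into a range check against batch min/max. */
final class PrefixRangeSketch {
    /** Exclusive upper bound for strings starting with prefix, or null if there is none. */
    static String upperBound(String prefix) {
        int i = prefix.length() - 1;
        // A trailing char at its maximum value cannot be incremented; drop it and carry on.
        while (i >= 0 && prefix.charAt(i) == Character.MAX_VALUE) i--;
        if (i < 0) return null;
        return prefix.substring(0, i) + (char) (prefix.charAt(i) + 1);
    }

    /** True if a batch with values in [min, max] may contain a string with the prefix. */
    static boolean batchMayMatch(String min, String max, String prefix) {
        String upper = upperBound(prefix);
        // The range [prefix, upper) must overlap [min, max].
        return max.compareTo(prefix) >= 0 && (upper == null || min.compareTo(upper) < 0);
    }

    public static void main(String[] args) {
        System.out.println(upperBound("ABC"));                  // ABD
        System.out.println(batchMayMatch("AAA", "ABB", "ABC")); // false, batch can be skipped
    }
}
```

A batch is skipped only when the check returns false; a true result means the batch still has to be scanned row by row.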

Sumedh Wale added 10 commits December 1, 2016 19:39

first cut of dictionary optimization for single column string groupBy…

998c532

…/hashJoin

when code-generation is not possible then change the plan itself to f…

0fe9cae

…allback to normal Spark HashAggregateExec

pre-evaluate input variables code in SnappyHashAggregateExec for expr…

071044b

…essions in group by expressions else aggregate expressions can consume and empty the ExprCode.code

split out Callable class in generated code

41b7ddf

- normal createMap() function as before; Callable class is separate since code may have "shouldStop()" which cannot be invoked even from sub-class so instead the Callable class now calls createMap()

Remove optimized=F case

2fabd34

No longer useful in comparison and does not work anymore for LocalJoin, so disabled it and removed from TPCETrade test (will be completely removed once Hemant's changes are merged)

fixing test failures with upstream spark

816d053

fixing fallback call to AggUtils for one distinct case

5a5f490

sumwale assigned kneeraj, hbhanawat and soubhik-c Dec 1, 2016

Sumedh Wale added 5 commits December 2, 2016 02:58

adding kafka-sql project at top-level

0a65d06

Merge remote-tracking branch 'origin/master' into SNAP-1194

15aae9a

Merge remote-tracking branch 'origin/master' into SNAP-1194

c15bfd5

Cleaned up the new code added in 1d144f9 to ColumnTableScan and ExistingPlans.scala

corrected one check in previous merge conflict resolution

f8bbe00

sumwale merged commit 78bbf14 into master Dec 3, 2016

sumwale deleted the SNAP-1194 branch December 5, 2016 22:23
