Skip to content

Conversation

@madelynnblue
Copy link
Contributor

Fixes #48214

Release note: None

@madelynnblue madelynnblue requested a review from rytaft May 14, 2020 17:44
@madelynnblue madelynnblue requested a review from a team as a code owner May 14, 2020 17:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mjibson and @rytaft)


pkg/sql/opt/xform/coster.go, line 318 at r1 (raw file):

	// The filter has to be evaluated on each input row.
	inputRowCount := sel.Input.Relational().Stats.RowCount
	cost := memo.Cost(inputRowCount) * cpuCostFactor

Should this also call computeFiltersCost? Maybe we should move this cpuCostFactor * rowCount cost inside that function.


pkg/sql/opt/xform/coster.go, line 471 at r1 (raw file):

	cost += memo.Cost(rowsProcessed) * perRowCost

	cost += c.computeFiltersCost(join.On, util.FastIntMap{}, lookupCount)

this should be rowsProcessed not lookupCount


pkg/sql/opt/xform/coster.go, line 520 at r1 (raw file):

				// some float functions, so this is a somewhat
				// data-backed guess.
				const geoFnCost = cpuCostFactor * 10

[nit] move the 10 as a constant at the top of the file, we try to put them all there

Copy link
Contributor Author

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mjibson, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/coster.go, line 318 at r1 (raw file):

Previously, RaduBerinde wrote…

Should this also call computeFiltersCost? Maybe we should move this cpuCostFactor * rowCount cost inside that function.

just added a second commit for this

Copy link
Contributor Author

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mjibson, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/coster.go, line 471 at r1 (raw file):

Previously, RaduBerinde wrote…

this should be rowsProcessed not lookupCount

Ok. I assume this should be changed for computeHashJoinCost too, which is using leftRowCount+rightRowCount, instead of rowsProcessed? Help me understand the intuition here. I did it like this because I thought the function would have to be executed on each incoming row from either side? But I guess that doesn't make much sense because the geo functions have two arguments, so probably one from each side. row count vs rows processed apparently is still unclear to me.

Copy link
Contributor Author

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/xform/coster.go, line 520 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] move the 10 as a constant at the top of the file, we try to put them all there

done

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mjibson and @rytaft)


pkg/sql/opt/xform/coster.go, line 471 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Ok. I assume this should be changed for computeHashJoinCost too, which is using leftRowCount+rightRowCount, instead of rowsProcessed? Help me understand the intuition here. I did it like this because I thought the function would have to be executed on each incoming row from either side? But I guess that doesn't make much sense because the geo functions have two arguments, so probably one from each side. row count vs rows processed apparently is still unclear to me.

Yeah, rowsProcessed estimates exactly what we need - how many rows we evaluate the ON condition on (a row here being a concatenation of a left row and a right row). The output row count is the same with the number of rows that pass the ON condition (for inner join).

In the worst case (a cross join), we may need to evaluate the ON condition on leftRowCount * rightRowCount rows.

An example where this really matters: imagine we have a join on a=x AND b=y and we have a hash join that hashes both columns and a lookup join but on an index only on a. The second one would generate a lot of rows that match on a, on which we'd have to check b=y. In this example, rowsProcessed would be equal to the row count for the hash case, but it would be much higher for the lookup case.


pkg/sql/opt/xform/coster.go, line 323 at r3 (raw file):

	// The filter has to be evaluated on each input row.
	inputRowCount := sel.Input.Relational().Stats.RowCount
	cost := memo.Cost(inputRowCount) * cpuCostFactor

I think we can remove this now, since computeFiltersCost already adds a cpuCostFactor for each one (and also for consistency with the joins where we rely only on computeFiltersCost).

@madelynnblue
Copy link
Contributor Author

Added a new commit that adds limit hints and rows processed.

Copy link
Contributor Author

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/xform/coster.go, line 323 at r3 (raw file):

Previously, RaduBerinde wrote…

I think we can remove this now, since computeFiltersCost already adds a cpuCostFactor for each one (and also for consistency with the joins where we rely only on computeFiltersCost).

computeFiltersCost currently only adds per row costs for expensive geo functions. It only adds a single cpuCostFactor per filter condition, not per row. I think this needs to stay for now.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mjibson and @rytaft)


pkg/sql/opt/xform/coster.go, line 137 at r2 (raw file):

	hugeCost memo.Cost = 1e100

	// Some benchmarks showed that these geo functions were atleast 10

[nit] remove "these" (or replace with "some")
[nit] at least


pkg/sql/opt/xform/coster.go, line 323 at r3 (raw file):

Previously, mjibson (Matt Jibson) wrote…

computeFiltersCost currently only adds per row costs for expensive geo functions. It only adds a single cpuCostFactor per filter condition, not per row. I think this needs to stay for now.

I see, thanks!


pkg/sql/opt/xform/coster.go, line 508 at r5 (raw file):

	if required.LimitHint != 0 && lookupCount > 0 {
		outputRows := join.Relational().Stats.RowCount
		unlimitedLookupCount := lookupCount

Ha, you used the updated logic even before I merged my PR, nice!

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mjibson and @rytaft)


pkg/sql/opt/xform/coster.go, line 541 at r5 (raw file):

func (c *coster) computeFiltersCost(
	filters memo.FiltersExpr, eqMap util.FastIntMap, rowCount float64,

Can we just add the filters cost into the perRowCost instead? I think it's awkward and surprising to have some of the filter operators take into account rowCount and some not. I bet making the existing EqOp cost be per-row instead of a one-time cost wouldn't change any plans, but would allow this API to be cleaner.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1, 1 of 1 files at r2, 19 of 19 files at r3, 1 of 1 files at r4, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @mjibson)


pkg/sql/opt/memo/statistics_builder.go, line 2360 at r5 (raw file):

		withoutOn := e.Memo().MemoizeLookupJoin(t.Input, nil /* on */, lookupJoinPrivate)
		return withoutOn.Relational().Stats.RowCount
	case *GeoLookupJoinExpr:

[nit] add a line above this case to be consistent with the formatting in the rest of this switch


pkg/sql/opt/xform/coster.go, line 494 at r5 (raw file):

	//   ab JOIN xy ON a=x AND a=10 AND x=10
	// which can become a lookup join with left-over condition x=10 which doesn't
	// actually filter anything.

Is there a way to make this comment more specific to geo lookup join? Maybe you could simplify the test case with NYC neighborhoods?


pkg/sql/opt/xform/coster.go, line 500 at r5 (raw file):

		// to be optimized twice, the coster should never be used after
		// logPropsBuilder.clear() is called.
		panic(errors.AssertionFailedf("could not get rows processed for lookup join"))

[nit] geo lookup join


pkg/sql/opt/xform/coster.go, line 541 at r5 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Can we just add the filters cost into the perRowCost instead? I think it's awkward and surprising to have some of the filter operators take into account rowCount and some not. I bet making the existing EqOp cost be per-row instead of a one-time cost wouldn't change any plans, but would allow this API to be cleaner.

The problem is when the estimated row count is very tiny, we need a way to prefer certain plans over others, which is why we have the "setup cost".

But I agree it's kind of awkward that this function now takes a row count, but it's only applied for the case of geo functions. I think it would be good to keep this function independent of the row count, but maybe always multiply the result by rowCount+1 instead of just rowCount. Does that change any plans?

Also add per-row  filter costing. This changes some other costs but
didn't result in any changed plans.

Fixes #48214

Release note: None
This will account for expensive geo function executions in WHERE clauses.
Copy link
Contributor Author

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @rytaft)


pkg/sql/opt/memo/statistics_builder.go, line 2360 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] add a line above this case to be consistent with the formatting in the rest of this switch

Done.


pkg/sql/opt/xform/coster.go, line 494 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is there a way to make this comment more specific to geo lookup join? Maybe you could simplify the test case with NYC neighborhoods?

Done.


pkg/sql/opt/xform/coster.go, line 500 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] geo lookup join

Done.


pkg/sql/opt/xform/coster.go, line 541 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

The problem is when the estimated row count is very tiny, we need a way to prefer certain plans over others, which is why we have the "setup cost".

But I agree it's kind of awkward that this function now takes a row count, but it's only applied for the case of geo functions. I think it would be good to keep this function independent of the row count, but maybe always multiply the result by rowCount+1 instead of just rowCount. Does that change any plans?

I've removed the row count argument and changed all callers of computeFiltersCost to multiply its output by their row count. This does indeed increase many costs (as we would expect since now we are assuming the base cost applies to all rows instead of just a one time setup), but didn't result in any changed plans in the tests.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 23 files at r6, 18 of 20 files at r7, 3 of 3 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @mjibson)


pkg/sql/opt/xform/coster.go, line 323 at r8 (raw file):

	// The filter has to be evaluated on each input row.
	inputRowCount := sel.Input.Relational().Stats.RowCount
	cost := memo.Cost(inputRowCount) * cpuCostFactor

You can probably get rid of this cost now, since it's subsumed by computeFiltersCost


pkg/sql/opt/xform/coster.go, line 369 at r8 (raw file):

		rowsProcessed = join.Relational().Stats.RowCount
	}
	cost += memo.Cost(rowsProcessed) * cpuCostFactor

can get rid of this


pkg/sql/opt/xform/coster.go, line 389 at r8 (raw file):

		eqMap.Set(right, left)
	}
	cost += c.computeFiltersCost(*on, eqMap) * memo.Cost(rowsProcessed)

for all of these joins, I think you want to multiply by 1+rowsProcessed to account for the "setup cost" we had before


pkg/sql/opt/xform/coster.go, line 408 at r8 (raw file):

		panic(errors.AssertionFailedf("could not get rows processed for merge join"))
	}
	cost += memo.Cost(rowsProcessed) * cpuCostFactor

can get rid of this


pkg/sql/opt/xform/coster.go, line 481 at r8 (raw file):

		c.rowScanCost(join.Table, join.Index, numLookupCols)

	cost += memo.Cost(rowsProcessed) * perRowCost

can get rid of this, if you add the result of computeFiltersCost to the perRowCost


pkg/sql/opt/xform/coster.go, line 496 at r8 (raw file):

	// left-over conditions that are not selective.
	// A contrived example, where gid is a SERIAL PK:
	//   nyc_census_blocks c JOIN nyc_neighborhoods n ON ST_Intersects(c.geom, n.geom) AND c.gid = 10

[nit] wrap to 80 chars


pkg/sql/opt/xform/coster.go, line 498 at r8 (raw file):

	//   nyc_census_blocks c JOIN nyc_neighborhoods n ON ST_Intersects(c.geom, n.geom) AND c.gid = 10
	// which can become a lookup join with left-over condition c.gid = 10 which doesn't
	// actually filter anything.

The statement "which doesn't actually filter anything" isn't correct for this example (c.gid = 10 does in fact filter the output). But the example is still valid since we would need to evaluate the filter condition on every row processed by the GeoLookupJoin, and we need to account for this in the cost.

On a somewhat unrelated note, c.gid = 10 is such a selective filter that we'd probably never choose a GeoLookupJoin over a constrained primary index scan in this case. I'd suggest changing the filter in the example to something like c.gid < n.gid (still totally contrived, but at least more likely to result in a GeoLookupJoin with an ON condition).


pkg/sql/opt/xform/testdata/coster/join, line 539 at r8 (raw file):

 └── inner-join (lookup abcde@idx_abd)
      ├── columns: w:1!null x:2!null y:3!null z:4!null i:5!null j:6!null k:7!null l:8!null m:9!null n:10!null a:12!null b:13!null c:14!null
      ├── key columns: [1 2] = [12 13]

we lost one of the equality columns here -- I think this is the reason we needed the setup cost

@madelynnblue
Copy link
Contributor Author

Closing due to too many unrelated changes happening at once. Getting all the geo stuff in and will do the other joins and select in another PR.

@madelynnblue madelynnblue deleted the geolookup-cost branch May 19, 2020 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: update coster to account for cost of executing geospatial functions

5 participants