This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@knopp (Member) commented Aug 3, 2023

Results for the static_path_tessellation macrobenchmark:

average_frame_build_time_millis
  Cache Disabled: 0.4710688405797106 ms
  Cache Enabled:  0.46162815884476605 ms

average_frame_rasterizer_time_millis
  Cache Disabled: 6.654328519855595 ms
  Cache Enabled:  1.6611624548736454 ms

I had to run the perf test with background thread spinning (see flutter/flutter#131837); otherwise the variant with the cache enabled actually regressed in average_frame_build_time_millis because iOS let the CPU run at a lower clock speed.

The cache implementation is straightforward: it leverages the fact that SkPaths are retained by the framework, so we can rely on SkPath::getGenerationID() to identify paths that persist across frames. The cache size stays bounded because entries for unused paths are evicted immediately at the end of each frame.

Both polylines and tessellation results are cached.
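
To illustrate the scheme described above, here is a minimal sketch of a generation-ID-keyed cache with end-of-frame eviction. This is not the PR's actual code; the entry struct and class name are hypothetical, and only the keying and eviction logic mirror the description.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical cache entry holding a tessellation result.
struct CacheEntry {
  std::vector<float> vertices;
  std::vector<uint16_t> indices;
  bool used_this_frame = false;
};

// Entries are keyed by SkPath::getGenerationID(); anything not looked up
// during the current frame is evicted in FinishFrame(), which bounds the
// cache to the set of paths retained across frames.
class GenerationIdCache {
 public:
  CacheEntry* Find(uint32_t generation_id) {
    auto it = entries_.find(generation_id);
    if (it == entries_.end()) {
      return nullptr;
    }
    it->second.used_this_frame = true;
    return &it->second;
  }

  void Insert(uint32_t generation_id, CacheEntry entry) {
    entry.used_this_frame = true;
    entries_[generation_id] = std::move(entry);
  }

  // Drop every entry that was not touched this frame and reset the
  // usage flags for the next frame.
  void FinishFrame() {
    for (auto it = entries_.begin(); it != entries_.end();) {
      if (!it->second.used_this_frame) {
        it = entries_.erase(it);
      } else {
        it->second.used_this_frame = false;
        ++it;
      }
    }
  }

 private:
  std::unordered_map<uint32_t, CacheEntry> entries_;
};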

List which issues are fixed by this PR. You must list at least one issue.

If you had to change anything in the flutter/tests repo, include a link to the migration guide as per the breaking change policy.

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@flutter-dashboard

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

@knopp knopp force-pushed the add_tessellation_cache branch 3 times, most recently from 965a1cc to 6894715 Compare August 3, 2023 16:25
@knopp knopp force-pushed the add_tessellation_cache branch from 6894715 to ad80d8f Compare August 3, 2023 16:31
@jonahwilliams (Contributor) left a comment


You should also add a benchmark that intentionally does not hit the cache, perhaps by doing scale transformations. I'm technically OOO today, but I'd also like to see a CPU breakdown of where these costs are going.

Tessellation itself is unlikely to be improved, but certain kinds of paths have better algorithms we can use.

Comment on lines +46 to +50
auto res = picture.pass->Render(*content_context_, render_target);
// FIXME(knopp): This should be called for the last surface of the frame,
// but there's currently no way to do this.
content_context_->GetTessellationCache().FinishFrame();
return res;
Contributor


There is no concept of a frame, but you could probably use the concept of a particular presentation. See context_vk, which tracks increments in the swapchain. We might need to bring a similar concept out of the platform-specific backend.


Path::Polyline TessellationCache::GetOrCreatePolyline(
const impeller::Path& path,
Scalar scale) {
Contributor


If you take the exact scale as a cache key then animated transformations will thrash your cache each frame despite the vertices being almost unchanged.
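
One possible mitigation, not part of this PR, would be to quantize the scale before it enters the cache key so that small animation-driven changes land in the same bucket. A rough sketch (QuantizeScale is a hypothetical helper):

#include <cmath>

// Snap the scale up to the nearest power-of-two bucket (assumes scale > 0).
// Tessellating at a slightly larger scale than strictly needed still looks
// correct, so only crossing a bucket boundary causes a re-tessellation.
inline float QuantizeScale(float scale) {
  return std::exp2(std::ceil(std::log2(scale)));
}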

: Convexity::kUnknown);
return builder.TakePath(fill_type);
auto result = builder.TakePath(fill_type);
result.SetOriginalGenerationId(path.getGenerationID());
Contributor


Where does this generation ID come from?

@knopp (Member, Author) commented Aug 4, 2023

It's assigned to the path on request from a global counter. Any time the path is mutated, its generation ID is reset. That way, if a path has the same generation ID, it is guaranteed to be the same instance, unmodified.
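
A small illustration of that contract, assuming Skia's SkPath API (the include path follows Skia's source layout):

#include <cstdint>

#include "include/core/SkPath.h"

void GenerationIdExample() {
  SkPath path;
  path.lineTo(10, 10);
  uint32_t before = path.getGenerationID();

  // Reading the ID again without mutating the path returns the same value,
  // so it can identify a retained, unmodified path across frames.
  SkASSERT(path.getGenerationID() == before);

  // Any mutation changes the ID, which invalidates cache entries keyed on it.
  path.lineTo(20, 20);
  SkASSERT(path.getGenerationID() != before);
}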

Comment on lines 860 to +861
std::shared_ptr<Tessellator> tessellator_;
std::unique_ptr<TessellationCache> tessellation_cache_;
Contributor


Can we make the tessellation cache own the tessellator?

Comment on lines +55 to +67
return tesselator.Tessellate(
fill_type, polyline,
[&](const float* vertices, size_t vertices_size, const uint16_t* indices,
size_t indices_size) {
TessellatorData data;
data.vertices.assign(vertices, vertices + vertices_size);
data.indices.assign(indices, indices + indices_size);
tessellator_cache_.Set(key, data);
return callback(vertices, vertices_size, indices, indices_size);
});
#else
return tesselator.Tessellate(fill_type, polyline, callback);
#endif
Contributor


It would be nice if the tessellation cache were a bit more general and could also cache data generated by compute. For an example, look at the draw points geometry.

For that to work, it would likely have to be able to vend device buffers instead.

On the other hand, we'll likely batch compute workloads in ways that make these buffers difficult to use. So perhaps only using the tessellation cache for the CPU bound workloads is the way forward.

@chinmaygarde chinmaygarde added the Work in progress (WIP) Not ready (yet) for review! label Aug 3, 2023
@knopp (Member, Author) commented Aug 4, 2023

I added a test commit (not meant to be part of the PR) to remove allocations from path component extrema, which seems to speed up bounds calculation, further reducing rasterizer time from 1.66 ms down to 1.37 ms.

But there still seems to be about 30% of rasterization time spent translating the same Skia paths over and over, and calculating bounds for every path every frame. I think the cache, as implemented here, might be in the wrong place.

@knopp (Member, Author) commented Aug 4, 2023

Closing in favor of #44390 (better performance)

@knopp knopp closed this Aug 4, 2023

Labels: e: impeller · Work in progress (WIP) Not ready (yet) for review!
