WIP: [impeller] Add tessellation cache #44324
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
jonahwilliams
left a comment
You should also add a benchmark that intentionally does not hit the cache, perhaps by doing scale transformations. I'm technically OOO today, but I'd also like to see a CPU breakdown of where these costs are going.
Tessellation itself is unlikely to be improved, but certain kinds of paths have better algorithms we can use.
auto res = picture.pass->Render(*content_context_, render_target);
// FIXME(knopp): This should be called for the last surface of the frame,
// but there's currently no way to do this.
content_context_->GetTessellationCache().FinishFrame();
return res;
There is no concept of a frame, but you could probably use the concept of a particular presentation. See context_vk, which tracks increments in the swapchain. We might need to bring a similar concept out of the platform-specific backend.
Path::Polyline TessellationCache::GetOrCreatePolyline(
    const impeller::Path& path,
    Scalar scale) {
If you take the exact scale as a cache key then animated transformations will thrash your cache each frame despite the vertices being almost unchanged.
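One way to act on this concern is to key the cache on a quantized scale rather than the exact value. The following is only a sketch under assumptions: `QuantizeScale`, `BucketScale`, `PolylineKey`, and the choice of four buckets per doubling are hypothetical and do not come from this PR or from Impeller.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical tuning constant: number of cache buckets per doubling of scale.
constexpr int kStepsPerOctave = 4;

// Maps a positive scale to a coarse, geometrically spaced bucket index.
// Rounding up means the bucket's scale is never smaller than the request,
// so cached geometry is never coarser than what was asked for.
inline int32_t QuantizeScale(float scale) {
  assert(scale > 0.0f);
  return static_cast<int32_t>(std::ceil(std::log2(scale) * kStepsPerOctave));
}

// Scale at which geometry for a given bucket would be generated.
inline float BucketScale(int32_t bucket) {
  return std::exp2(static_cast<float>(bucket) / kStepsPerOctave);
}

// Hypothetical cache key: path identity plus the scale bucket.
struct PolylineKey {
  uint32_t generation_id;
  int32_t scale_bucket;

  bool operator==(const PolylineKey& other) const {
    return generation_id == other.generation_id &&
           scale_bucket == other.scale_bucket;
  }
};
```

With this scheme an animated scale only produces a new key when it crosses a bucket boundary, at the cost of sometimes generating slightly more vertices than the exact scale would need.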
    : Convexity::kUnknown);
return builder.TakePath(fill_type);
auto result = builder.TakePath(fill_type);
result.SetOriginalGenerationId(path.getGenerationID());
Where does this generation ID come from?
It's assigned to the path on request from a global counter. Any time the path is mutated, its generationId is reset. That way, if a path has the same generationId, it is guaranteed to be the same instance, unmodified.
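The scheme described in this reply can be sketched roughly as follows. This is an illustration only: `TrackedPath` and `Point` are hypothetical stand-ins, and the code is not Skia's actual `SkPath` implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct Point {
  float x;
  float y;
};

// Hypothetical path type illustrating lazily assigned generation IDs.
class TrackedPath {
 public:
  void LineTo(Point p) {
    points_.push_back(p);
    generation_id_ = 0;  // Any mutation resets the ID.
  }

  // Assigns an ID from a process-wide counter on first request after a
  // mutation; repeated calls without mutation return the same value.
  uint32_t GetGenerationID() const {
    if (generation_id_ == 0) {
      generation_id_ = NextID();
    }
    assert(generation_id_ != 0);
    return generation_id_;
  }

 private:
  static uint32_t NextID() {
    static std::atomic<uint32_t> counter{0};
    uint32_t id;
    do {
      id = ++counter;  // Pre-increment returns the new value.
    } while (id == 0);  // 0 is reserved to mean "no ID assigned yet".
    return id;
  }

  std::vector<Point> points_;
  mutable uint32_t generation_id_ = 0;
};
```

In this sketch a copy carries the same ID as its source, which is consistent with the contents being identical; the uniqueness guarantee holds as long as the 32-bit counter does not wrap around.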
std::shared_ptr<Tessellator> tessellator_;
std::unique_ptr<TessellationCache> tessellation_cache_;
Can we make the tessellation cache own the tessellator?
  return tesselator.Tessellate(
      fill_type, polyline,
      [&](const float* vertices, size_t vertices_size, const uint16_t* indices,
          size_t indices_size) {
        TessellatorData data;
        data.vertices.assign(vertices, vertices + vertices_size);
        data.indices.assign(indices, indices + indices_size);
        tessellator_cache_.Set(key, data);
        return callback(vertices, vertices_size, indices, indices_size);
      });
#else
  return tesselator.Tessellate(fill_type, polyline, callback);
#endif
It would be nice if the tessellation cache was a bit more general and could also cache data that was generated by compute. For an example, look at the draw points geometry.
For that to work, it would likely have to be able to vend device buffers instead.
On the other hand, we'll likely batch compute workloads in ways that make these buffers difficult to use. So perhaps only using the tessellation cache for the CPU bound workloads is the way forward.
I added a test commit (not meant to be part of the PR) to remove allocations from path component extrema, which seems to speed up bounds calculation, further reducing rasterizer time. But there still seems to be about 30% of rasterization time spent translating the same Skia paths over and over again, and calculating bounds for every path every frame. I think the cache as it is implemented here might be in the wrong place.
Closing in favor of #44390 (better performance)
Results for the static_path_tessellation macrobenchmark:
I had to run the perf test with a background thread spinning (see flutter/flutter#131837); otherwise the variant with the cache enabled actually regressed in average_frame_build_time_millis because iOS let the CPU run at a lower speed.

The cache implementation is very straightforward: it leverages the fact that SkPaths are retained by the framework, so we can rely on SkPath::getGenerationID() to identify paths that are retained across frames. The cache size is bounded by evicting unused paths immediately at the end of each frame.

Both polylines and tessellation results are cached.
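The eviction policy described above can be illustrated with a minimal sketch. It assumes a cache keyed only by generation ID; `FrameScopedCache`, `CachedGeometry`, and `GetOrCreate` are hypothetical names and not the PR's actual `TessellationCache` API (only `FinishFrame` appears in the diff).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical cached payload: tessellated vertices and indices.
struct CachedGeometry {
  std::vector<float> vertices;
  std::vector<uint16_t> indices;
};

class FrameScopedCache {
 public:
  // Returns the entry for `generation_id`, calling `build` only on a miss.
  // The returned reference stays valid until the entry is evicted.
  const CachedGeometry& GetOrCreate(
      uint32_t generation_id,
      const std::function<CachedGeometry()>& build) {
    assert(build && "build callback must be callable");
    auto it = entries_.find(generation_id);
    if (it == entries_.end()) {
      it = entries_.emplace(generation_id, Entry{build(), true}).first;
    }
    it->second.used_this_frame = true;
    return it->second.geometry;
  }

  // Evicts every entry that was not requested since the previous call,
  // then clears the usage marks for the next frame.
  void FinishFrame() {
    for (auto it = entries_.begin(); it != entries_.end();) {
      if (it->second.used_this_frame) {
        it->second.used_this_frame = false;
        ++it;
      } else {
        it = entries_.erase(it);
      }
    }
  }

  size_t Size() const { return entries_.size(); }

 private:
  struct Entry {
    CachedGeometry geometry;
    bool used_this_frame;
  };
  std::unordered_map<uint32_t, Entry> entries_;
};
```

Because entries survive only while they are requested every frame, the cache never holds more than the set of paths drawn in the most recent frame, which is what bounds its size without an explicit capacity limit.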