@qjia7 qjia7 commented Sep 11, 2019

With this change, the add benchmark runs close to 2x faster (nearly a 100% speedup), and the PoseNet+ResNet demo improves from 3 fps to 6 fps.

We should try to avoid using (1, 1, 1) as the default work group size.
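
To make the cost of a (1, 1, 1) work group size concrete, here is a minimal sketch of how the dispatch count changes with the work group size. The helper `computeDispatch`, the element count, and the (64, 1, 1) size are assumptions chosen for illustration; they are not taken from this PR's code.

```typescript
// Hypothetical helper, not the backend's actual API: computes how many
// work groups must be dispatched along x to cover `outputSize` elements.
type Vec3 = [number, number, number];

function computeDispatch(outputSize: number, workGroupSize: Vec3): Vec3 {
  const [x, y, z] = workGroupSize;
  const invocationsPerGroup = x * y * z;
  // Round up so a partially filled last group still covers the tail.
  return [Math.ceil(outputSize / invocationsPerGroup), 1, 1];
}

// Assumed workload: an element-wise add over 1024 * 1024 values.
const elements = 1024 * 1024;

// One invocation per work group: one work group per element.
const naive = computeDispatch(elements, [1, 1, 1]);

// Assumed size of 64 invocations per work group.
const grouped = computeDispatch(elements, [64, 1, 1]);

console.log(naive[0], grouped[0]);
```

When the element count is not a multiple of the work group size, the last group contains extra invocations, so the shader needs a bounds check before writing its output.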

PERF

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.


qjia7 commented Sep 11, 2019

@annxingyuan @kainino0x Please take a look. Thanks.

@annxingyuan annxingyuan left a comment

LGTM - thank you, this is so awesome! I had mistakenly believed that there would be no performance benefit from simply bundling threads into thread groups when shared memory is not being used.
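
One way to see why grouping helps even without shared memory: GPUs typically run the invocations of a work group in fixed-width SIMD batches, so a work group smaller than that width leaves lanes idle. The sketch below is a simplified model, not a measurement. The function `laneUtilization` is hypothetical, and the SIMD width of 32 is an assumption that varies by hardware.

```typescript
// Simplified model (an assumption, not a profiler result): a work group of
// `workGroupSize` invocations occupies ceil(workGroupSize / simdWidth)
// SIMD batches, and any lanes left over in the last batch sit idle.
function laneUtilization(workGroupSize: number, simdWidth: number): number {
  const batches = Math.ceil(workGroupSize / simdWidth);
  return workGroupSize / (batches * simdWidth);
}

// Assumed SIMD width; real hardware commonly uses values such as 32 or 64.
const assumedSimdWidth = 32;

const singleInvocation = laneUtilization(1, assumedSimdWidth);
const fullGroups = laneUtilization(64, assumedSimdWidth);
const partialGroup = laneUtilization(48, assumedSimdWidth);

console.log(singleInvocation, fullGroups, partialGroup);
```

Under this model a (1, 1, 1) work group keeps only one lane in each batch busy, which fits with the large speedup reported above, although the real gain depends on the device and driver.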

@annxingyuan annxingyuan merged commit eef8a32 into tensorflow:master Sep 11, 2019
@qjia7 qjia7 deleted the workGroupSize branch August 13, 2020 07:01