Restriction Block #228
Codecov Report
@@ Coverage Diff @@
## master #228 +/- ##
==========================================
- Coverage 93.09% 93.06% -0.03%
==========================================
Files 124 129 +5
Lines 7586 7975 +389
==========================================
+ Hits 7062 7422 +360
- Misses 524 553 +29
Force-pushed from 0b9db4c to 4210a56.
CeedChk(ierr);
ierr = CeedQFunctionFieldGetNumComponents(qfinputfields[i], &ncomp);
CeedChk(ierr);
// Restrict active input block
For now I am applying this strategy only to the active input. I am concerned that applying this single-block restriction strategy to all inputs would decrease performance by generating extra memory movement for unchanged input vectors.
Force-pushed from edbab2a to 798aa44.
Force-pushed from 8fcdd47 to 35a8554.
I included the full E-vector version of the blocked backend as an option. Right now the performance picture is a bit muddied: for some orders and backends the full evec version beats this new version, but the new version seems to be generally better.

Jenkins stall

I restarted the Jenkins run.

I cherry-picked two of these commits into PR #231; I'll rebase this branch after that branch is merged.

Failure due to Jenkins timeout.

Restarted and completed. I don't understand the error. Internal routing on GKE should be reliable.

I set up the serial AVX and XSMM backends to use this blocking technique as well. This now has some uninteresting intermediate commits, so I'd squash+merge.
CeedTransposeMode tmode,
CeedTransposeMode lmode,
CeedVector u, CeedVector v,
CeedRequest *request) {
Should the interface be designed to support a zero-copy implementation when element restriction is the identity?
Two questions.
A) Would this only be for single-element processing? Multi-element processing needs a copy to do the shuffle.
B) Would it be better to modify this interface, or to have the backend handle the special case of single-element blocks with an identity restriction?
I think this ordering is most relevant at high order where single-element processing is competitive. One could also imagine an E-vector with block/interlaced elements and an "identity" restriction in that ordering.
I'm concerned about public interfaces that may have poor granularity for some architectures. It lures users into writing fragile code that needs multiple branches for different hardware or for different application choices.
One option for zero-copy that does not change granularity is for the function to return the location where the E-vector data lives, thus having the option to return a pointer to the data where it lies, versus in a buffer.
I agree that this is important for performance, but I'm trying to understand the best place to put the extra complexity.
Are you worried about a user calling CeedElemRestrictionApply in their code or a backend calling CeedElemRestrictionApply?
I don't see a way that an identity restriction in transpose mode avoids branching logic in the backend's CeedOperatorApply code. That decision reaches all the way back to the decision of where to write the output of CeedBasisApply after the QFunction, or possibly where to write the output of CeedQFunctionApply for CEED_EVAL_NONE. In notranspose mode we can, and at one point did, make the evec point to the lvec data for inputs with an identity restriction (that was dropped in the current blocked backend because of the interlacing), so that side isn't as tricky.
I worry about an interface that returns a data location rather than a vector causing issues for the multi-memory model in different backends.
Currently we copy full evecs, even for identity restrictions. Does better handling of identity restrictions best fit in this PR or a follow-on?
I'm mainly concerned about user code. We maintain all the backends in existence at this point, though we want to eventually have a stable way to support backends maintained externally.
With the current interface, libCEED could determine that the restriction is the identity and make the output vector alias the block of the input vector (instead of copying into separate memory). It could mark the result read-only, but would need some way of releasing the aliased vector. Perhaps a Get/Restore interface. For the transpose mode, we could Get a block of the E-vector for writing and Restore it when done.
- Identity restriction and matching memory spaces
  - Get sets a writeable view; Restore drops the view.
- Otherwise
  - Get ensures a writeable buffer; Restore applies the transpose restriction.
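The two cases in the list above could be sketched roughly as follows. This is only an illustration of the proposed Get/Restore semantics for transpose mode, not libCEED API: `BlockRestriction`, `block_get_write`, and `block_restore_write` are hypothetical names, the data is a plain `double` array in host memory, and the output L-vector is assumed to start zeroed so that writing through the view matches accumulating from the buffer.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical types and names for illustration; none of this is libCEED API.
typedef struct {
  bool is_identity;   // identity restriction with matching memory spaces
  size_t blocksize;   // number of entries in one E-vector block
  const size_t *ind;  // L-vector index for each block entry (unused if identity)
  double *scratch;    // backend-owned buffer holding blocksize entries
} BlockRestriction;

// Get: hand out a writeable block of E-vector data.
static double *block_get_write(BlockRestriction *r, size_t block,
                               double *lvec) {
  // Identity case: return a view into the L-vector itself (zero-copy).
  if (r->is_identity) return lvec + block * r->blocksize;
  // Otherwise: return a zeroed scratch buffer for the caller to fill.
  for (size_t i = 0; i < r->blocksize; i++) r->scratch[i] = 0.0;
  return r->scratch;
}

// Restore: drop the view, or apply the transpose restriction from the buffer.
static void block_restore_write(BlockRestriction *r, size_t block,
                                double *lvec, double **data) {
  if (!r->is_identity) {
    const size_t *ind = r->ind + block * r->blocksize;
    // Scatter-add the buffered block into the L-vector.
    for (size_t i = 0; i < r->blocksize; i++) lvec[ind[i]] += r->scratch[i];
  }
  *data = NULL;  // the caller must not use the pointer after Restore
}
```

The caller's code is the same in both cases, so the branch on identity stays inside the library rather than in user code.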
This is starting to sound like a follow-on PR in terms of scope. This PR already provides a performance enhancement, so maybe we merge this one and start a new PR for better handling of identity restrictions?
Yes, that's fine with me.
Force-pushed from b037bf3 to d945220.
This PR adds the ability to restrict to/from a single block of a blocked restriction. This hopefully provides a mild performance enhancement for the cpu/self/*/blocked backends.
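As a rough illustration of what restricting a single block means, here is a minimal sketch in plain C. It is not the code in this PR: the function names, the fixed block size `BLK`, the single-component layout, and the index layout (element index fastest within a block, with the last block padded by repeating an element) are all assumptions made for the example.

```c
#include <stddef.h>

// Hypothetical sketch, not the libCEED implementation.
// Assumed layout: within a block, entry (node i, element j) lives at
// i*BLK + j, so the BLK elements of a block are interlaced.
enum { BLK = 4 };

// Notranspose: gather one block from the L-vector u into the block buffer v.
static void restrict_block_notranspose(size_t block, size_t elemsize,
                                       const size_t *ind, const double *u,
                                       double *v) {
  const size_t *blockind = ind + block * elemsize * BLK;
  for (size_t i = 0; i < elemsize * BLK; i++) v[i] = u[blockind[i]];
}

// Transpose: scatter-add one block buffer v back into the L-vector u.
// Only the first nreal elements of the block are real; the rest are padding
// and are skipped so that they are not summed twice.
static void restrict_block_transpose(size_t block, size_t elemsize,
                                     size_t nreal, const size_t *ind,
                                     const double *v, double *u) {
  const size_t *blockind = ind + block * elemsize * BLK;
  for (size_t i = 0; i < elemsize; i++)
    for (size_t j = 0; j < nreal; j++)
      u[blockind[i * BLK + j]] += v[i * BLK + j];
}
```

With this shape, an operator can loop over blocks and keep only one block buffer of size `elemsize * BLK` live at a time, instead of materializing the full E-vector.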