Skip to content

Conversation

@jeremylt
Copy link
Member

@jeremylt jeremylt commented Mar 28, 2019

This PR adds the ability to restrict to/from a single block of a blocked restriction. This hopefully provides a mild performance enhancement for the cpu/self/*/blocked backends.

@jeremylt
Copy link
Member Author

jeremylt commented Mar 28, 2019

The CI is currently failing because of Nek, not because of this PR. Investigating. I fixed it

@codecov
Copy link

codecov bot commented Mar 28, 2019

Codecov Report

Merging #228 into master will decrease coverage by 0.02%.
The diff coverage is 92.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #228      +/-   ##
==========================================
- Coverage   93.09%   93.06%   -0.03%     
==========================================
  Files         124      129       +5     
  Lines        7586     7975     +389     
==========================================
+ Hits         7062     7422     +360     
- Misses        524      553      +29
Flag Coverage Δ
#backends 90.24% <93.78%> (+0.38%) ⬆️
#examples 82.18% <ø> (ø) ⬆️
#interface 91.19% <70%> (-0.5%) ⬇️
#tests 96.58% <100%> (+0.05%) ⬆️
Impacted Files Coverage Δ
interface/ceed.c 80.73% <ø> (ø) ⬆️
backends/blocked/ceed-blocked.c 90% <100%> (ø) ⬆️
backends/ref/ceed-ref.c 95.83% <100%> (ø) ⬆️
backends/xsmm/ceed-xsmm-serial.c 88.88% <100%> (ø) ⬆️
backends/xsmm/ceed-xsmm-blocked.c 88.88% <100%> (ø) ⬆️
backends/avx/ceed-avx-blocked.c 90% <100%> (ø) ⬆️
backends/ref/ceed-ref-restriction.c 97.29% <100%> (+0.28%) ⬆️
backends/avx/ceed-avx-serial.c 90% <100%> (ø) ⬆️
tests/t208-elemrestriction.c 100% <100%> (ø)
tests/t208-elemrestriction-f.f90 100% <100%> (ø)
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c532df6...1f37b40. Read the comment docs.

@jeremylt jeremylt force-pushed the rstr-block branch 4 times, most recently from 0b9db4c to 4210a56 Compare March 29, 2019 00:40
CeedChk(ierr);
ierr = CeedQFunctionFieldGetNumComponents(qfinputfields[i], &ncomp);
CeedChk(ierr);
// Restrict active input block
Copy link
Member Author

@jeremylt jeremylt Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I am applying this strategy only to the active input - I am concerned that applying this single block restriction strategy to all inputs will decrease performance by generating extra memory movement for unchanged input vectors

@jeremylt jeremylt force-pushed the rstr-block branch 6 times, most recently from edbab2a to 798aa44 Compare March 29, 2019 02:36
@jeremylt jeremylt force-pushed the rstr-block branch 2 times, most recently from 8fcdd47 to 35a8554 Compare March 29, 2019 06:15
@jeremylt
Copy link
Member Author

jeremylt commented Mar 29, 2019

I included the full evector version of the blocked backend as an option. Right now the performance picture is a bit muddied. For different orders and backends, the full evec version sometimes beats this new version - however this new version seems to be generally better.

@jeremylt
Copy link
Member Author

Jenkins stall

Cannot contact jnlp-pod-bmtbd-rxf37: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException

@jedbrown
Copy link
Member

I restarted the Jenkins run.

@jeremylt
Copy link
Member Author

jeremylt commented Apr 1, 2019

I cherry picked two of these commits to put it PR #231; I'll rebase this branch after that branch is merged.

@jeremylt
Copy link
Member Author

jeremylt commented Apr 5, 2019

Failure due to Jenkins timeout.

@jedbrown
Copy link
Member

jedbrown commented Apr 5, 2019

Restarted and completed. I don't understand the error. Internal routing on GKE should be reliable.

@jeremylt
Copy link
Member Author

jeremylt commented May 3, 2019

I set up the serial AVX and XSMM to use this blocking technique as well. This now has some uninteresting intermediate commits, so I'd squash+merge.

CeedTransposeMode tmode,
CeedTransposeMode lmode,
CeedVector u, CeedVector v,
CeedRequest *request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the interface be designed to support a zero-copy implementation when element restriction is the identity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions.
A) Would this only be for single element processing? Multi element processing needs a copy to do the shuffle.
B) Would it be better to modify this interface or have the backend handle the special case of single element blocks with an identity restriction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ordering is most relevant at high order where single-element processing is competitive. One could also imagine an E-vector with block/interlaced elements and an "identity" restriction in that ordering.

I'm concerned about public interfaces that may have poor granularity for some architectures. It lures users into writing fragile code that needs multiple branches for different hardware or for different application choices.

One option for zero-copy that does not change granularity is for the function to return the location where the E-vector data lives, thus having the option to return a pointer to the data where it lies, versus in a buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is important for performance, but I'm trying to understand the best place to put the extra complexity.

Are you worried about a user calling CeedElemRestrictionApply in their code or a backend calling CeedElemRestrictionApply?

I don't see a way that an identity restriction in transpose mode avoids branching logic in the backend's CeedOperatorApply code. That decision reaches all the way back to the decision of where to write the output of CeedBasisApply after the QFunction, or possibly where to write the output of CeedQFunctionApply for CEED_EVAL_NONE. In notransopose mode we can, and at one point used to, make the evec point to the lvec data for inputs with an identity restriction (that was dropped in the current blocked backend because of the interlacing), so that side isn't as tricky.

I worry about an interface that returns a data location rather than a vector causing issues for the multi-memory model in different backends.

Currently we copy full evecs, even for identity restrictions. Does better handling of identity restrictions best fit in this PR or a follow-on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly concerned about user code. We maintain all the backends in existence at this point, though we want to eventually have a stable way to support backends maintained externally.

With the current interface, libCEED could determine that the restriction is the identity and make the output vector alias the block of the input vector (instead of copying into separate memory). It could mark the result read-only, but would need some way of releasing the aliased vector. Perhaps a Get/Restore interface. For the transpose mode, we could Get a block of the E-vector for writing and Restore it when done.

  • Identity restriction and matching memory spaces
    • Get sets a writeable view; Restore drops the view
  • Otherwise
    • Get ensures a writeable buffer; Restore applies the transpose restriction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to sound like a follow-on PR in terms of scope. This PR already provides performance enchantment, so maybe we merge this one and start a new PR better handling identity restrictions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's fine with me.

@jeremylt jeremylt force-pushed the rstr-block branch 2 times, most recently from b037bf3 to d945220 Compare May 15, 2019 02:04
@jeremylt jeremylt merged commit d4fd279 into master May 18, 2019
@jeremylt jeremylt deleted the rstr-block branch May 18, 2019 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants