Restriction Block #228

jeremylt · 2019-03-28T23:32:40Z

This PR adds the ability to restrict to/from a single block of a blocked restriction. This ~~hopefully~~ provides a mild performance enhancement for the cpu/self/*/blocked backends.
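To make the idea concrete, here is a minimal sketch of restricting a single block in the no-transpose direction. It is an illustration only, not the code in this PR: the function name `restrict_block`, its argument list, the interlaced layout (node-major within a block), and the choice to pad the last block by repeating the final element are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not the libCEED API): gather ONE block of elements
// from an L-vector into an interlaced E-vector block.
//   indices  : nelem * elemsize node indices, stored element by element
//   blksize  : number of elements per block
//   block    : index of the block to restrict
//   lvec     : L-vector values
//   evec_blk : output of elemsize * blksize values; the value for node n of
//              the e-th element in the block is stored at n * blksize + e
void restrict_block(const int *indices, size_t nelem, size_t elemsize,
                    size_t blksize, size_t block, const double *lvec,
                    double *evec_blk) {
  // The block must contain at least one real element.
  assert(nelem > 0 && block * blksize < nelem);
  for (size_t n = 0; n < elemsize; n++) {
    for (size_t e = 0; e < blksize; e++) {
      size_t elem = block * blksize + e;
      // Assumption: a partial last block is padded by repeating the
      // final element so the block always has blksize entries per node.
      if (elem >= nelem) elem = nelem - 1;
      evec_blk[n * blksize + e] = lvec[indices[elem * elemsize + n]];
    }
  }
}
```

Only `elemsize * blksize` values are touched per call, so an operator can restrict a block, apply the basis and QFunction to it, and move on without materializing the full E-vector.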

jeremylt · 2019-03-28T23:43:49Z

~~The CI is currently failing because of Nek, not because of this PR. Investigating.~~ I fixed it.

codecov · 2019-03-28T23:55:32Z

Codecov Report

Merging #228 into master will decrease coverage by 0.02%.
The diff coverage is 92.92%.

@@            Coverage Diff             @@
##           master     #228      +/-   ##
==========================================
- Coverage   93.09%   93.06%   -0.03%     
==========================================
  Files         124      129       +5     
  Lines        7586     7975     +389     
==========================================
+ Hits         7062     7422     +360     
- Misses        524      553      +29

| Flag | Coverage Δ | |
|---|---|---|
| #backends | `90.24% <93.78%> (+0.38%)` | ⬆️ |
| #examples | `82.18% <ø> (ø)` | ⬆️ |
| #interface | `91.19% <70%> (-0.5%)` | ⬇️ |
| #tests | `96.58% <100%> (+0.05%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| interface/ceed.c | `80.73% <ø> (ø)` | ⬆️ |
| backends/blocked/ceed-blocked.c | `90% <100%> (ø)` | ⬆️ |
| backends/ref/ceed-ref.c | `95.83% <100%> (ø)` | ⬆️ |
| backends/xsmm/ceed-xsmm-serial.c | `88.88% <100%> (ø)` | ⬆️ |
| backends/xsmm/ceed-xsmm-blocked.c | `88.88% <100%> (ø)` | ⬆️ |
| backends/avx/ceed-avx-blocked.c | `90% <100%> (ø)` | ⬆️ |
| backends/ref/ceed-ref-restriction.c | `97.29% <100%> (+0.28%)` | ⬆️ |
| backends/avx/ceed-avx-serial.c | `90% <100%> (ø)` | ⬆️ |
| tests/t208-elemrestriction.c | `100% <100%> (ø)` | |
| tests/t208-elemrestriction-f.f90 | `100% <100%> (ø)` | |

... and 13 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c532df6...1f37b40.

jeremylt · 2019-03-29T00:43:59Z

backends/blocked/ceed-blocked-operator.c

      CeedChk(ierr);
      ierr = CeedQFunctionFieldGetNumComponents(qfinputfields[i], &ncomp);
      CeedChk(ierr);
+      // Restrict active input block


For now I am applying this strategy only to the active input. I am concerned that applying the single-block restriction strategy to all inputs would decrease performance by generating extra memory movement for unchanged input vectors.

jeremylt · 2019-03-29T17:10:21Z

I included the full evector version of the blocked backend as an option. Right now the performance picture is a bit muddied. Depending on the order and backend, the full evec version sometimes beats this new version; however, the new version seems to be generally better.

jeremylt · 2019-03-29T17:53:38Z

Jenkins stall

Cannot contact jnlp-pod-bmtbd-rxf37: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException

jedbrown · 2019-03-31T05:24:51Z

I restarted the Jenkins run.

jeremylt · 2019-04-01T17:46:47Z

I cherry-picked two of these commits into PR #231; I'll rebase this branch after that branch is merged.

jeremylt · 2019-04-05T18:55:04Z

Failure due to Jenkins timeout.

jedbrown · 2019-04-05T19:14:49Z

Restarted and completed. I don't understand the error. Internal routing on GKE should be reliable.

jeremylt · 2019-05-03T17:58:29Z

I set up the serial AVX and XSMM to use this blocking technique as well. This now has some uninteresting intermediate commits, so I'd squash+merge.

jedbrown · 2019-05-14T05:48:37Z

interface/ceed-elemrestriction.c

+                                  CeedTransposeMode tmode,
+                                  CeedTransposeMode lmode,
+                                  CeedVector u, CeedVector v,
+                                  CeedRequest *request) {


Should the interface be designed to support a zero-copy implementation when element restriction is the identity?

Two questions.
A) Would this only be for single element processing? Multi element processing needs a copy to do the shuffle.
B) Would it be better to modify this interface or have the backend handle the special case of single element blocks with an identity restriction.

I think this ordering is most relevant at high order where single-element processing is competitive. One could also imagine an E-vector with block/interlaced elements and an "identity" restriction in that ordering.

I'm concerned about public interfaces that may have poor granularity for some architectures. It lures users into writing fragile code that needs multiple branches for different hardware or for different application choices.

One option for zero-copy that does not change granularity is for the function to return the location where the E-vector data lives, so it can return a pointer to the data in place rather than copying it into a buffer.

I agree that this is important for performance, but I'm trying to understand the best place to put the extra complexity.

Are you worried about a user calling CeedElemRestrictionApply in their code or a backend calling CeedElemRestrictionApply?

I don't see a way that an identity restriction in transpose mode avoids branching logic in the backend's CeedOperatorApply code. That decision reaches all the way back to the decision of where to write the output of CeedBasisApply after the QFunction, or possibly where to write the output of CeedQFunctionApply for CEED_EVAL_NONE. In notranspose mode we can make the evec point to the lvec data for inputs with an identity restriction, and at one point we did (that was dropped in the current blocked backend because of the interlacing), so that side isn't as tricky.

I worry about an interface that returns a data location rather than a vector causing issues for the multi-memory model in different backends.

Currently we copy full evecs, even for identity restrictions. Does better handling of identity restrictions best fit in this PR or a follow-on?

I'm mainly concerned about user code. We maintain all the backends in existence at this point, though we want to eventually have a stable way to support backends maintained externally.

With the current interface, libCEED could determine that the restriction is the identity and make the output vector alias the block of the input vector (instead of copying into separate memory). It could mark the result read-only, but would need some way of releasing the aliased vector. Perhaps a Get/Restore interface. For the transpose mode, we could Get a block of the E-vector for writing and Restore it when done.

- Identity restriction and matching memory spaces: Get sets a writeable view; Restore drops the view.
- Otherwise: Get ensures a writeable buffer; Restore applies the transpose restriction.
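The two cases above can be sketched in plain C for a single element in transpose mode. This is only a rough illustration of the Get/Restore idea, not a proposed libCEED interface: the `BlockRestriction` struct and the `block_get_write` / `block_restore_write` names are hypothetical, everything lives in one host memory space, and the identity path assumes that overwriting the output is acceptable because each L-vector entry belongs to exactly one element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical types and names for illustration only; not the libCEED API.
typedef struct {
  bool is_identity;    // true when element e owns lvec[e*elemsize ...]
  const int *indices;  // nelem * elemsize node indices (unused if identity)
  size_t elemsize;     // nodes per element
  double *buffer;      // scratch of elemsize entries (unused if identity)
} BlockRestriction;

// Get: return the location where E-vector data for `elem` should be written.
double *block_get_write(BlockRestriction *r, double *lvec, size_t elem) {
  if (r->is_identity) {
    // Zero-copy: hand back a view directly into the L-vector.
    return lvec + elem * r->elemsize;
  }
  // Otherwise hand back a zeroed scratch buffer.
  for (size_t n = 0; n < r->elemsize; n++) r->buffer[n] = 0.0;
  return r->buffer;
}

// Restore: finish the write and invalidate the caller's pointer.
void block_restore_write(BlockRestriction *r, double *lvec, size_t elem,
                         double **view) {
  assert(view && *view);
  if (!r->is_identity) {
    // Apply the transpose restriction: sum the buffer into the L-vector.
    for (size_t n = 0; n < r->elemsize; n++) {
      lvec[r->indices[elem * r->elemsize + n]] += r->buffer[n];
    }
  }
  *view = NULL;  // the view is dropped in both cases
}
```

The caller's code is the same in both cases, so the branch on identity stays inside the restriction rather than in user code. How this would interact with multiple memory spaces is left open here.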

This is starting to sound like a follow-on PR in terms of scope. This PR already provides a performance enhancement, so maybe we merge this one and start a new PR for better handling of identity restrictions?

Yes, that's fine with me.

jeremylt added performance CPU 1-In Review labels Mar 28, 2019

jeremylt self-assigned this Mar 28, 2019

jeremylt requested review from jedbrown and valeriabarra March 28, 2019 23:32

jeremylt force-pushed the rstr-block branch 4 times, most recently from 0b9db4c to 4210a56 Compare March 29, 2019 00:40

jeremylt commented Mar 29, 2019

backends/blocked/ceed-blocked-operator.c (outdated)

jeremylt force-pushed the rstr-block branch 6 times, most recently from edbab2a to 798aa44 Compare March 29, 2019 02:36

jeremylt commented Mar 29, 2019

.travis.yml (outdated)

jeremylt force-pushed the rstr-block branch 2 times, most recently from 8fcdd47 to 35a8554 Compare March 29, 2019 06:15

jeremylt force-pushed the rstr-block branch from 961db9a to 1e80d60 Compare April 2, 2019 01:49

jeremylt force-pushed the rstr-block branch from 1e80d60 to e43dfc9 Compare April 17, 2019 21:12

jeremylt mentioned this pull request May 9, 2019

OCCA Backend Overhaul #245

Closed

jeremylt force-pushed the rstr-block branch from bb77c5e to dd08760 Compare May 14, 2019 05:05

jedbrown reviewed May 14, 2019

jeremylt force-pushed the rstr-block branch 2 times, most recently from b037bf3 to d945220 Compare May 15, 2019 02:04

jeremylt added 6 commits May 18, 2019 09:39

Add ElemRestrictionApplyBlock

be9261b

Add restriction by block to /cpu/self/*/blocked

a765294

Include full evec blocked backend

045b9c4

Use blocking in optimized serial backends

89c6efa

Update t208-f to use offset

9fbf56a

Add block paramenter example in doc

1f37b40

jeremylt force-pushed the rstr-block branch from d945220 to 1f37b40 Compare May 18, 2019 15:42

jedbrown approved these changes May 18, 2019

jeremylt mentioned this pull request May 18, 2019

Identity Restriction Handling #250

Closed

jeremylt merged commit d4fd279 into master May 18, 2019

jeremylt deleted the rstr-block branch May 18, 2019 16:44
