buffer: floating point improvements #4504

trevnorris · 2013-01-01T10:17:15Z

Improvements:

floating point operations are approx 3x's faster
Now write quiet NaN's
all read/write on floating point now done in C, so no more need for
lib/buffer_ieee754.js
float values have more accurate min/max value checks

I would have liked to do all assert checks in C, but unfortunately haven't been able to figure out how to properly check overflow when writing to a fast buffer.

trevnorris · 2013-01-01T10:39:18Z

link in comment lost after amending commit: trevnorris/node@c6bda30#commitcomment-2370744

trevnorris · 2013-01-04T18:41:46Z

@deanm If you already have the fixes waiting in a pull request, go ahead and point me to them now. But I have one concern. While this may be fixed on the C side with my changes, there is nothing on the js side for all the other read/write methods. Which means inconsistent behavior. imho all those changes would be better for another pull request after this one has been accepted.

@bnoordhuis Mind taking a peek at my changes? I think the "fat" you mentioned on the mailing list has been removed. This is my first C side pull request, so any tips are helpful.

bnoordhuis · 2013-01-04T22:35:14Z

src/node_buffer.cc

Superfluous As<Object> cast; This() returns a Local<Object> (which you may want to cache in a local variable - though hopefully the compiler is smart enough to figure out it's invariant.)

Also, use static_cast when casting from void* to char*.

Got reinterpret_cast<char*> from Buffer::Data. Is there a reason it's being used there?

No, using reinterpret_cast there is wrong (though a minor sin). Updated in 5664dd2.

trevnorris · 2013-01-05T08:44:57Z

Ok. I think the changes made encapsulate all the points brought up. And thanks. It feels a lot cleaner now.

piscisaureus · 2013-01-05T20:19:15Z

src/node_buffer.cc

I think I would make non_native_endian a template argument as well; this forces the compiler to generate separate functions for both cases.

I have re-written the TypedArray code to have an additional interface for swapping the types, and for a compiler define for little endian/big endian.

#4522

I have written it intentionally so the buffer code can just include v8_typed_array_bswap.h and use the same endian/swapping.

@piscisaureus Just to make sure I understand what you're talking about, are these snippets what you mean:

template <typename T, const bool non_native_endian> Handle<Value> ReadGeneric(const Arguments& args) {

Handle<Value> Buffer::ReadFloatLE(const Arguments& args) { return ReadGeneric<float, BE_NATIVE_ARCH>(args); }

trevnorris · 2013-01-11T05:43:11Z

Hold on to this PR. Just found a serious performance regression if no value is passed for noAssert directly to a SlowBuffer. Getting that figured out now.

trevnorris · 2013-01-11T20:16:07Z

src/node_buffer.cc

afaik this check is safe, and around 15% faster than ->IsUint32().

trevnorris · 2013-01-11T21:38:31Z

Ok. Performance regressions sorted out. Everything's cleaned up. Added more perf tests. Should be good to go.

bnoordhuis · 2013-01-11T23:49:39Z

src/node_buffer.cc

Use BooleanValue() here.

trevnorris · 2013-01-15T05:57:56Z

@bnoordhuis Added an additional file tonight (benchmark/_bench_timer.js) to help with all the benchmarks I've been creating for buffers. It was meant to provide a simple mechanism for time testing sync and async tests. It will run all sync tests in order unless an async is loaded. At which time all sync tests are paused until the former are completed.

Also believe everything has been addressed since the last round of comments. Wasn't sure how to implement a specialized function template like you recommended, so hopefully my solution will do.

deanm · 2013-01-15T15:40:00Z

Just to make your life more complicated, v8_typed_array_bswap.h is now committed (along with endian-ness macros and swap routines). I don't really see a reason to duplicate that here since it already exists now in the Node code base. Also there was an integer overflow fix (your current code has the integer overflow bug currently).

I would suggest just taking the new setGeneric and getGeneric functions from v8_typed_array.cc.

But, there is still two pull requests outstanding, so you might want to pull in the changes from those also. Again it's up to you but doesn't seem like a great idea to have copied/pasted the old code when it has now been improved and bug fixed.

trevnorris · 2013-01-15T17:32:58Z

@deanm There are several reasons why I've opted to not use the array buffer files:

There is no integer overflow bug if the user leaves assert checks on. If they decide to disable the assert checks then they've acknowledged all their read/write parameters will be correct. To demonstrate this fact I've included the same set of unit tests that were included when the bug in array buffers was discovered, on both fast and slow buffers.
setGeneric doesn't contain all the checks that previously existed in the old read/write methods.
setGeneric doesn't have the option to disable assert checks, which would change the existing buffer api.
Because of the additional checks included in my code, if the user leaves on assert then it will perform on par with your implementation. But if they disable assert checks then these perform up to 2x's faster than your implementation. Also there is a massive performance difference between your set LE and BE methods (which is why I said the current implementation is "on par"). I've committed a suit of benchmarks to demonstrate this. Below are my run times:

DataView
setFloat32 - LE: 422,124 µs
setFloat32 - BE: 700,792 µs
setFloat64 - LE: 412,276 µs
setFloat64 - BE: 710,480 µs

Buffer
writeFloatLE:  670,605 µs
writeFloatBE:  686,408 µs
writeDoubleLE: 665,075 µs
writeDoubleBE: 690,582 µs

Buffer - noAssert
writeFloatLE:  310,308 µs
writeFloatBE:  337,131 µs
writeDoubleLE: 318,209 µs
writeDoubleBE: 359,304 µs

deanm · 2013-01-15T17:47:44Z

Interesting, I would not expect a difference between LE/BE, in fact I would expect bswaping on the value instead of swizzling on the memory should be an improvement. I would guess it's a difference in the compiler's inlining decision, I will have a look now.

deanm · 2013-01-15T18:29:11Z

I just investigated this. The performance difference is not do to the bswap code, but the fact that your benchmark doesn't pass a parameter to big/little endian. Explicitly passing false make the performance actually slightly better than LE.

Therefore I would expect that using the bswap routines you would see better performance for your BE writes.

In the meantime I will look at my BooleanValue() is slow on the undefined parameter and what I can due to speed it up. I imagine it's partly because I am indexing outside of Arguments[] and then calling BooleanValue on the Undefined value it has to return.

Can you try your test again with ca0d2a0b31f23f02590c54b7145db9c696d0be06 applied?

PS: http://code.google.com/p/v8/issues/detail?id=2491

trevnorris · 2013-01-15T18:51:22Z

@deanm The noAssert BooleanValue problem is actually a bug in v8. I've reported the bug and is a known issue. This is why you'll see in my changes to lib/buffer.js I pass !!noAssert (thanks @bnoordhuis). For the fastest work around do the following: bool little_endian = args[2]->IsTrue() ? true : false;

I'll give bswap a run through the benchmarks and report back.

deanm · 2013-01-15T18:58:06Z

@trevnorris I've made the change in ca0d2a0 (I don't see how !!noAssert applies here?)

trevnorris · 2013-01-15T19:31:02Z

@deanm It applies since !!noAssert forces undefined to false before the argument is passed to the method. This is how I got around the performance hit using BooleanValue()

deanm · 2013-01-15T19:40:04Z

@trevnorris yes but there is no JS layer for typed arrays

trevnorris · 2013-01-15T19:45:49Z

@deanm I know that my solution didn't apply to your situation. Was just demonstrating that I encountered the same problem and how I got around it.

trevnorris · 2013-01-15T20:19:58Z

@deanm I benchmarked the use of v8_typed_array::SwapBytesAndStore<T>(ptr, val); instead of swizzle. The results are comparable in the normal use case:

               bswap          current
writeFloatLE:  565,043 µs     576,468 µs
writeFloatBE:  552,146 µs     567,918 µs
writeDoubleLE: 526,411 µs     544,727 µs
writeDoubleBE: 544,566 µs     567,811 µs

But again that's because the current implementation has additional checks (e.g. proper bounds checking on floats). So if those checks are disabled:

               bswap          current
writeFloatLE:  500,153 µs     379,895 µs
writeFloatBE:  510,261 µs     425,825 µs
writeDoubleLE: 432,707 µs     409,362 µs
writeDoubleBE: 427,186 µs     442,892 µs

The performance difference for bswap is that right now some checks must be performed in javascript. I can't test against the current data view implementation since doesn't support the ability to disable assert checks. Which would be a change to the Buffer api.

deanm · 2013-01-15T20:44:19Z

Can't you just use the bswap portion of the code without the checks? Meaning just changing the swizzle portion?

bnoordhuis · 2013-01-15T20:56:17Z

@deanm @trevnorris Can you guys work out between the two of you what patches should go in what order? Bonus points if you wrap it up in a single PR. :-)

Dean, you may want to come hang out in #libuv on freenode.org - might make communication a little easier / quicker.

trevnorris · 2013-01-15T20:58:14Z

That's what the benchmarks above displays (having just swapped out the swizzle portion).

deanm · 2013-01-15T21:24:04Z

@trevnorris I would be really surprised if the new code is not faster. There is no way a single bswap on a register will be slower than a byte by byte swap in memory. Do you have an example patch where I could see?

@bnoordhuis Sorry for all the noise, if it's easier I can duck out, just thought it might help. But I agree this is getting a bit excessive : )

trevnorris · 2013-01-15T21:37:11Z

Decided to add my second patch which improves run time of non floating point reads/writes. Mainly just assert check cleanups/improvements.

trevnorris · 2013-01-15T22:22:05Z

@deanm There is a buffer test that SwapBytesAndStore fails:

buffer.writeDoubleBE(2.225073858507201e-308, 0);

It says the value is out of range even though it will return properly and supported by the current Buffer implementation. Also here are the benchmark results with additional data (nA is no Assert):

               bswap            bswap nA         now              now nA           no write nA
writeFloatLE:  558,478 µs       392,237 µs       560,546 µs       407,081 µs       383,255 µs
writeFloatBE:  548,300 µs       396,918 µs       566,947 µs       424,081 µs       390,730 µs
writeDoubleLE: 541,445 µs       397,243 µs       539,089 µs       399,179 µs       381,970 µs
writeDoubleBE: 541,741 µs       417,654 µs       566,575 µs       446,572 µs       389,680 µs

The last column is after having commented out the value store completely and just returning undefined. So we can see the majority of the issue actually comes from running the method against the buffer instance's parent. When you use v8 to performance test what's going on it let's you know the following:

$ ./out/Release/node --trace-deopt --trace-opt --trace-inlining --code-comments benchmark/buffer_write.js
[deoptimize context: 2d20c5a44909]
[marking Buffer.writeDoubleBE 0x2d20c5a6ffd8 for recompilation, reason: small function, ICs with typeinfo: 5/5 (100%)]
Inlined checkOffset called from Buffer.writeDoubleBE.
[optimizing: Buffer.writeDoubleBE / 2d20c5a6ffd9 - took 0.322, 0.655, 0.452 ms]
[marking  0x131604f83c48 for recompilation, reason: small function, ICs with typeinfo: 3/4 (75%)]
Did not inline Buffer.writeDoubleBE called from  (target requires context change).
[optimizing:  / 131604f83c49 - took 0.073, 0.359, 0.240 ms]

When the method is called against the buffer instance's parent it requires a "context change". If these tests are run again directly on the slow buffer and with assert checks off we get the following:

               bswap            now
writeFloatLE:  319,533 µs       317,068 µs
writeFloatBE:  312,304 µs       354,001 µs
writeDoubleLE: 328,422 µs       327,593 µs
writeDoubleBE: 322,739 µs       373,202 µs

So we can see there is ~15% gain on BE writes in the case that the user directly calls the method from the parent and with noAssert. I'm game to use the method, but not until it passes all tests. Until then, there are additional changes I would like to get in before v0.10 is released and that depend on this PR being pulled. So can the addition of those changes please be put off for the future?

Here is the patch I used:

diff --git a/src/node_buffer.cc b/src/node_buffer.cc
index f495247..a68a70b 100644
--- a/src/node_buffer.cc
+++ b/src/node_buffer.cc
@@ -30,6 +30,7 @@
 #include <string.h> // memcpy
 #include <float.h>  // float limits
 #include <math.h>   // infinity
+#include "v8_typed_array_bswap.h"

 #define MIN(a,b) ((a) < (b) ? (a) : (b))

@@ -776,9 +777,15 @@ Handle<Value> WriteFloatGeneric(const Arguments& args) {
       return ThrowRangeError("Trying to write beyond buffer length");
   }

-  memcpy(ptr, &val, sizeof(T));
-  if (ENDIANNESS != is_big_endian())
-    swizzle(ptr, sizeof(T));
+#if V8_TYPED_ARRAY_LITTLE_ENDIAN
+    if (!ENDIANNESS) {
+#else
+    if (ENDIANNESS) {
+#endif
+    v8_typed_array::SwapBytesAndStore<T>(ptr, val);
+  } else {
+    memcpy(ptr, &val, sizeof(T));
+  }

   return Undefined(node_isolate);
 }

trevnorris · 2013-01-16T17:32:19Z

@deanm Can you confirm you're fine with pulling this patch as-is and waiting for a smaller future patch once the discrepancies are worked out?

@bnoordhuis If he can confirm, can we get this applied? Only one other issue has had more comments than this and not hoping to become the first. =)

deanm · 2013-01-16T17:35:08Z

@trevnorris It's not really my decision, but it sounds fine to me anyway.

Improvements: * floating point operations are approx 4x's faster * Now write quiet NaN's * all read/write on floating point now done in C, so no more need for lib/buffer_ieee754.js * float values have more accurate min/max value checks * add additional benchmarks for buffers read/write * created benchmark/_bench_timer.js which is a simple library that can be included into any benchmark and provides an intelligent tracker for sync and async tests * add benchmarks for DataView set methods * add checks and tests to make sure offset is greater than 0

Improved assert check order of execution and added additional checks on parameters to ensure no bad values make it through (e.g. negative offset values).

isaacs · 2013-01-16T18:42:27Z

That's consensus. Landed on 22b84e6 and 7393740

The changes to the file argument of execFile in nodejs#4504 make it appear that execFile requires an absolute or relative path to the executable file, when it also supports a filename which will be resolved using $PATH. Although the example makes this clear, assuming there isn't a node binary in $CWD, it's easy to overlook. This commit clarifies that point. It also updates the argument description for execFileSync to match, since it was overlooked in nodejs#4504 and behaves identically. PR-URL: nodejs/node#5310 Reviewed-By: James M Snell <[email protected]>

bnoordhuis reviewed Jan 4, 2013
View reviewed changes

piscisaureus reviewed Jan 5, 2013
View reviewed changes

trevnorris reviewed Jan 11, 2013
View reviewed changes

src/node_buffer.cc Outdated

Copy link

Author

trevnorris Jan 11, 2013

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik this check is safe, and around 15% faster than ->IsUint32().

bnoordhuis reviewed Jan 11, 2013
View reviewed changes

src/node_buffer.cc Outdated

Copy link

Member

bnoordhuis Jan 11, 2013

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use BooleanValue() here.

This was referenced Jan 15, 2013

TypedArrays: Improve performance when no endian-ness parameter is passed... #4597

Closed

TypedArrays: Improve performance when no endian-ness parameter is passed... #4598

Closed

trevnorris mentioned this pull request Jan 16, 2013

Replace specific Buffer read/write methods #4263

Closed

trevnorris added 2 commits January 16, 2013 09:37

buffer: improve read/write speed with assert

564755b

Improved assert check order of execution and added additional checks on parameters to ensure no bad values make it through (e.g. negative offset values).

isaacs closed this Jan 16, 2013

Uh oh!

buffer: floating point improvements #4504

buffer: floating point improvements #4504

Uh oh!

Conversation

trevnorris commented Jan 1, 2013

Uh oh!

trevnorris commented Jan 1, 2013

Uh oh!

trevnorris commented Jan 4, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevnorris commented Jan 5, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevnorris commented Jan 11, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevnorris commented Jan 11, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

bnoordhuis commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

deanm commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 15, 2013

Uh oh!

trevnorris commented Jan 16, 2013

Uh oh!

deanm commented Jan 16, 2013

Uh oh!

isaacs commented Jan 16, 2013

Uh oh!

Reviewers

Assignees

Labels