Skip to content
This repository was archived by the owner on Apr 22, 2023. It is now read-only.

Conversation

trevnorris
Copy link

Improvements:

  • floating point operations are approx 3x's faster
  • Now write quiet NaN's
  • all read/write on floating point now done in C, so no more need for
    lib/buffer_ieee754.js
  • float values have more accurate min/max value checks

I would have liked to do all assert checks in C, but unfortunately haven't been able to figure out how to properly check overflow when writing to a fast buffer.

@trevnorris
Copy link
Author

link in comment lost after amending commit: trevnorris/node@c6bda30#commitcomment-2370744

@trevnorris
Copy link
Author

@deanm If you already have the fixes waiting in a pull request, go ahead and point me to them now. But I have one concern. While this may be fixed on the C side with my changes, there is nothing on the js side for all the other read/write methods. Which means inconsistent behavior. imho all those changes would be better for another pull request after this one has been accepted.

@bnoordhuis Mind taking a peek at my changes? I think the "fat" you mentioned on the mailing list has been removed. This is my first C side pull request, so any tips are helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous As<Object> cast; This() returns a Local<Object> (which you may want to cache in a local variable - though hopefully the compiler is smart enough to figure out it's invariant.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, use static_cast when casting from void* to char*.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got reinterpret_cast<char*> from Buffer::Data. Is there a reason it's being used there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, using reinterpret_cast there is wrong (though a minor sin). Updated in 5664dd2.

@trevnorris
Copy link
Author

Ok. I think the changes made encapsulate all the points brought up. And thanks. It feels a lot cleaner now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would make non_native_endian a template argument as well; this forces the compiler to generate separate functions for both cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have re-written the TypedArray code to have an additional interface for swapping the types, and for a compiler define for little endian/big endian.

#4522

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have written it intentionally so the buffer code can just include v8_typed_array_bswap.h and use the same endian/swapping.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piscisaureus Just to make sure I understand what you're talking about, are these snippets what you mean:

template <typename T, const bool non_native_endian>
Handle<Value> ReadGeneric(const Arguments& args) {
Handle<Value> Buffer::ReadFloatLE(const Arguments& args) {
  return ReadGeneric<float, BE_NATIVE_ARCH>(args);
}

@trevnorris
Copy link
Author

Hold on to this PR. Just found a serious performance regression if no value is passed for noAssert directly to a SlowBuffer. Getting that figured out now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik this check is safe, and around 15% faster than ->IsUint32().

@trevnorris
Copy link
Author

Ok. Performance regressions sorted out. Everything's cleaned up. Added more perf tests. Should be good to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use BooleanValue() here.

@trevnorris
Copy link
Author

@bnoordhuis Added an additional file tonight (benchmark/_bench_timer.js) to help with all the benchmarks I've been creating for buffers. It was meant to provide a simple mechanism for time testing sync and async tests. It will run all sync tests in order unless an async is loaded. At which time all sync tests are paused until the former are completed.

Also believe everything has been addressed since the last round of comments. Wasn't sure how to implement a specialized function template like you recommended, so hopefully my solution will do.

@deanm
Copy link

deanm commented Jan 15, 2013

Just to make your life more complicated, v8_typed_array_bswap.h is now committed (along with endian-ness macros and swap routines). I don't really see a reason to duplicate that here since it already exists now in the Node code base. Also there was an integer overflow fix (your current code has the integer overflow bug currently).

I would suggest just taking the new setGeneric and getGeneric functions from v8_typed_array.cc.

But, there is still two pull requests outstanding, so you might want to pull in the changes from those also. Again it's up to you but doesn't seem like a great idea to have copied/pasted the old code when it has now been improved and bug fixed.

@trevnorris
Copy link
Author

@deanm There are several reasons why I've opted to not use the array buffer files:

  • There is no integer overflow bug if the user leaves assert checks on. If they decide to disable the assert checks then they've acknowledged all their read/write parameters will be correct. To demonstrate this fact I've included the same set of unit tests that were included when the bug in array buffers was discovered, on both fast and slow buffers.
  • setGeneric doesn't contain all the checks that previously existed in the old read/write methods.
  • setGeneric doesn't have the option to disable assert checks, which would change the existing buffer api.
  • Because of the additional checks included in my code, if the user leaves on assert then it will perform on par with your implementation. But if they disable assert checks then these perform up to 2x's faster than your implementation. Also there is a massive performance difference between your set LE and BE methods (which is why I said the current implementation is "on par"). I've committed a suit of benchmarks to demonstrate this. Below are my run times:
DataView
setFloat32 - LE: 422,124 µs
setFloat32 - BE: 700,792 µs
setFloat64 - LE: 412,276 µs
setFloat64 - BE: 710,480 µs

Buffer
writeFloatLE:  670,605 µs
writeFloatBE:  686,408 µs
writeDoubleLE: 665,075 µs
writeDoubleBE: 690,582 µs

Buffer - noAssert
writeFloatLE:  310,308 µs
writeFloatBE:  337,131 µs
writeDoubleLE: 318,209 µs
writeDoubleBE: 359,304 µs

@deanm
Copy link

deanm commented Jan 15, 2013

Interesting, I would not expect a difference between LE/BE, in fact I would expect bswaping on the value instead of swizzling on the memory should be an improvement. I would guess it's a difference in the compiler's inlining decision, I will have a look now.

@deanm
Copy link

deanm commented Jan 15, 2013

I just investigated this. The performance difference is not do to the bswap code, but the fact that your benchmark doesn't pass a parameter to big/little endian. Explicitly passing false make the performance actually slightly better than LE.

Therefore I would expect that using the bswap routines you would see better performance for your BE writes.

In the meantime I will look at my BooleanValue() is slow on the undefined parameter and what I can due to speed it up. I imagine it's partly because I am indexing outside of Arguments[] and then calling BooleanValue on the Undefined value it has to return.

Can you try your test again with ca0d2a0b31f23f02590c54b7145db9c696d0be06 applied?

PS: http://code.google.com/p/v8/issues/detail?id=2491

@trevnorris
Copy link
Author

@deanm The noAssert BooleanValue problem is actually a bug in v8. I've reported the bug and is a known issue. This is why you'll see in my changes to lib/buffer.js I pass !!noAssert (thanks @bnoordhuis). For the fastest work around do the following: bool little_endian = args[2]->IsTrue() ? true : false;

I'll give bswap a run through the benchmarks and report back.

@deanm
Copy link

deanm commented Jan 15, 2013

@trevnorris I've made the change in ca0d2a0 (I don't see how !!noAssert applies here?)

@trevnorris
Copy link
Author

@deanm It applies since !!noAssert forces undefined to false before the argument is passed to the method. This is how I got around the performance hit using BooleanValue()

@deanm
Copy link

deanm commented Jan 15, 2013

@trevnorris yes but there is no JS layer for typed arrays

@trevnorris
Copy link
Author

@deanm I know that my solution didn't apply to your situation. Was just demonstrating that I encountered the same problem and how I got around it.

@trevnorris
Copy link
Author

@deanm I benchmarked the use of v8_typed_array::SwapBytesAndStore<T>(ptr, val); instead of swizzle. The results are comparable in the normal use case:

               bswap          current
writeFloatLE:  565,043 µs     576,468 µs
writeFloatBE:  552,146 µs     567,918 µs
writeDoubleLE: 526,411 µs     544,727 µs
writeDoubleBE: 544,566 µs     567,811 µs

But again that's because the current implementation has additional checks (e.g. proper bounds checking on floats). So if those checks are disabled:

               bswap          current
writeFloatLE:  500,153 µs     379,895 µs
writeFloatBE:  510,261 µs     425,825 µs
writeDoubleLE: 432,707 µs     409,362 µs
writeDoubleBE: 427,186 µs     442,892 µs

The performance difference for bswap is that right now some checks must be performed in javascript. I can't test against the current data view implementation since doesn't support the ability to disable assert checks. Which would be a change to the Buffer api.

@deanm
Copy link

deanm commented Jan 15, 2013

Can't you just use the bswap portion of the code without the checks? Meaning just changing the swizzle portion?

@bnoordhuis
Copy link
Member

@deanm @trevnorris Can you guys work out between the two of you what patches should go in what order? Bonus points if you wrap it up in a single PR. :-)

Dean, you may want to come hang out in #libuv on freenode.org - might make communication a little easier / quicker.

@trevnorris
Copy link
Author

That's what the benchmarks above displays (having just swapped out the swizzle portion).

@deanm
Copy link

deanm commented Jan 15, 2013

@trevnorris I would be really surprised if the new code is not faster. There is no way a single bswap on a register will be slower than a byte by byte swap in memory. Do you have an example patch where I could see?

@bnoordhuis Sorry for all the noise, if it's easier I can duck out, just thought it might help. But I agree this is getting a bit excessive : )

@trevnorris
Copy link
Author

Decided to add my second patch which improves run time of non floating point reads/writes. Mainly just assert check cleanups/improvements.

@trevnorris
Copy link
Author

@deanm There is a buffer test that SwapBytesAndStore fails:

buffer.writeDoubleBE(2.225073858507201e-308, 0);

It says the value is out of range even though it will return properly and supported by the current Buffer implementation. Also here are the benchmark results with additional data (nA is no Assert):

               bswap            bswap nA         now              now nA           no write nA
writeFloatLE:  558,478 µs       392,237 µs       560,546 µs       407,081 µs       383,255 µs
writeFloatBE:  548,300 µs       396,918 µs       566,947 µs       424,081 µs       390,730 µs
writeDoubleLE: 541,445 µs       397,243 µs       539,089 µs       399,179 µs       381,970 µs
writeDoubleBE: 541,741 µs       417,654 µs       566,575 µs       446,572 µs       389,680 µs

The last column is after having commented out the value store completely and just returning undefined. So we can see the majority of the issue actually comes from running the method against the buffer instance's parent. When you use v8 to performance test what's going on it let's you know the following:

$ ./out/Release/node --trace-deopt --trace-opt --trace-inlining --code-comments benchmark/buffer_write.js
[deoptimize context: 2d20c5a44909]
[marking Buffer.writeDoubleBE 0x2d20c5a6ffd8 for recompilation, reason: small function, ICs with typeinfo: 5/5 (100%)]
Inlined checkOffset called from Buffer.writeDoubleBE.
[optimizing: Buffer.writeDoubleBE / 2d20c5a6ffd9 - took 0.322, 0.655, 0.452 ms]
[marking  0x131604f83c48 for recompilation, reason: small function, ICs with typeinfo: 3/4 (75%)]
Did not inline Buffer.writeDoubleBE called from  (target requires context change).
[optimizing:  / 131604f83c49 - took 0.073, 0.359, 0.240 ms]

When the method is called against the buffer instance's parent it requires a "context change". If these tests are run again directly on the slow buffer and with assert checks off we get the following:

               bswap            now
writeFloatLE:  319,533 µs       317,068 µs
writeFloatBE:  312,304 µs       354,001 µs
writeDoubleLE: 328,422 µs       327,593 µs
writeDoubleBE: 322,739 µs       373,202 µs

So we can see there is ~15% gain on BE writes in the case that the user directly calls the method from the parent and with noAssert. I'm game to use the method, but not until it passes all tests. Until then, there are additional changes I would like to get in before v0.10 is released and that depend on this PR being pulled. So can the addition of those changes please be put off for the future?

Here is the patch I used:

diff --git a/src/node_buffer.cc b/src/node_buffer.cc
index f495247..a68a70b 100644
--- a/src/node_buffer.cc
+++ b/src/node_buffer.cc
@@ -30,6 +30,7 @@
 #include <string.h> // memcpy
 #include <float.h>  // float limits
 #include <math.h>   // infinity
+#include "v8_typed_array_bswap.h"

 #define MIN(a,b) ((a) < (b) ? (a) : (b))

@@ -776,9 +777,15 @@ Handle<Value> WriteFloatGeneric(const Arguments& args) {
       return ThrowRangeError("Trying to write beyond buffer length");
   }

-  memcpy(ptr, &val, sizeof(T));
-  if (ENDIANNESS != is_big_endian())
-    swizzle(ptr, sizeof(T));
+#if V8_TYPED_ARRAY_LITTLE_ENDIAN
+    if (!ENDIANNESS) {
+#else
+    if (ENDIANNESS) {
+#endif
+    v8_typed_array::SwapBytesAndStore<T>(ptr, val);
+  } else {
+    memcpy(ptr, &val, sizeof(T));
+  }

   return Undefined(node_isolate);
 }

@trevnorris
Copy link
Author

@deanm Can you confirm you're fine with pulling this patch as-is and waiting for a smaller future patch once the discrepancies are worked out?

@bnoordhuis If he can confirm, can we get this applied? Only one other issue has had more comments than this and not hoping to become the first. =)

@deanm
Copy link

deanm commented Jan 16, 2013

@trevnorris It's not really my decision, but it sounds fine to me anyway.

Improvements:
* floating point operations are approx 4x's faster
* Now write quiet NaN's
* all read/write on floating point now done in C, so no more need for
  lib/buffer_ieee754.js
* float values have more accurate min/max value checks
* add additional benchmarks for buffers read/write
* created benchmark/_bench_timer.js which is a simple library that
  can be included into any benchmark and provides an intelligent tracker
  for sync and async tests
* add benchmarks for DataView set methods
* add checks and tests to make sure offset is greater than 0
Improved assert check order of execution and added additional checks on
parameters to ensure no bad values make it through (e.g. negative offset
values).
@isaacs
Copy link

isaacs commented Jan 16, 2013

That's consensus. Landed on 22b84e6 and 7393740

@isaacs isaacs closed this Jan 16, 2013
richardlau pushed a commit to ibmruntimes/node that referenced this pull request Feb 23, 2016
The changes to the file argument of execFile in nodejs#4504 make it appear
that execFile requires an absolute or relative path to the executable
file, when it also supports a filename which will be resolved using
$PATH.  Although the example makes this clear, assuming there isn't a
node binary in $CWD, it's easy to overlook.  This commit clarifies that
point.

It also updates the argument description for execFileSync to match,
since it was overlooked in nodejs#4504 and behaves identically.

PR-URL: nodejs/node#5310
Reviewed-By: James M Snell <[email protected]>
richardlau pushed a commit to ibmruntimes/node that referenced this pull request Feb 23, 2016
The changes to the file argument of execFile in nodejs#4504 make it appear
that execFile requires an absolute or relative path to the executable
file, when it also supports a filename which will be resolved using
$PATH.  Although the example makes this clear, assuming there isn't a
node binary in $CWD, it's easy to overlook.  This commit clarifies that
point.

It also updates the argument description for execFileSync to match,
since it was overlooked in nodejs#4504 and behaves identically.

PR-URL: nodejs/node#5310
Reviewed-By: James M Snell <[email protected]>
richardlau pushed a commit to ibmruntimes/node that referenced this pull request Feb 29, 2016
The changes to the file argument of execFile in nodejs#4504 make it appear
that execFile requires an absolute or relative path to the executable
file, when it also supports a filename which will be resolved using
$PATH.  Although the example makes this clear, assuming there isn't a
node binary in $CWD, it's easy to overlook.  This commit clarifies that
point.

It also updates the argument description for execFileSync to match,
since it was overlooked in nodejs#4504 and behaves identically.

PR-URL: nodejs/node#5310
Reviewed-By: James M Snell <[email protected]>
richardlau pushed a commit to ibmruntimes/node that referenced this pull request Mar 9, 2016
The changes to the file argument of execFile in nodejs#4504 make it appear
that execFile requires an absolute or relative path to the executable
file, when it also supports a filename which will be resolved using
$PATH.  Although the example makes this clear, assuming there isn't a
node binary in $CWD, it's easy to overlook.  This commit clarifies that
point.

It also updates the argument description for execFileSync to match,
since it was overlooked in nodejs#4504 and behaves identically.

PR-URL: nodejs/node#5310
Reviewed-By: James M Snell <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants