-
-
Notifications
You must be signed in to change notification settings - Fork 7.3k
buffer: floating point improvements #4504
Conversation
link in comment lost after amending commit: trevnorris/node@c6bda30#commitcomment-2370744 |
@deanm If you already have the fixes waiting in a pull request, go ahead and point me to them now. But I have one concern. While this may be fixed on the C side with my changes, there is nothing on the js side for all the other read/write methods. Which means inconsistent behavior. imho all those changes would be better for another pull request after this one has been accepted. @bnoordhuis Mind taking a peek at my changes? I think the "fat" you mentioned on the mailing list has been removed. This is my first C side pull request, so any tips are helpful. |
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superfluous As<Object>
cast; This() returns a Local<Object>
(which you may want to cache in a local variable - though hopefully the compiler is smart enough to figure out it's invariant.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, use static_cast when casting from void* to char*.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got reinterpret_cast<char*>
from Buffer::Data
. Is there a reason it's being used there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, using reinterpret_cast there is wrong (though a minor sin). Updated in 5664dd2.
Ok. I think the changes made encapsulate all the points brought up. And thanks. It feels a lot cleaner now. |
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would make non_native_endian
a template argument as well; this forces the compiler to generate separate functions for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have re-written the TypedArray code to have an additional interface for swapping the types, and for a compiler define for little endian/big endian.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have written it intentionally so the buffer code can just include v8_typed_array_bswap.h and use the same endian/swapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piscisaureus Just to make sure I understand what you're talking about, are these snippets what you mean:
template <typename T, const bool non_native_endian>
Handle<Value> ReadGeneric(const Arguments& args) {
Handle<Value> Buffer::ReadFloatLE(const Arguments& args) {
return ReadGeneric<float, BE_NATIVE_ARCH>(args);
}
Hold on to this PR. Just found a serious performance regression if no value is passed for noAssert directly to a SlowBuffer. Getting that figured out now. |
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik this check is safe, and around 15% faster than ->IsUint32()
.
Ok. Performance regressions sorted out. Everything's cleaned up. Added more perf tests. Should be good to go. |
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use BooleanValue() here.
@bnoordhuis Added an additional file tonight ( Also believe everything has been addressed since the last round of comments. Wasn't sure how to implement a specialized function template like you recommended, so hopefully my solution will do. |
Just to make your life more complicated, v8_typed_array_bswap.h is now committed (along with endian-ness macros and swap routines). I don't really see a reason to duplicate that here since it already exists now in the Node code base. Also there was an integer overflow fix (your current code has the integer overflow bug currently). I would suggest just taking the new setGeneric and getGeneric functions from v8_typed_array.cc. But, there is still two pull requests outstanding, so you might want to pull in the changes from those also. Again it's up to you but doesn't seem like a great idea to have copied/pasted the old code when it has now been improved and bug fixed. |
@deanm There are several reasons why I've opted to not use the array buffer files:
|
Interesting, I would not expect a difference between LE/BE, in fact I would expect bswaping on the value instead of swizzling on the memory should be an improvement. I would guess it's a difference in the compiler's inlining decision, I will have a look now. |
I just investigated this. The performance difference is not do to the bswap code, but the fact that your benchmark doesn't pass a parameter to big/little endian. Explicitly passing false make the performance actually slightly better than LE. Therefore I would expect that using the bswap routines you would see better performance for your BE writes. In the meantime I will look at my BooleanValue() is slow on the undefined parameter and what I can due to speed it up. I imagine it's partly because I am indexing outside of Arguments[] and then calling BooleanValue on the Undefined value it has to return. Can you try your test again with ca0d2a0b31f23f02590c54b7145db9c696d0be06 applied? |
@deanm The noAssert BooleanValue problem is actually a bug in v8. I've reported the bug and is a known issue. This is why you'll see in my changes to I'll give bswap a run through the benchmarks and report back. |
@trevnorris I've made the change in ca0d2a0 (I don't see how !!noAssert applies here?) |
@deanm It applies since |
@trevnorris yes but there is no JS layer for typed arrays |
@deanm I know that my solution didn't apply to your situation. Was just demonstrating that I encountered the same problem and how I got around it. |
@deanm I benchmarked the use of
But again that's because the current implementation has additional checks (e.g. proper bounds checking on floats). So if those checks are disabled:
The performance difference for bswap is that right now some checks must be performed in javascript. I can't test against the current data view implementation since doesn't support the ability to disable assert checks. Which would be a change to the Buffer api. |
Can't you just use the bswap portion of the code without the checks? Meaning just changing the swizzle portion? |
@deanm @trevnorris Can you guys work out between the two of you what patches should go in what order? Bonus points if you wrap it up in a single PR. :-) Dean, you may want to come hang out in #libuv on freenode.org - might make communication a little easier / quicker. |
That's what the benchmarks above displays (having just swapped out the swizzle portion). |
@trevnorris I would be really surprised if the new code is not faster. There is no way a single bswap on a register will be slower than a byte by byte swap in memory. Do you have an example patch where I could see? @bnoordhuis Sorry for all the noise, if it's easier I can duck out, just thought it might help. But I agree this is getting a bit excessive : ) |
Decided to add my second patch which improves run time of non floating point reads/writes. Mainly just assert check cleanups/improvements. |
@deanm There is a buffer test that
It says the value is out of range even though it will return properly and supported by the current Buffer implementation. Also here are the benchmark results with additional data (nA is no Assert):
The last column is after having commented out the value store completely and just returning undefined. So we can see the majority of the issue actually comes from running the method against the buffer instance's parent. When you use v8 to performance test what's going on it let's you know the following:
When the method is called against the buffer instance's parent it requires a "context change". If these tests are run again directly on the slow buffer and with assert checks off we get the following:
So we can see there is ~15% gain on BE writes in the case that the user directly calls the method from the parent and with noAssert. I'm game to use the method, but not until it passes all tests. Until then, there are additional changes I would like to get in before v0.10 is released and that depend on this PR being pulled. So can the addition of those changes please be put off for the future? Here is the patch I used: diff --git a/src/node_buffer.cc b/src/node_buffer.cc
index f495247..a68a70b 100644
--- a/src/node_buffer.cc
+++ b/src/node_buffer.cc
@@ -30,6 +30,7 @@
#include <string.h> // memcpy
#include <float.h> // float limits
#include <math.h> // infinity
+#include "v8_typed_array_bswap.h"
#define MIN(a,b) ((a) < (b) ? (a) : (b))
@@ -776,9 +777,15 @@ Handle<Value> WriteFloatGeneric(const Arguments& args) {
return ThrowRangeError("Trying to write beyond buffer length");
}
- memcpy(ptr, &val, sizeof(T));
- if (ENDIANNESS != is_big_endian())
- swizzle(ptr, sizeof(T));
+#if V8_TYPED_ARRAY_LITTLE_ENDIAN
+ if (!ENDIANNESS) {
+#else
+ if (ENDIANNESS) {
+#endif
+ v8_typed_array::SwapBytesAndStore<T>(ptr, val);
+ } else {
+ memcpy(ptr, &val, sizeof(T));
+ }
return Undefined(node_isolate);
} |
@deanm Can you confirm you're fine with pulling this patch as-is and waiting for a smaller future patch once the discrepancies are worked out? @bnoordhuis If he can confirm, can we get this applied? Only one other issue has had more comments than this and not hoping to become the first. =) |
@trevnorris It's not really my decision, but it sounds fine to me anyway. |
Improvements: * floating point operations are approx 4x's faster * Now write quiet NaN's * all read/write on floating point now done in C, so no more need for lib/buffer_ieee754.js * float values have more accurate min/max value checks * add additional benchmarks for buffers read/write * created benchmark/_bench_timer.js which is a simple library that can be included into any benchmark and provides an intelligent tracker for sync and async tests * add benchmarks for DataView set methods * add checks and tests to make sure offset is greater than 0
Improved assert check order of execution and added additional checks on parameters to ensure no bad values make it through (e.g. negative offset values).
The changes to the file argument of execFile in nodejs#4504 make it appear that execFile requires an absolute or relative path to the executable file, when it also supports a filename which will be resolved using $PATH. Although the example makes this clear, assuming there isn't a node binary in $CWD, it's easy to overlook. This commit clarifies that point. It also updates the argument description for execFileSync to match, since it was overlooked in nodejs#4504 and behaves identically. PR-URL: nodejs/node#5310 Reviewed-By: James M Snell <[email protected]>
The changes to the file argument of execFile in nodejs#4504 make it appear that execFile requires an absolute or relative path to the executable file, when it also supports a filename which will be resolved using $PATH. Although the example makes this clear, assuming there isn't a node binary in $CWD, it's easy to overlook. This commit clarifies that point. It also updates the argument description for execFileSync to match, since it was overlooked in nodejs#4504 and behaves identically. PR-URL: nodejs/node#5310 Reviewed-By: James M Snell <[email protected]>
The changes to the file argument of execFile in nodejs#4504 make it appear that execFile requires an absolute or relative path to the executable file, when it also supports a filename which will be resolved using $PATH. Although the example makes this clear, assuming there isn't a node binary in $CWD, it's easy to overlook. This commit clarifies that point. It also updates the argument description for execFileSync to match, since it was overlooked in nodejs#4504 and behaves identically. PR-URL: nodejs/node#5310 Reviewed-By: James M Snell <[email protected]>
The changes to the file argument of execFile in nodejs#4504 make it appear that execFile requires an absolute or relative path to the executable file, when it also supports a filename which will be resolved using $PATH. Although the example makes this clear, assuming there isn't a node binary in $CWD, it's easy to overlook. This commit clarifies that point. It also updates the argument description for execFileSync to match, since it was overlooked in nodejs#4504 and behaves identically. PR-URL: nodejs/node#5310 Reviewed-By: James M Snell <[email protected]>
Improvements:
lib/buffer_ieee754.js
I would have liked to do all assert checks in C, but unfortunately haven't been able to figure out how to properly check overflow when writing to a fast buffer.