Move filename out of Location#2728
Conversation
acbdb25 to
3698161
Compare
sbc100
left a comment
There was a problem hiding this comment.
Seems reasonable, but does it really save much memory? (I don't think memory consumption should be a top priority of wabt BTW)
| // For binary files. | ||
| struct { | ||
| size_t offset; | ||
| bool print_filename; |
There was a problem hiding this comment.
What does this new boolean field do? Why would you ever not want to print the filename?
There was a problem hiding this comment.
This is a very interesting question. This particular constructor sets the filename to an empty string:
https://github.com/WebAssembly/wabt/blob/main/include/wabt/common.h#L220
Several tests has no filename info:
https://github.com/WebAssembly/wabt/blob/main/test/binary/bad-data-drop-no-data-count.txt
For me it feels random.
There was a problem hiding this comment.
The new patch do this without introducing any new members. The ReadBinary function simply uses an empty string as filename when an error is created.
|
It looks like it breaks the emscripten abi. This will not work this way. |
|
This is the steps which might lead to a crash:
Because the Location has a reference to the original filename, we likely get a crash. This could be a security error. It looks like the emscripten abi prevents any changes on this. Is there any way to fix this? Storing the filename in the module as an std::string is an option. |
6841b93 to
7941734
Compare
7941734 to
3538745
Compare
|
I have reworked the patch. The filename is moved to module (and script), and stored only once, instead of storing for every location. The As for future works:
|
sbc100
left a comment
There was a problem hiding this comment.
Why is it any better to keep the string_view in the Error object rather than the Location object.
Logically it seems like a Location should always refer to given file... so this doesn't seems to improve things in that sense.
Perhaps you should detail in the PR description how/when this approach is better?
| (elem (i32.const 0) $f 1)) | ||
| ^ | ||
| 0:0: error: type mismatch in initializer expression, expected [funcref] but got [] | ||
| out/test/parse/module/bad-table-invalid-function.txt:0:0: error: type mismatch in initializer expression, expected [funcref] but got [] |
|
Are there orders of magnitude more Location object constructed than Error objects? I guess that might be a reason to prefer it this way? But also a string_view is only two pointers in size right? |
|
I have updated the description. Errors are frequent in a test system, but real world modules are nearly always valid, so no error objects are created most of the time. Locations are part of nearly all expressions / constructs in WebAssembly, since the validator needs them for throwing errors. This patch makes locations "relative" from "absolute", but it should not be a problem in practice. It is true that a string view is only two pointers, but nothing prevents that the target buffer is not freed. The filename-s can be turned to std::string in a follow up patch, which could prevent freeing buffers. |
|
For example, every |
I think using string_view seems pretty reasonable. The lifetime on the Locations and Errors within a file should not exceed the lifetime of the string holding the filename. We have ASAN/etc to help us be sure of that, right? If this change is just about move the string_view from the Location to the Error in order to save memory it might be nice to justify with some numbers. How much memory are going to be saving for a real-world-sized module? |
|
The filename is not moved from location to error, the error always had a location and that included the filename. Now it is explicit, instead of implicit. I will try to measure memory. |
|
I have processed a 35MByte module with wasm2wat. Biggest consumers: The next entries are more interesting. |
|
Thank you! |
Locations should only contain the line info / binary offset, since the filename is the same everywhere. This patch reduces the memory consumption of locations without loosing any information.