Move filename out of Location #2728

Merged
zherczeg merged 1 commit into WebAssembly:main from zherczeg:location_no_file on Mar 26, 2026

@zherczeg
Collaborator

@zherczeg zherczeg commented Mar 25, 2026

Locations should only contain the line info / binary offset, since the filename is the same for every location in a file. This patch reduces the memory consumption of locations without losing any information.

Member

@sbc100 sbc100 left a comment

Seems reasonable, but does it really save much memory? (I don't think memory consumption should be a top priority for wabt, BTW.)

Comment thread on include/wabt/common.h (outdated):

    // For binary files.
    struct {
      size_t offset;
      bool print_filename;
Member

What does this new boolean field do? Why would you ever not want to print the filename?

Collaborator Author

@zherczeg zherczeg Mar 26, 2026

This is a very interesting question. This particular constructor sets the filename to an empty string:
https://github.com/WebAssembly/wabt/blob/main/include/wabt/common.h#L220

Several tests have no filename info:
https://github.com/WebAssembly/wabt/blob/main/test/binary/bad-data-drop-no-data-count.txt

To me it feels random.

Collaborator Author

The new patch does this without introducing any new members. The ReadBinary function simply uses an empty string as the filename when an error is created.

@zherczeg
Collaborator Author

It looks like it breaks the Emscripten ABI. This approach will not work.

@zherczeg
Collaborator Author

These are the steps that might lead to a crash:

  • Parse the input
  • Free the filename string
  • Validate the module with an error
  • Print error

Because the Location holds a reference to the original filename string, printing the error after that string has been freed will likely crash. This could be a security issue.

It looks like the Emscripten ABI prevents any changes here: wabt_validate_module only receives the module, features, and errors, and wabt_new_errors takes no arguments.

Is there any way to fix this? Storing the filename in the module as a std::string is one option.

@zherczeg zherczeg force-pushed the location_no_file branch 2 times, most recently from 6841b93 to 7941734 on March 26, 2026 09:53
@zherczeg
Collaborator Author

I have reworked the patch. The filename is moved to the module (and script) and stored only once, instead of once for every location. Each Error also keeps a filename reference, so the size of Error is unchanged. This way the patch keeps the same ABI as before and still reduces memory consumption (unless the module is empty, but I think that is a very rare case).

As for future work:

  • We could add filename info to the binary reader. This requires rebasing a lot of tests.
  • We could use std::string for errors (there should be only a few of them) and for modules/scripts. This could prevent possible crashes. I would not say this is high priority, since library users haven't complained so far.

Member

@sbc100 sbc100 left a comment

Why is it any better to keep the string_view in the Error object rather than in the Location object?

Logically it seems like a Location should always refer to a given file... so this doesn't seem to improve things in that sense.

Perhaps you should detail in the PR description how/when this approach is better?

(elem (i32.const 0) $f 1))
^
0:0: error: type mismatch in initializer expression, expected [funcref] but got []
out/test/parse/module/bad-table-invalid-function.txt:0:0: error: type mismatch in initializer expression, expected [funcref] but got []
Member

This seems like an improvement.

@sbc100
Member

sbc100 commented Mar 26, 2026

Are there orders of magnitude more Location objects constructed than Error objects? I guess that might be a reason to prefer it this way? But also, a string_view is only two pointers in size, right?

@zherczeg
Collaborator Author

I have updated the description. Errors are frequent in a test system, but real-world modules are nearly always valid, so most of the time no Error objects are created at all. Locations are part of nearly all expressions / constructs in WebAssembly, since the validator needs them for reporting errors. This patch changes locations from "absolute" to "relative" (relative to the module's filename), but that should not be a problem in practice. It is true that a string_view is only two pointers, but nothing prevents the target buffer from being freed. The filenames can be turned into std::string in a follow-up patch, which would prevent that.

@zherczeg
Collaborator Author

For example, every Var in IR has a location member:
https://github.com/WebAssembly/wabt/blob/main/include/wabt/ir.h#L96

@sbc100
Member

sbc100 commented Mar 26, 2026

I think using string_view seems pretty reasonable. The lifetime of the Locations and Errors within a file should not exceed the lifetime of the string holding the filename. We have ASAN etc. to help us be sure of that, right?

If this change is just about moving the string_view from the Location to the Error in order to save memory, it might be nice to justify it with some numbers. How much memory are we going to save for a real-world-sized module?

@zherczeg
Collaborator Author

The filename is not moved from Location to Error: the Error always had a Location, and that Location included the filename. Now it is explicit instead of implicit. I will try to measure the memory usage.

@zherczeg
Collaborator Author

I have processed a 35 MB module with wasm2wat:

  • Original: peak 150.0 MB
  • New: peak 132.5 MB

I think this is nice progress.

Biggest consumers:

  • ProgramMain (wasm2wat.cc:109): peak 34.9 MB
  • OnGenericCustomSection (src/binary-reader-ir.cc:1888): 33.2 MB

These are present in both runs and are not even relevant from our perspective.

The next entries are more interesting (peak before -> after):

  • BinaryReaderIR::OnLocalGetExpr (src/binary-reader-ir.cc:1077): 23.8 MB -> 18.2 MB
  • BinaryReaderIR::OnI32ConstExpr (src/binary-reader-ir.cc:1082): 11.8 MB -> 9.2 MB
  • BinaryReaderIR::OnLoadExpr (src/binary-reader-ir.cc:1102): 6.4 MB -> 5.1 MB
  • BinaryReaderIR::OnStoreExpr (src/binary-reader-ir.cc:1236): 5.4 MB -> 4.3 MB
  • and so on.

@zherczeg
Collaborator Author

Thank you!

@zherczeg zherczeg merged commit 41db004 into WebAssembly:main Mar 26, 2026
17 checks passed
@zherczeg zherczeg deleted the location_no_file branch March 26, 2026 16:34