-
Notifications
You must be signed in to change notification settings - Fork 413
Implement llfio-based memory-mapped IO for C lib #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
92e322b to
f005a0d
Compare
f005a0d to
9d1f18b
Compare
830b74c to
fac71c9
Compare
|
There is no option to request reviewers and nobody is assigned so I'll just ask: @BurningEnlightenment and @oconnor663 can you review this? Thanks. |
BurningEnlightenment
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting the PR!
I also noticed a few missing includes / reliance on transitive includes like std::move => <utility>, OUTCOME_TRY => <outcome-basic.hpp>, etc.
I haven't looked at the CMake files, yet.
I'll allocate some time next weekend to fix some of the issues.
c/blake3_llfio.cpp
Outdated
| if (result.has_error()) { | ||
| // Explicitly set errno on error since this doesn't always happen | ||
| // automatically on some platforms such as Windows. | ||
| errno = static_cast<int>(result.error().value().sc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llfio leverages NT APIs on Windows, i.e. the status code might be a NTSTATUS and not a Win32 code. This is problematic due to the fact that NTSTATUS and Win32 codes have overlapping value ranges.
One solution would be to convert to HRESULT, but maybe there is a more elegant one. Needs additional thought.
Furthermore llfio might be built by the user with std::error_code, i.e. .sc will produce a compiler error.
And last but not least errno might be dirty, i.e. we should clear it on success or switch the return type from void to bool to indicate success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here in the CMake configuration: https://github.com/silvanshade/BLAKE3/blob/fac71c9e6748e61e0c280e4e905f8f808b3c2ae8/c/dependencies/llfio/CMakeLists.txt#L27-L42
I'm specifically configuring the build with LLFIO_EXPERIMENTAL_STATUS_CODE in order to allow compiling llfio as a header-only library. The documentation states the following:
LLFIO has your choice of header-only, static library, and shared library build modes. Note that on Microsoft Windows, the default header only configuration is unsafe to use outside of toy projects. You will get warnings of the form:
[...]
The cause is that <system_error> has a design flaw where custom error code categories are unsafe when shared libraries are in use.
You have one of three choices here: (i) Use experimental SG14 status_code which doesn't have this problem (define LLFIO_EXPERIMENTAL_STATUS_CODE=1) (ii) Use the NT kernel error category as a shared library (see its documentation) (iii) Don't use header only LLFIO on Windows (see below).
Of the three choices available, the header-only with LLFIO_EXPERIMENTAL_STATUS_CODE seems the cleanest and most portable.
I couldn't find any vendor-provided packages available for llfio for any OS, so relying on it being available pre-compiled doesn't seem like a viable option.
It also doesn't appear that the repository-provided binaries are set up to be able to easily detect them via CMake through the usual means.
Finally, I'd also prefer to avoid compiling it in-tree as a shared lib on Windows if we don't need that.
Furthermore llfio might be built by the user with std::error_code, i.e. .sc will produce a compiler error.
Since llfio is configured as a private dependency and header-only library, I don't see how this can be a problem. As far as I understand, our use of llfio should not interfere with other already existing uses of it elsewhere in the user's code base.
And last but not least errno might be dirty, i.e. we should clear it on success or switch the return type from void to bool to indicate success.
Good point. I would prefer to return a bool rather than set-on-success since it is cleaner and more informative. I will change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since llfio is configured as a private dependency and header-only library, I don't see how this can be a problem. As far as I understand, our use of llfio should not interfere with other already existing uses of it elsewhere in the user's code base.
Sadly, this is not the case. Formally speaking: Using llfio with different configurations in the same project results in One Definition Rule (ODR) violations for basically every entity. ODR violations make the whole program "ill-formed no diagnostic required" (IFNDR), it basically stops being a C++ program from the C++ standard POV.
The practical consequences can be illustrated with the following example: We build blake3 as a static library and llfio in its header variant is used in two configurations like std::error_code and status_code (progam's vs blake3's). All inline functions will be emitted by the compiler into the object file and the linker is allowed to deduplicate them by picking any of them. This means that either our program or the blake3 library will not get ABI incompatible return values.
I couldn't find any vendor-provided packages available for llfio for any OS, so relying on it being available pre-compiled doesn't seem like a viable option.
Indeed, OS packagers are a bad source for C++ libraries (though llfio has been added to the FreeBSD ports tree IIRC). Typically you would use a C++ package manager like vcpkg or conan.
btw. I also maintain llfio and its buildsystem and I'm therefore vaguely familiar with it 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The practical consequences can be illustrated with the following example: We build blake3 as a static library and llfio in its header variant is used in two configurations like
std::error_codeandstatus_code(progam's vs blake3's). Allinlinefunctions will be emitted by the compiler into the object file and the linker is allowed to deduplicate them by picking any of them. This means that either our program or the blake3 library will not get ABI incompatible return values.
Okay, I see how this can be a problem.
Indeed, OS packagers are a bad source for C++ libraries (though llfio has been added to the FreeBSD ports tree IIRC). Typically you would use a C++ package manager like
vcpkgorconan.btw. I also maintain llfio and its buildsystem and I'm therefore vaguely familiar with it 😉
I have some familiarity with vcpkg (not conan though), but to be more specific, I wasn't sure how we should detect local availability of llfio regardless of the source.
If I remember correctly, llfio didn't seem to be set up with a CMake find module definition so we could just use find_package. This is why I checked for the llfio::hl target specifically (after loading from FetchContent).
How would you propose we attempt to detect and use a locally available llfio?
One strategy I am imagining is that if BLAKE3_USE_LLFIO is set, we check for a (user-defined) value for LLFIO_LIBRARY and if defined we attempt to load the value as subdirectory with CMake (which we expect to define the llfio::* targets), otherwise if BLAKE3_FETCH_LLFIO is set, use FetchContent and configure like we have it now, otherwise print an error message about missing llfio.
Furthermore llfio might be built by the user with std::error_code, i.e. .sc will produce a compiler error.
Given the above, how can we detect which configuration is in use in order to handle it appropriately in the different cases? Can you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore llfio might be built by the user with std::error_code, i.e. .sc will produce a compiler error.
Given the above, how can we detect which configuration is in use in order to handle it appropriately in the different cases? Can you give an example?
I tried to generalize the implementation to work with either std::error_code or status_code and also to allow specifying whether to use an existing llfio or to fetch and whether to build as header-only (the default) or static or shared.
Unfortunately, I can't seem to get most of the non-header-only configurations to even compile on Windows. I haven't tested that on other platforms yet.
You can try yourself by specifying, for example, -DBLAKE3_USE_LLFIO=1 -DBLAKE3_USE_LLFIO_TARGET=llfio_dl -DBLAKE3_USE_LLFIO_EXPERIMENTAL_STATUS_CODE=OFF -DBLAKE3_FETCH_LLFIO=1.
For non-header-only builds on Windows, the compile fails with several unresolved symbols unless exceptions are enabled, but as far as I understand this should be supported according to ned14/llfio#127.
Another problem is that if EXCLUDE_FROM_ALL is specified in:
FetchContent_Declare(
llfio
GIT_REPOSITORY https://github.com/ned14/llfio
GIT_TAG 20250206
GIT_SHALLOW TRUE
EXCLUDE_FROM_ALL
)or:
add_subdirectory(${BLAKE3_USE_LLFIO_DIR} EXCLUDE_FROM_ALL)then the llfio libs won't be compiled so the example-mmap will just fail when run because the libraries are missing. I would have thought that adding the llfio_dl to the target would force these to compile.
|
Could you give me the high level tradeoffs for LLFIO vs I was playing around with adding support for this feature to the |
@oconnor663 Sorry I missed this earlier but I see you merged the TBB PR. I think that's fine; it's still an improvement to have to multi-threading for C even if the caller has to figure out the memory-mapping for now. I don't have much familiarity with The llfio author (ned14) commented:
The Regarding LLFIO vs Boost ( I believe the intention was also that parts of LLFIO were to be proposed for addition to the C++ standard (more on this later) whereas Boost has often kind of evolved its own way of doing this for various reasons. I would expect LLFIO to be more forward looking in its design. The main reason I chose LLFIO is because it was actively developed and it looked like parts of it were likely to be added to the standard. But I recently saw this comment and it sounds like the author is stepping away from the C++ WG stuff and deprioritizing C++ projects. I'm less sure about whether it's worth the effort to get LLFIO-based memory-mapping to work for the C lib now. Maybe @BurningEnlightenment has some thoughts about all of this.
Yeah, I don't know really. This sounds like maybe more trouble than its worth. If you do everything through CMake and use I have resolved most of the feedback and plan to update the PR still but maybe there's a better option or maybe it's fine to just not have memory-mapping. Perhaps a much simpler approach would be to split the memory-mapping code from Rust into a separate crate which could be statically linked to the C library. |
6669ad8 to
0641838
Compare
|
I've added |
|
The handles created by |
Cool, that seems sufficient I think. The motivating use-case for this PR was to enable memory-mapped IO for the BLAKE3 hashing in nix. But nix is already depending on Boost locally so it's probably simpler to just use I'll go ahead and close this then. Thanks for the feedback. |
|
Cool, thanks for all the work you've done to get this feature in.
Woops, fixed. |
This PR is a follow up to #445 and implements llfio-based memory-mapped IO.
A new example using this functionality has been added as
example-mmap.c.The example can be compiled, installed, and run with the following commands: