Skip to content

Conversation

@rzikm
Copy link
Member

@rzikm rzikm commented Sep 11, 2025

Implements #59591

Ignore native code in src/native/external/zstd (verbatim copy of latest zstd release, included in the very first commit)

There are some patches applied to vendored zstd sources which are listed in src/native/external, neither of them should be necessary once we update to next release (maintainers didn't commit to a release date, but hopefully they will release something early next year).

Rest of the implementation mirrors the approach done for Brotli.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be upstreamed. You can link the PR in -version.txt file (see libunwind-version.txt). We've avoided adding patch files like these to the repo in the past.

Comment on lines +14 to +15
internal SafeZstdCDictHandle? _dictionary;
internal MemoryHandle? _prefixHandle;
Copy link
Member

@ericstj ericstj Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unusual to see SafeHandles holding anything more than just the handle/pointer. What's the purpose of these fields, and how do they get used for interop? Same comment applies to all types in this file.

Typically the pattern for a safehandle is something very simple -- some IntPtr that represents data that has a special method required for closing. You only have the SafeHandle responsible for that data. If you have multiple different handles like this, create one for each unique type that has it's unique close method. If you need to group handles together, consider a higher level type that aggregates the handles rather than the handles themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ZstandardDictionaries, we share the underlying byte array between ZSTD_DDict and ZSTD_CDict to save some memory, so we need to ensure that the array will outlive both of these object, the way this is achieved is by each (C|D)Dict object holding onto its own GCHandle for the byte array that keeps it alive at least until we free the native objects. We have been using using similar pattern to keep some data alive in SslStream safe handles.

Similar story is happening for MemoryHandles stored in SafeZstd(Dec|C)ompressHandle, there the _prefixHandle is coming from SetPrefix method. Here we need to make sure the ReadOnlyMemory passed to SetPrefix does not move/deallocate while Encoder/Decoder is still referencing it.

That being said, I am not 100% sure if disposing MemoryHandle from ReleaseHandle is valid, as MemoryHandle may reference other Managed objects (IPinable) that may(?) be finalizable. But at the same time, the MemoryManager implementations we have seem to be resilient against this case....

Copy link
Member

@ericstj ericstj Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd want to get an interop/GC expert to weigh in here. These sort of cases are hard to test and prove out and easy to miss some nuance in how they work. It's better if we can stick to some hard-fast rules which are known to be safe.

For ZstandardDictionaries, we share the underlying byte array between ZSTD_DDict and ZSTD_CDict to save some memory, so we need to ensure that the array will outlive both of these object,

Can't you make the wrapping managed object be responsible for that lifetime, so that it would ref-count both the handle and the byte array? Make the safe-handles only responsible for the zstandard-owned objects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you make the wrapping managed object be responsible for that lifetime, so that it would ref-count both the handle and the byte array? Make the safe-handles only responsible for the zstandard-owned objects?

I think that does not solve the problem when somebody disposes e.g. Encoder while the compression is in progress. The ZSTD_CCTX handle would survive due to refcounting around P/Invoke stubs, but we would drop the GC handle for the Prefix and allow GC to move it around

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd want to get an interop/GC expert to weigh in here. These sort of cases are hard to test and prove out and easy to miss some nuance in how they work. It's better if we can stick to some hard-fast rules which are known to be safe.

I'd feel more confident as well, but I am not sure who would be the best person to ask.

/// </summary>
/// <param name="size">The size of the source data in bytes.</param>
/// <remarks>
/// This method must be called before writing data to the stream.
Copy link
Member

@ericstj ericstj Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, if this always must be set for compression, should we make it a constructor parameter? Requiring up-front knowledge of size seems to limit possible streaming scenarios.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not need to be set, you can stream data without needing to know the length upfront, but then the uncompressed length is not available in the header.

If you want the compressed data to contain length information in the header, then you must specify it before you start compressing data.

We chose not to put it in the options/ctor parameter, because you can reuse options for multiple Encoders (as well as the encoder itself via Reset())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case please improve the remarks so that they don't use "must", or qualify it to make the consequences clear. Describe what happens if set after writing data, or if never setting at all.

<UseSystemZlib Condition="'$(UseSystemZlib)' == '' and !Exists('$(IlcFrameworkNativePath)libz.a')">true</UseSystemZlib>
<!-- Use libbrotlicommon.a as the sentinel for the three brotli libs. -->
<UseSystemBrotli Condition="'$(UseSystemBrotli)' == '' and !Exists('$(IlcFrameworkNativePath)libbrotlicommon.a')">true</UseSystemBrotli>
<UseSystemZstd Condition="'$(UseSystemZstd)' == '' and !Exists('$(IlcFrameworkNativePath)libzstd.a')">true</UseSystemZstd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UseSystemXX is paired with CLR_CMAKE_USE_SYSTEM_XX support for distros to use system installed lib. That typically happens when we are confident that distro provided version is working on major distros like Fedora and Ubuntu. See git grep CLR_CMAKE_USE_SYSTEM_BROTLI. In cases where we have many functionality-related patches to make the lib work (like llvm-libunwind has tons of local / hard-to-upstream patches: #72655), distros cannot use the system provided variant and we use the regular in-tree version.

TODOs:

  • Add CLR_CMAKE_USE_SYSTEM_ZSTD support in src/native/libs and VMR's repo-projects/runtime.proj (next to CLR_CMAKE_USE_SYSTEM_BROTLI)
  • Test with system provided libs at least on fedora and ubuntu
  • If the test fails, remove UseSystemZstd in NAOT buildintegration and file an issue to follow up
  • Otherwise, check-in those (CLR_CMAKE_USE_SYSTEM_ZSTD) changes.

@rzikm
Copy link
Member Author

rzikm commented Nov 27, 2025

Edit: seem to be fixed now

There seems to be some problems with linux x86 build that I am unable to debug on my own.

The build fails because it fails to copy libzstd.a to the coreclr build output, because it attempts to do so before the library is built. The weird part for me is that on all other builds that I checked, the "Installing: ...." steps are happening at the very end of the build, but for this particular pipeline they are happening sometime in the middle, see e.g. https://dev.azure.com/dnceng-public/public/_build/results?buildId=1214394&view=logs&jobId=d82b64b5-3dbf-5e48-910e-ffa3eb07d628&j=d82b64b5-3dbf-5e48-910e-ffa3eb07d628&t=3aaa1282-8fdd-59be-7f23-95b71c611ad2


  -- Install configuration: "CHECKED"
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/./watchdog
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/./watchdog.dbg
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/./corerun
  [ 35%] Built target hosts
  [ 35%] Building C object _deps/zstd-build/lib/CMakeFiles/libzstd_static.dir/__w/1/s/src/native/external/zstd/lib/legacy/zstd_v05.c.o
  [ 35%] Built target eventing_headers
  [ 35%] Building C object _deps/zstd-build/lib/CMakeFiles/libzstd_static.dir/__w/1/s/src/native/external/zstd/lib/legacy/zstd_v06.c.o
  [ 35%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/cluster.c.o
  -- Install configuration: "CHECKED"
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/lib/cmake/zstd/zstdTargets.cmake
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/lib/cmake/zstd/zstdTargets-checked.cmake
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/lib/cmake/zstd/zstdConfig.cmake
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/lib/cmake/zstd/zstdConfigVersion.cmake
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/lib/pkgconfig/libzstd.pc
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/include/zdict.h
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/include/zstd.h
  -- Installing: /__w/1/s/artifacts/bin/coreclr/linux.x86.Checked/include/zstd_errors.h
  CMake Error at _deps/zstd-build/lib/cmake_install.cmake:58 (file):
    file INSTALL cannot find
    "/__w/1/s/artifacts/obj/coreclr/linux.x86.Checked/_deps/zstd-build/lib/libzstd.a":
    No such file or directory.
  Call Stack (most recent call first):
    _deps/zstd-build/cmake_install.cmake:77 (include)
    libs-native/System.IO.Compression.Native/cmake_install.cmake:57 (include)
    libs-native/cmake_install.cmake:47 (include)
    cmake_install.cmake:52 (include)
  
  
  make[3]: *** [CMakeFiles/coreclr_misc.dir/build.make:70: CMakeFiles/coreclr_misc] Error 1
  make[2]: *** [CMakeFiles/Makefile2:1985: CMakeFiles/coreclr_misc.dir/all] Error 2
  make[2]: *** Waiting for unfinished jobs....
  [ 35%] Building C object _deps/zstd-build/lib/CMakeFiles/libzstd_static.dir/__w/1/s/src/native/external/zstd/lib/legacy/zstd_v07.c.o
  [ 35%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/command.c.o
  [ 35%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/compound_dictionary.c.o
  [ 35%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/compress_fragment.c.o
  [ 35%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/compress_fragment_two_pass.c.o
  [ 35%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/dictionary_hash.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/encode.c.o
  [ 37%] Linking C static library libzstd.a
  [ 37%] Built target libzstd_static
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/encoder_dict.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/entropy_encode.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/fast_log.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/histogram.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/literal_cost.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/memory.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/metablock.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/static_dict.c.o
  [ 37%] Building C object _deps/brotli-build/CMakeFiles/brotlienc.dir/c/enc/utf8_util.c.o
  [ 37%] Linking C static library libbrotlienc.a
  [ 37%] Built target brotlienc
  make[1]: *** [CMakeFiles/Makefile2:1798: CMakeFiles/runtime.dir/rule] Error 2
  make: *** [Makefile:241: runtime] Error 2
  /__w/1/s/src/coreclr

I think I have exhausted the more reasonable debugging Ideas, Can somebody assist here? Maybe @jkoritzinsky? Your name pops up a lot in CMakeLists git blame.

Comment on lines +23 to +35
set_property(TARGET libzstd_static APPEND PROPERTY COMPILE_DEFINITIONS "ZSTD_DLL_EXPORT=1")
endif()

if (ANDROID)
# qsort_r is not available during android build and zstd's autodetection does not seem
# to handle this case correctly
# This should no longer be needed once we update to zstd 1.5.8 in the future
set_property(TARGET libzstd_static APPEND PROPERTY COMPILE_DEFINITIONS "ZSTD_USE_C90_QSORT=1")
endif()

# disable warnings that occur in the zstd library
target_compile_options(libzstd_static PRIVATE $<$<COMPILE_LANG_AND_ID:C,MSVC>:/wd4242>)
target_compile_options(libzstd_static PRIVATE $<$<COMPILE_LANG_AND_ID:C,Clang,AppleClang,GNU>:-Wno-implicit-fallthrough>) No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't adding lib prefix in other places:

set_target_properties(brotli PROPERTIES EXCLUDE_FROM_ALL ON)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are referring to the target name (libzstd*), then those names are coming from zstd cmakefiles which we are not changing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is our cmake, I meant we should use target_compile_options(zstd_static PRIVATE $<$<COMPILE_LANG_AND_ID:C,Clang,AppleClang,GNU>:-Wno-implicit-fallthrough>) instead of libzstd_static etc. cmake automatically uses platform convention: libzstd_static on Unix and zstd_static on Windows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove the lib prefix, I get

 CMake Error at /source/dotnet/runtime/src/native/external/zstd.cmake:34 (target_compile_options):
    Cannot specify compile options for target "zstd_static" which is not built
    by this project.
  Call Stack (most recent call first):
    System.IO.Compression.Native/CMakeLists.txt:24 (include)

so I think we need to keep it as is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks good. This is a problem with FetchContent approach that we have to inherit poorly authored config conventions. Projects which don't use this mechanism are libunwind and llvm-libunwind, our cmake for them are completely independent and we select exactly which files to compile and inherit our compiler configs out of eng/native.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants