voidzcy commented Aug 28, 2020

The JVM can eliminate the zeroing of a newly allocated array when the allocation is immediately followed by System.arraycopy. Code paths that use UnsafeUtil.copyMemory do not benefit from this: the array is zeroed on allocation, and the zeroing takes a fair amount of CPU cycles when the array is large.

Java 9+ has jdk.internal.misc.Unsafe.allocateUninitializedArray, which does what we want: it allocates the array without zeroing it.
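As a rough illustration of the idea (not the code in this PR), the sketch below looks up jdk.internal.misc.Unsafe.allocateUninitializedArray reflectively and falls back to a plain `new byte[size]` when it is unavailable. The class name `UninitializedArrays` and its methods are hypothetical. It also assumes that the internal package must be exported to the caller (for example with something like `--add-exports java.base/jdk.internal.misc=ALL-UNNAMED`); without that, the lookup fails and the fallback is used.

```java
import java.lang.reflect.Method;

// Hypothetical helper, for illustration only.
final class UninitializedArrays {
    private static final Object INTERNAL_UNSAFE;
    private static final Method ALLOCATE_UNINITIALIZED_ARRAY;

    static {
        Object unsafe = null;
        Method allocate = null;
        try {
            // Present on Java 9+; ClassNotFoundException on Java 8.
            Class<?> unsafeClass = Class.forName("jdk.internal.misc.Unsafe");
            Object instance = unsafeClass.getMethod("getUnsafe").invoke(null);
            Method method = unsafeClass.getMethod(
                "allocateUninitializedArray", Class.class, int.class);
            // Probe once so access problems surface here, not on the hot path.
            method.invoke(instance, byte.class, 1);
            unsafe = instance;
            allocate = method;
        } catch (Throwable t) {
            // Not accessible (older JVM or package not exported): use the fallback.
        }
        INTERNAL_UNSAFE = unsafe;
        ALLOCATE_UNINITIALIZED_ARRAY = allocate;
    }

    private UninitializedArrays() {}

    /** Returns true if the non-zeroing allocation path was found. */
    static boolean isFastPathAvailable() {
        return ALLOCATE_UNINITIALIZED_ARRAY != null;
    }

    /**
     * Returns a byte array of the given size. On the fast path its contents
     * are unspecified, so the caller must overwrite every byte before reading.
     */
    static byte[] newByteArray(int size) {
        if (ALLOCATE_UNINITIALIZED_ARRAY != null) {
            try {
                return (byte[]) ALLOCATE_UNINITIALIZED_ARRAY.invoke(
                    INTERNAL_UNSAFE, byte.class, size);
            } catch (Throwable t) {
                // Fall through to the zeroing allocation.
            }
        }
        return new byte[size];
    }
}
```

A caller would use `UninitializedArrays.newByteArray(n)` only where the whole array is filled right away, for instance by a bulk memory copy. Exposing an unfilled array to other code would leak stale heap contents.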

This is related to gRPC's optimization for passing Iterable<ByteBuffer> into Protobuf.

I've run the gRPC transport benchmark with 1 thread and 2MB response messages on Java 11:

  • Baseline
Benchmark                                        (direct)  (transport)   Mode  Cnt          Score          Error  Units
TransportBenchmark.streamingCallsByteThroughput      true        NETTY  thrpt  100  655102314.930 ± 61603907.509  ops/s
  • With changes in this PR
Benchmark                                        (direct)  (transport)   Mode  Cnt          Score          Error  Units
TransportBenchmark.streamingCallsByteThroughput      true        NETTY  thrpt  100  718053472.436 ± 45752070.587  ops/s

(The full benchmark results are here and here. The baseline commit is 70b0286)

Comparing the two scores, this change gives about a 9.6% throughput improvement for gRPC running on Java 11.
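For reference, the 9.6% figure follows from the two scores reported above. The small sketch below redoes the arithmetic; the class and method names are hypothetical. It also checks the reported error bounds, which overlap (baseline upper bound about 716.7M, changed lower bound about 672.3M), so the gain is best read as an estimate from this single run.

```java
// Hypothetical helper that recomputes the gain from the scores in the tables above.
final class ThroughputGain {
    private ThroughputGain() {}

    /** Relative change of `changed` over `baseline`, in percent. */
    static double relativeGainPercent(double baseline, double changed) {
        return (changed - baseline) / baseline * 100.0;
    }

    /** True if the intervals [score - error, score + error] overlap. */
    static boolean intervalsOverlap(double scoreA, double errorA,
                                    double scoreB, double errorB) {
        return scoreA + errorA >= scoreB - errorB
            && scoreB + errorB >= scoreA - errorA;
    }

    public static void main(String[] args) {
        double baseline = 655102314.930;
        double baselineError = 61603907.509;
        double changed = 718053472.436;
        double changedError = 45752070.587;

        System.out.printf("gain: %.1f%%%n", relativeGainPercent(baseline, changed));
        System.out.println("error intervals overlap: "
            + intervalsOverlap(baseline, baselineError, changed, changedError));
    }
}
```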

voidzcy commented Sep 8, 2020

Gentle ping, @acozzette.

voidzcy force-pushed the impl/use_allocate_uninitialized_array_if_possible branch from efd1792 to ce1b0e3 (September 10, 2020 00:31)
voidzcy force-pushed the impl/use_allocate_uninitialized_array_if_possible branch from ce1b0e3 to e607436 (September 10, 2020 00:42)
voidzcy commented Sep 29, 2020

Closing, as the discussion is happening internally.

voidzcy closed this Sep 29, 2020