Skip to content

Conversation

@lbloder
Copy link
Collaborator

@lbloder lbloder commented Aug 29, 2025

📜 Description

Follow-up of: #4576

Fixes comments from previous PRs
Fixes method timing sent to sentry by not aggregating JFR events
Fixes RateLimiter Handling for profile_chunk and profile_chunk_ui

💡 Motivation and Context

💚 How did you test it?

📝 Checklist

  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

  • Add option to filter threads in converter to reduce payload and data that needs to be handled in the backend

@github-actions
Copy link
Contributor

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@github-actions
Copy link
Contributor

github-actions bot commented Aug 29, 2025

Fails
🚫 Please consider adding a changelog entry for the next release.
Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Instructions and example for changelog

Please add an entry to CHANGELOG.md to the "Unreleased" section. Make sure the entry includes this PR's number.

Example:

## Unreleased

- Profiling - Deduplication and cleanup ([#4681](https://github.com/getsentry/sentry-java/pull/4681))

If none of the above apply, you can opt out of this check by adding #skip-changelog to the PR description or adding a skip-changelog label.

Generated by 🚫 dangerJS against 242ca41

@github-actions
Copy link
Contributor

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@github-actions
Copy link
Contributor

github-actions bot commented Aug 29, 2025

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 369.54 ms 423.76 ms 54.22 ms
Size 1.58 MiB 2.09 MiB 521.70 KiB

Previous results on branch: feat/continuous-profiling-03

Startup times

Revision Plain With Sentry Diff
42206f4 419.73 ms 494.82 ms 75.09 ms
d6f8356 392.08 ms 454.42 ms 62.34 ms
066d89d 377.51 ms 434.20 ms 56.69 ms
7dcbbbd 459.22 ms 509.85 ms 50.62 ms
f2cd43a 415.48 ms 485.28 ms 69.80 ms

App size

Revision Plain With Sentry Diff
42206f4 1.58 MiB 2.09 MiB 521.84 KiB
d6f8356 1.58 MiB 2.09 MiB 521.68 KiB
066d89d 1.58 MiB 2.09 MiB 521.85 KiB
7dcbbbd 1.58 MiB 2.09 MiB 521.68 KiB
f2cd43a 1.58 MiB 2.09 MiB 521.84 KiB

@github-actions
Copy link
Contributor

github-actions bot commented Sep 5, 2025

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@lbloder lbloder mentioned this pull request Sep 8, 2025
9 tasks
@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2025

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@lbloder lbloder changed the title add readme and info about commit of the source repository Profiling - Deduplication and cleanup Sep 8, 2025
@lbloder lbloder mentioned this pull request Sep 8, 2025
10 tasks
@lbloder
Copy link
Collaborator Author

lbloder commented Sep 8, 2025

@sentry review

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2025

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

Copy link
Member

@adinauer adinauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP

}

private long resolveThreadId(int eventThreadId) {
return jfr.threads.get(eventThreadId) != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l could extract a var here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m what does it mean if it can't be found in javaThreads? Does it mean it's simply not remapped from the original ID? Could we be reporting garbage data by using the original ID directly if remapping data is missing? e.g. eventThreadId = 9, we can't find it in javaThreads but the real thread ID was actually 117

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, potentially, if the orig. threadId somehow matches a different javaThreadId then we might map it to the wrong thread.
We could set it to a default thread id, like -1 if we can't resolve it. Or drop the event. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending negative ids seems to work and Sentry Backend correctly matches them if both the Thread in the profile and in the Transaction have negative ids.

So both options seem to be valid. However, I'm not sure if it helps to have a "Fallback" Thread that basically gets all unmapped events, as that graph would not make any sense.

I think discarding the event makes more sense, WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set threadId to -1 if it cannot be found in the map in follow-up PR: #4746


@Override
public void visit(Event event, long value) {
public void visit(Event event, long samples, long value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h any exception here, e.g. when processing a single stack frame would cause the whole chunk to be thrown away if I understand correctly. Is that what we want? Can we only drop broken frames instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How detailed do you think we should do that? just drop the whole event? i.e. catch any exception inside the visit method and just keep going?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opted to drop the event if an error occurs


File profileDir = new File(profilingTracesDirPath);

if (!profileDir.canWrite() || !profileDir.exists()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l can we split this up to provide a message that more easily helps pinpoint the problem?

private @NotNull String filename = "";

private final @NotNull AsyncProfiler profiler;
private @Nullable AsyncProfiler profiler;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l an alternative to making this nullable could have been to handle the exception in AsyncProfilerContinuousProfilerProvider and simply return a NoOpContinuousProfiler. This would keep the code here simpler.


@SuppressWarnings("ReferenceEquality")
@Override
public void startProfiler(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m should we catch here just in case something goes wrong?

Copy link
Collaborator Author

@lbloder lbloder Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where we would run into an exception that is not caught, but having a top-level safety net sounds like a good idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added exception handling in follow-up: #4746

@@ -0,0 +1,4 @@
# Vendored AsyncProfiler code for converting JFR Files
- Vendored-in from commit fe1bc66d4b6181413847f6bbe5c0db805f3e9194 of repository: [email protected]:async-profiler/async-profiler.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Vendored-in from commit fe1bc66d4b6181413847f6bbe5c0db805f3e9194 of repository: git@github.com:async-profiler/async-profiler.git
- Vendored-in from commit https://github.com/async-profiler/async-profiler/tree/fe1bc66d4b6181413847f6bbe5c0db805f3e9194

// No need to check exact decimal places as this depends on JFR precision
assertTrue(
timestamp < System.currentTimeMillis() / 1000.0 + 360,
"Timestamp should be recent",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here. It can't be recent if we use a pre-existing JFR file.
Is the message wrong here?
Why do we need to add 360?
Is this intended for creating the file within the test and asserting it's at most 10s old? Then this test might fail on slow CI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, this test doesn't make much sense. I'm reworking it

@github-actions
Copy link
Contributor

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@github-actions
Copy link
Contributor

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@github-actions
Copy link
Contributor

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@lbloder lbloder marked this pull request as ready for review September 26, 2025 06:59
cursor[bot]

This comment was marked as outdated.

* add skipProfiling flag to TransactionOptions to be able to skip profiling and handle cases where profiling has been started by otel

* add profilerId to spanContext so that otel span processor can propagate this to the exporter and SentryTracer

* immediately end profiling when stopProfiler is called

* bump api, fix android api 24 code

* catch all exception happening when converting from jfr

* simplify JavaContinuous profiler by catching AsyncProfiler instantiation exceptions in provider

* add exists and writable info to log message

* add method to safely delete file

* remove setNative call

* fix test

* fix reference to commit we vendored from

* drop event if it cannot be processed to not lose the whole chunk

* Format code

* fix test

* Format code

* fix test

* catch exceptions in startProfiler/stopProfiler

* fallback to threadId -1 if it cannot be resolved

---------

Co-authored-by: Sentry Github Bot <[email protected]>
@github-actions
Copy link
Contributor

🚨 Detected changes in high risk code 🚨

High-risk code has higher potential to break the SDK and may be hard to test. To prevent severe bugs, apply the rollout process for releasing such changes and be extra careful when changing and reviewing these files:

  • sentry/src/main/java/io/sentry/transport/RateLimiter.java

@lbloder lbloder merged commit e43f1a2 into feat/continuous-profiling-02 Sep 26, 2025
31 of 34 checks passed
@lbloder lbloder deleted the feat/continuous-profiling-03 branch September 26, 2025 09:31
lbloder added a commit that referenced this pull request Sep 26, 2025
* delete unused JfrFrame and JfrToSentryProfileconverter

* use passed-in profilingTracesHz parameter instead of hardcoded value

* start profiler before starting the transaction when ProfileLifecycle.TRACE is used to have the profile ID when SentryTracer is created

* use improved way to calculate timestamp of sample

* api dump

* let profile-lifecycle be set from external_options, add tests for SpringBoot autoconfig

* initialize stackTraceFactory only once per chunk

* rename profile data classes, add deserialization and tests

* extract methods in ProfileConverter, fix SentryProfile serialization and make fields private

* use wall=[interval] instead of setting the event to wall and setting the interval separately, this seems to work better and create more samples

* start/stop profiler in OtelSentrySpanProcesser in trace mode for root spans

* add profiler dependency to jakarta-opentelemetry sample, add needed configs

* add dependenies and config to spring-boot-jakarta sample

* remove connection status check

* extract event visitor

* Add enum for ProfileChunk platform

* fallback to default temp directory for profiling on jvm if directory is not configured

* cleanup some minor things

* remove ProfilingInitializer, fix comments

* Format code

* add getter/setter to sample and metadata

* fix compile error

* add comment/todo for deleteOnExit

* Profiling - Deduplication and cleanup (#4681)

* add readme and info about commit of the source repository

* delete jfr file on jvm exit

* further split into smaller methods

* deduplicate frames in order to save bandwidth, add converter tests

* remove Platform Enum, use string constants instead for compatibility with cross platform frameworks

* implement equals and hashcode for SentryStackFrame to make frame deduplication work

* bump api

* improve error handling, fix start stop start flow

* add new testfile

* calculate ticksPerNanosecond in constructor

* adapt Ratelimiter to check for both ProfileChunk and ProfileChunkUi ratelimiting

* update ratelimiter test to check for both profileChunk and profileChunkUi drops

* use string constant instead of string

* Format code

* add non aggregating event collector to send each event individually, deduplicate stacks

* adapt converter tests to new non-aggregated converter

* Format code

* add logging to loadProfileConverter

* Format code

* fix duplication of events

* catch all exception happening when converting from jfr

* add exists and writable info to log message

* add method to safely delete file

* remove setNative call

* fix test

* fix reference to commit we vendored from

* drop event if it cannot be processed to not lose the whole chunk

* make format

* fix test

* Format code

* Profiling - OTEL profiling fix, Stabilization, Logging (#4746)

* add skipProfiling flag to TransactionOptions to be able to skip profiling and handle cases where profiling has been started by otel

* add profilerId to spanContext so that otel span processor can propagate this to the exporter and SentryTracer

* immediately end profiling when stopProfiler is called

* bump api, fix android api 24 code

* catch all exception happening when converting from jfr

* simplify JavaContinuous profiler by catching AsyncProfiler instantiation exceptions in provider

* add exists and writable info to log message

* add method to safely delete file

* remove setNative call

* fix test

* fix reference to commit we vendored from

* drop event if it cannot be processed to not lose the whole chunk

* Format code

* fix test

* Format code

* fix test

* catch exceptions in startProfiler/stopProfiler

* fallback to threadId -1 if it cannot be resolved

---------

Co-authored-by: Sentry Github Bot <[email protected]>

---------

Co-authored-by: Sentry Github Bot <[email protected]>

---------

Co-authored-by: Sentry Github Bot <[email protected]>
adinauer pushed a commit that referenced this pull request Oct 1, 2025
* wip

* re-add data category profile_chunk, fix json naming, new converter based on jfr converter bundled with asyncprofiler

* read java thread ids from jfr and use those instead of os thread ids, use existing SentryStackFrame instead of JfrFrame,

* adhere to sentry conventions re format null safety etc, fix compilation

* add profile-session-sample-rate to external options

* add platform as constructor param to ProfileChunk, wip: set java continuous profiler in sentry init

* add doubleToBigDecimal in JfrSample and ProfileChunk, same as we do it in SentrySpan to work around scientific notation of double, use wall clock profiling

* rename JfrProfile to SentryProfile

* move java profiling into its own module, load using SPI

* [WIP] use getProfilingTracesDirPath

* add missing build.gradle.kts for profiler module

* WIP continuous profiling in trace mode

* cleanup unused classes from vendor, cleanup packages

* allow setting profiling-traces-dir-path independently from cache dir using external options

* use profileChunk.platform to decide how to deal with the chunk instead of file extension

* port relevant AndroidContinuousProfilerTest tests to JavaContinuousProfilerTest

* add service loader tests for profiler and profile converter

* remove old jfr test files

* Support ProfileLifecycle.TRACE (#4576)

* delete unused JfrFrame and JfrToSentryProfileconverter

* use passed-in profilingTracesHz parameter instead of hardcoded value

* start profiler before starting the transaction when ProfileLifecycle.TRACE is used to have the profile ID when SentryTracer is created

* use improved way to calculate timestamp of sample

* api dump

* let profile-lifecycle be set from external_options, add tests for SpringBoot autoconfig

* initialize stackTraceFactory only once per chunk

* rename profile data classes, add deserialization and tests

* extract methods in ProfileConverter, fix SentryProfile serialization and make fields private

* use wall=[interval] instead of setting the event to wall and setting the interval separately, this seems to work better and create more samples

* start/stop profiler in OtelSentrySpanProcesser in trace mode for root spans

* add profiler dependency to jakarta-opentelemetry sample, add needed configs

* add dependenies and config to spring-boot-jakarta sample

* remove connection status check

* extract event visitor

* Add enum for ProfileChunk platform

* fallback to default temp directory for profiling on jvm if directory is not configured

* cleanup some minor things

* remove ProfilingInitializer, fix comments

* Format code

* add getter/setter to sample and metadata

* fix compile error

* add comment/todo for deleteOnExit

* Profiling - Deduplication and cleanup (#4681)

* add readme and info about commit of the source repository

* delete jfr file on jvm exit

* further split into smaller methods

* deduplicate frames in order to save bandwidth, add converter tests

* remove Platform Enum, use string constants instead for compatibility with cross platform frameworks

* implement equals and hashcode for SentryStackFrame to make frame deduplication work

* bump api

* improve error handling, fix start stop start flow

* add new testfile

* calculate ticksPerNanosecond in constructor

* adapt Ratelimiter to check for both ProfileChunk and ProfileChunkUi ratelimiting

* update ratelimiter test to check for both profileChunk and profileChunkUi drops

* use string constant instead of string

* Format code

* add non aggregating event collector to send each event individually, deduplicate stacks

* adapt converter tests to new non-aggregated converter

* Format code

* add logging to loadProfileConverter

* Format code

* fix duplication of events

* catch all exception happening when converting from jfr

* add exists and writable info to log message

* add method to safely delete file

* remove setNative call

* fix test

* fix reference to commit we vendored from

* drop event if it cannot be processed to not lose the whole chunk

* make format

* fix test

* Format code

* Profiling - OTEL profiling fix, Stabilization, Logging (#4746)

* add skipProfiling flag to TransactionOptions to be able to skip profiling and handle cases where profiling has been started by otel

* add profilerId to spanContext so that otel span processor can propagate this to the exporter and SentryTracer

* immediately end profiling when stopProfiler is called

* bump api, fix android api 24 code

* catch all exception happening when converting from jfr

* simplify JavaContinuous profiler by catching AsyncProfiler instantiation exceptions in provider

* add exists and writable info to log message

* add method to safely delete file

* remove setNative call

* fix test

* fix reference to commit we vendored from

* drop event if it cannot be processed to not lose the whole chunk

* Format code

* fix test

* Format code

* fix test

* catch exceptions in startProfiler/stopProfiler

* fallback to threadId -1 if it cannot be resolved

---------

Co-authored-by: Sentry Github Bot <[email protected]>

---------

Co-authored-by: Sentry Github Bot <[email protected]>

---------

Co-authored-by: Sentry Github Bot <[email protected]>

* change profiler api from Path to String to keep Android Api 21 compatibility

* mark all profiling related classes as internal

* Format code

* mark Sentry profile classes internal

* Format code

* mark SentryThreadMetadata internal

* add changelog entry

* Format code

* re-add asyncprofiler classes, cr changes

* bump api

* detect dangerous/invalid chars in profiling directory name

* ignore vendored profile conversion code in codecov

---------

Co-authored-by: Sentry Github Bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants