No more ThreadLocals, delegate everything to 'palantir/tracing-java' #799
}

@Override
public void doConsume(Span span) {
this was an implementation detail of the abstract AsyncSpanObserver class (which we no longer extend), so I think it's ok to delete.
Force-pushed from 9cad433 to b009613.
@JsonSerialize(as = ImmutableZipkinCompatSpan.class)
@Value.Immutable
@Value.Style(visibility = Value.Style.ImplementationVisibility.PACKAGE)
abstract static class ZipkinCompatSpan {
These classes have been completely deleted because they're an implementation detail that now exists in the new tracing-java class: com.palantir.tracing.AsyncSlf4jSpanObserver
private Tracer() {}

// Thread-safe since thread-local
private static final ThreadLocal<Trace> currentTrace = ThreadLocal.withInitial(() -> {
This is the critical line! remoting3 no longer maintains any ThreadLocal state; instead, it's all delegated to the shiny new tracing-java.
Force-pushed from b009613 to bb252e2.
- currentTrace.remove();
- MDC.remove(Tracers.TRACE_ID_KEY);
- return trace;
+ com.palantir.tracing.Trace trace = com.palantir.tracing.Tracer.getAndClearTrace();
@iamdanfox, is it possible to do something like this instead of having the ExposedTrace that lives inside the tracing-java package?
- com.palantir.tracing.Trace trace = com.palantir.tracing.Tracer.getAndClearTrace();
- return new Trace(ExposedTrace.isObservable(trace), ExposedTrace.getTraceId(trace));
+ boolean isObservable = com.palantir.tracing.Tracer.isTraceObservable();
+ String traceId = com.palantir.tracing.Tracer.getTraceId();
+ com.palantir.tracing.Tracer.getAndClearTrace();
+ return new Trace(isObservable, traceId);
I think there's a bit of a race condition, where we might get the isTraceObservable info, then someone changes stuff beneath our feet, then we get the traceId, then finally we clear it...
yeah, that is not gonna work.
In every case in this impl we should be delegating all state :)
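To make the concern in this thread concrete, here is a minimal sketch of why a single get-and-clear call is preferable to reading the fields one at a time. All names (TraceSnapshot, TraceHolder) are hypothetical and are not tracing-java types, and an AtomicReference stands in for the real storage purely to make the one-step read-and-clear explicit.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of trace state; not a tracing-java type.
final class TraceSnapshot {
    final boolean observable;
    final String traceId;

    TraceSnapshot(boolean observable, String traceId) {
        this.observable = observable;
        this.traceId = traceId;
    }
}

// Hypothetical holder for the current trace.
final class TraceHolder {
    private final AtomicReference<TraceSnapshot> current = new AtomicReference<>();

    void set(TraceSnapshot snapshot) {
        current.set(snapshot);
    }

    // Reads and clears in one step, so both fields of the returned snapshot
    // are guaranteed to belong to the same trace.
    TraceSnapshot getAndClear() {
        return current.getAndSet(null);
    }
}

final class GetAndClearDemo {
    public static void main(String[] args) {
        TraceHolder holder = new TraceHolder();
        holder.set(new TraceSnapshot(true, "abc123"));

        TraceSnapshot snapshot = holder.getAndClear();
        System.out.println(snapshot.traceId + " observable=" + snapshot.observable);
        System.out.println("cleared: " + (holder.getAndClear() == null));
    }
}
```

With three separate calls (read observability, read id, clear), nothing ties the two reads to the trace that is eventually cleared; returning one immutable snapshot avoids that.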
Force-pushed from f403597 to badb51c.
  */
  public static CloseableTracer startSpan(String operation) {
- Tracer.startSpan(operation);
+ com.palantir.tracing.Tracer.startSpan(operation);
Can we just use com.palantir.tracing.CloseableTracer.startSpan?
I kind of prefer using com.palantir.tracing.Tracer.startSpan(operation); so that it matches com.palantir.tracing.Tracer.fastCompleteSpan(); a few lines below, since we can't get the instance of com.palantir.tracing.CloseableTracer.
/** Utility methods for making {@link ExecutorService} and {@link Runnable} instances tracing-aware. */
public final class Tracers {
    /** The key under which trace ids are inserted into SLF4J {@link org.slf4j.MDC MDCs}. */
    public static final String TRACE_ID_KEY = "traceId";
I don't think we should delete public static final fields like this, because if some consumer is depending on com.palantir.remoting3.tracing.Tracers.TRACE_ID_KEY then they'd suddenly break?
It doesn't seem like anyone is using it: https://github.com/search?p=2&q=Tracers.TRACE_ID_KEY&type=Code
I also did a quick search for the key internally, and no internal consumer is using it.
I still don't think we should remove things like this when there isn't a clear motivation - what if there was a user that didn't appear in your search? Did you check OSS consumers too (e.g. atlasdb?)
  * not thread-safe and is intended to be used in a thread-local context.
  */
- final class Trace {
+ public final class Trace {
Does this need to become public?
It is also kind of odd that we have a public static method that returns Trace while Trace itself is package-private.
https://github.com/palantir/http-remoting/pull/799/files#diff-7ee517b85058e51de91b9bf446ff7585L246
@qinfchen I don't have the original context, but one possible justification could be that this Trace class is useful for testing but shouldn't be used by consumers?
This reverts commit 1bd0d56.
Force-pushed from 1968b2f to adf0f74.
* Expose HostMetricsRegistry record methods (#780)
  This is so that we can separately implement a HostMetricsSink for it (see #779) such that we can share host metrics when constructing both conjure and remoting3 clients
* publish BOM for jaxrs-client (#789)
* Excavator: Render CircleCI file using template specified in .circleci/template.sh (#791)
* Upgrade OkHttp to 3.11.0. (#794)
* AssertionErrors are converted into service exceptions with type internal (#727)
* No more ThreadLocals, delegate everything to 'palantir/tracing-java' (#799)
* Use BINTRAY_*_REMOTING envs (#802)
  The project's default bintray creds are currently set up to publish to `conjure-java-runtime`. Use these custom env vars to maintain ability to publish http-remoting.
* Better behaviour in the presence of 429s (#786)

Before this PR
People might start adopting our shiny new palantir/tracing-java libraries while keeping the remoting3 ones on the classpath. This would be very bad because there would be two ThreadLocals trying to keep track of the current trace.
Concretely, all the clients in Atlas would use the remoting3 tracers but newer user-code would likely use tracing-java. This would make tracing information far less useful for debugging.
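As a rough illustration of the problem described above, the sketch below models two tracer classes that each keep their own ThreadLocal. LegacyTracer and NewTracer are hypothetical stand-ins, not the real remoting3 or tracing-java classes; the point is only that state set through one is invisible to the other.

```java
import java.util.UUID;

// Hypothetical stand-in for the old library's tracer (not the remoting3 API).
final class LegacyTracer {
    private static final ThreadLocal<String> currentTraceId =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    private LegacyTracer() {}

    static String getTraceId() {
        return currentTraceId.get();
    }

    static void setTraceId(String traceId) {
        currentTraceId.set(traceId);
    }
}

// Hypothetical stand-in for the new library's tracer (not the tracing-java API).
final class NewTracer {
    private static final ThreadLocal<String> currentTraceId =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    private NewTracer() {}

    static String getTraceId() {
        return currentTraceId.get();
    }

    static void setTraceId(String traceId) {
        currentTraceId.set(traceId);
    }
}

final class TwoThreadLocalsDemo {
    public static void main(String[] args) {
        // Newer user code initialises only the new library's state.
        NewTracer.setTraceId("trace-from-request");

        // A client still on the legacy library, on the same thread, sees an unrelated id.
        System.out.println("new:    " + NewTracer.getTraceId());
        System.out.println("legacy: " + LegacyTracer.getTraceId());
        System.out.println("same trace? "
                + NewTracer.getTraceId().equals(LegacyTracer.getTraceId()));
    }
}
```

Both calls run on the same thread, yet the ids differ, so spans emitted through the two libraries could not be stitched into one trace.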
After this PR
com.palantir.remoting3.tracing.Tracer no longer maintains its own ThreadLocal; instead, it delegates everything to com.palantir.tracing.Tracer.
Caveats
* Tracer#subscribe(String, SpanObserver) behaves differently now (it used to preserve reference equality and it no longer does).
* Tracer#getAndClearTrace returns an incomplete copy of the trace. I think this is OK because that method was never usable externally, as the returned type was not public.
When migrating tritium to the rebranded tracing, @j-baker and @schlosna pointed out that tracers are thread-local and could potentially break tracing if remoting-tracers and conjure tracers were not on the same thread.
One proposal suggested by @iamdanfox was to delegate remoting's tracing calls to conjure tracers. This PR demonstrates what the implementation would look like.
See more details here: palantir/tritium#100.
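The delegation approach, and one plausible reading of the subscribe caveat, can be sketched as follows. Every type here (CanonicalTracer, DelegatingLegacyTracer, and the two observer interfaces) is hypothetical and much simpler than the real remoting3 and tracing-java classes; this is an illustration under those assumptions, not the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical observer types for the two libraries.
interface CanonicalObserver {
    void consume(String span);
}

interface LegacyObserver {
    void consume(String span);
}

// Hypothetical tracer that owns the only ThreadLocal.
final class CanonicalTracer {
    private static final ThreadLocal<String> currentTraceId = new ThreadLocal<>();
    private static final Map<String, CanonicalObserver> observers = new HashMap<>();

    private CanonicalTracer() {}

    static String getTraceId() {
        return currentTraceId.get();
    }

    static void setTraceId(String traceId) {
        currentTraceId.set(traceId);
    }

    // Registers an observer and returns the one previously registered under this name, if any.
    static synchronized CanonicalObserver subscribe(String name, CanonicalObserver observer) {
        return observers.put(name, observer);
    }
}

// Hypothetical legacy facade: holds no state of its own and forwards every call.
final class DelegatingLegacyTracer {
    private DelegatingLegacyTracer() {}

    static String getTraceId() {
        return CanonicalTracer.getTraceId();
    }

    static void setTraceId(String traceId) {
        CanonicalTracer.setTraceId(traceId);
    }

    // The legacy observer is wrapped in an adapter before being handed over, and the
    // previous observer is wrapped again on the way back, so the returned object is
    // not the same instance the caller originally registered.
    static LegacyObserver subscribe(String name, LegacyObserver observer) {
        CanonicalObserver previous = CanonicalTracer.subscribe(name, observer::consume);
        if (previous == null) {
            return null;
        }
        return previous::consume;
    }
}

final class DelegationDemo {
    public static void main(String[] args) {
        DelegatingLegacyTracer.setTraceId("t1");
        System.out.println("canonical sees: " + CanonicalTracer.getTraceId());

        LegacyObserver first = span -> System.out.println("first saw " + span);
        DelegatingLegacyTracer.subscribe("obs", first);
        LegacyObserver previous = DelegatingLegacyTracer.subscribe("obs", span -> {});

        System.out.println("same instance? " + (previous == first));
        previous.consume("span-1"); // still reaches the original observer through the adapters
    }
}
```

Because the facade adapts observers in both directions, behavior is preserved but reference equality is not, which is consistent with the first caveat listed above.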