Overhaul System.Net library logging #13134

stephentoub · 2016-10-28T19:57:15Z

There are a few problems with the existing solution:

In the original sources, logging was split across a GlobalLog class and a Logging class. Global-related tracing went to GlobalLog, and library-specific tracing went to Logging. When the sources were moved to corefx, this evolved into multiple EventSource-based types, most of which were in Common and were then src-included into each System.Net assembly. This has a significant problem, though, which is that the name/Guid identity of an EventSource can only exist once in a process, and subsequent EventSource instances with the same identity fail. This means that whichever assembly ends up logging something first wins, and all other tracing from other System.Net assemblies ends up being lost.
The split also may have made sense when all components shared the same GlobalLog, but now that GlobalLog is included as a unique EventSource instance in each assembly, it’s no longer global. This means that we’re not only loading multiple EventSources unnecessarily (each has its own costs), but we’re also duplicating logging, as some logging is done to one or the other, but some logging is done to both.
Due to sharing the Logging class, which evolved into an EventSource-based type, many logging calls had to include which component they came from as an enum value, leading to bloating at the call sites.
Additionally, many logging operations included the name of the method in which the calling was being done, which meant both longer call sites as well as stale values as members got renamed over time.
Call sites ended up getting fairly long due to lots of string concatenation operations and other manipulations in the logging call. This also then led to most call sites needing to be guarded behind checks for IsEnabled, leading to lots of code at every call site and obscuring the actual information being traced.
An additional requirement that got added with the move to corefx was that certain failure conditions should both Debug.Fail and do tracing, leading to some tracing operations taking upwards of 10 lines of code.

All of these issues are addressed by this PR.

Each System.Net assembly that needs to log now loads a single NetEventSource type for doing logging, each with its own identity.
That implementation is split across a common, partial NetEventSource.Common.cs file used by all such assemblies, and then a per-assembly partial NetEventSource.AssemblyName.cs file used by the individual assembly. The former file contains most of the logging implementation. The latter files typically contain just the partial class declaration along with the [EventSource] attribute that gives that assembly’s NetEventSource its unique identity / name.
The logging operations in NetEventSource.Common.cs use a variety of relatively-recent language enhancements to improve the logging. Each operation uses a [CallerMemberName] so that the call sites don’t need to specify the caller. FormattableString is used so that call sites can use string interpolation while still providing the logging implementation the flexibility to format the individual components in a desired fashion (e.g. IntPtrs are formatted as hex, SafeHandles are rendered in a way that includes their name+hashcode+handle, etc.), along with a mechanism that allows individual libraries to add additional formatting rules. Each operation allows the this reference (or another this-like context object) to be passed in, to simply identify the source of the call (in concert with the caller member name). A debug-only mechanism is included in the tracing to help find potential perf issues due to misuse of the logger. Etc.
A Fail logging operation is included that both traces and does a Debug.Fail.

With these changes, most call sites are greatly simplified. Some examples:

Before:

if (GlobalLog.IsEnabled)
{
    GlobalLog.AssertFormat("ContextAwareResult#{0}::ContextCopy|Called on completed result.", LoggingHash.HashString(this));
}
Debug.Fail("ContextAwareResult#" + LoggingHash.HashString(this) + "::ContextCopy |Called on completed result.");

After:

NetEventSource.Fail(this, "Called on completed result.");

Before:

if (NetEventSource.Log.IsEnabled()) NetEventSource.Enter(NetEventSource.ComponentType.Http, this, ".ctor", handler);

After:

NetEventSource.Enter(this, handler);

Etc.

In addition to fixing up all of the event sources, I of course also fixed up all usage in order to clean up the call sites. However, this was a rather mechanical change: I did not remove existing logging (other than duplication) nor did I add new logging. At some point, we should review the actual logging being done to determine what’s missing and what’s not needed. I also did not revise the additional existing events on the sockets and security event sources, but we may want to remove those (or some of those) in favor of just using the shared events.

Fixes https://github.com/dotnet/corefx/issues/5265
Fixes https://github.com/dotnet/corefx/issues/5755
Fixes https://github.com/dotnet/corefx/issues/12685
Fixes https://github.com/dotnet/corefx/issues/12808

cc: @CIPop, @davidsh, @ericeil, @geoffkizer, @Priya91, @Petermarcu

geoffkizer · 2016-10-28T21:23:10Z

src/Common/src/Interop/Windows/sspicli/NegotiationInfoClass.cs

-                {
-                    GlobalLog.Print("NegotiationInfoClass::.ctor() packageInfo:" + packageInfo.ToString("x8") + " negotiationState:" + negotiationState.ToString("x8") + " name:" + LoggingHash.ObjectToString(name));
-                }
+                if (NetEventSource.IsEnabled) NetEventSource.Info(this, $"packageInfo:{packageInfo} negotiationState:{negotiationState} name:{name}");


You use :X to format negotiationState above, is that needed here?

Yes, I'll fix that... I missed it when rewriting the line.

geoffkizer · 2016-10-28T21:37:04Z

src/Common/src/System/Net/Logging/NetEventSource.Common.cs

+        /// <param name="buffer">The buffer to be logged.</param>
+        /// <param name="memberName">The calling member.</param>
+        [NonEvent]
+        public static void DumpArray(object thisOrContextObject, byte[] buffer, [CallerMemberName] string memberName = null)


Since this is specifically for dumping a buffer, can we call it "DumpBuffer"?

geoffkizer · 2016-10-28T21:39:27Z

src/Common/src/System/Net/Logging/NetEventSource.Common.cs

+        {
+            DebugValidateArg(first);
+            DebugValidateArg(second);
+            if (IsEnabled) Log.Associate(IdOf(first), memberName, IdOf(first), IdOf(second));


Is this correct? Seems like if this is the desired behavior, callers could use the overload that takes an explicit thisOrContextObject argument.

It's correct, or at least it's the behavior I intended. The most common use case was just calling Associate(this, somethingElse), and I wanted to avoid the caller having to do Associate(this, this, somethingElse), so I added this overload that it does it for them.

geoffkizer · 2016-10-28T21:46:05Z

src/Common/src/Interop/Windows/sspicli/SSPIWrapper.cs

-            {
-                SecurityEventSource.Log.SecurityContextInputBuffers(nameof(InitializeSecurityContext), (inputBuffers == null ? 0 : inputBuffers.Length), outputBuffer.size, (Interop.SECURITY_STATUS)errorCode);
-            }
+            NetEventSource.Log.SecurityContextInputBuffers(nameof(InitializeSecurityContext), inputBuffers?.Length ?? 0, outputBuffer.size, (Interop.SECURITY_STATUS)errorCode);


This is minor, but shouldn't there be an IsEnabled test here? (Similarly above)

There's very little work happening in the call site; didn't seem necessary to put it behind a guard. But I can if you'd like.

geoffkizer · 2016-10-28T21:47:40Z

src/Common/src/Interop/Windows/sspicli/SSPIWrapper.cs

                    }

-                    if (errorCode != 0 && NetEventSource.Log.IsEnabled())
+                    if (errorCode != 0 && NetEventSource.IsEnabled)


The IsEnabled test should come first

I suppose it doesn't really matter.

I guess I'd like to adhere to a principle that says: Any code that's only needed for tracing should be inside an IsEnabled check, implicitly or explicitly. And here, the errorCode != 0 test is only needed for tracing. I realize that's being rather nitty.

I've not measured, but I'd expect swapping the order to actually be worse for perf, which is why I left it this way. If the IsEnabled check comes first, even in the success case we have to access the Log field and then access its enabled field. That said, I doubt either order would be at all measurable. I can swap it if it bothers you.

geoffkizer · 2016-10-28T21:52:13Z

src/System.Net.Security/src/System/Net/Security/SecureChannel.cs

                    _maxDataSize = checked(streamSizes.MaximumMessage - (_headerSize + _trailerSize));
                }
-                catch (Exception e)
+                catch (Exception e) when (!ExceptionCheck.IsFatal(e))


This seems like it could potentially have a perf impact in release, since we've added a try/catch. Not sure what to do about it.

since we've added a try/catch

I didn't add a try/catch. Are you saying we should think about removing the try/catch? We could, but I one of my goals for this pass was to not remove any tracing... the only tracing I removed was because it was obviously duplicative (or in a few cases where it was in a TRACE_VERBOSE region, had gotten stale, and referred to things that no longer existed).

You're right, the try/catch was already there.

What I should say is, we should consider removing the try/catch since it's only needed for tracing. That's beyond the scope of this change though.

geoffkizer · 2016-10-28T22:03:12Z

src/Common/src/Interop/Windows/sspicli/SecuritySafeHandles.cs

            SecurityBuffer outSecBuffer,
            ref Interop.SspiCli.ContextFlags outFlags)
        {
 #if TRACE_VERBOSE


Is there any reason to keep TRACE_VERBOSE now, assuming it's all protected by an IsEnabled test?

I don't know. I didn't want to remove any of these. As I noted in the PR description, there is likely some tracing being done that's no longer necessary, and there's probably some missing tracing. My goal here wasn't to address that, but rather to get tracing into a working form. Subsequently someone can review all of what's actually being trace holistically.

geoffkizer · 2016-10-28T22:06:21Z

src/Common/src/Interop/Windows/sspicli/SecuritySafeHandles.cs

-#if TRACE_VERBOSE
-                GlobalLog.Print("    inSecBuffers[]   = length:" + inSecBuffers.Length);
-#endif
+                NetEventSource.Enter(null, "SafeDeleteContext::CompleteAuthToken");


This would be better handled in a single Enter event instead of Enter plus two Infos. (That may be beyond the scope of this change, but can we at least file general issues for obviously bad stuff like this?)

That may be beyond the scope of this change

Right. See my previous comments.

geoffkizer · 2016-10-28T22:17:29Z

src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.cs

-                        if (s_loggingEnabled)
-                            SocketsEventSource.Accepted(_acceptSocket, _acceptSocket.RemoteEndPoint, _acceptSocket.LocalEndPoint);
+                        NetEventSource.Accepted(_acceptSocket, _acceptSocket.RemoteEndPoint, _acceptSocket.LocalEndPoint);
                    }


Should there be an IsEnabled check here?

It wouldn't buy much. I tried to only use IsEnabled guards where there was real work happening in the call site.

geoffkizer · 2016-10-28T22:18:14Z

src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.cs

                    {
-                        if (s_loggingEnabled)
-                            SocketsEventSource.Connected(_currentSocket, _currentSocket.LocalEndPoint, _currentSocket.RemoteEndPoint);
+                        NetEventSource.Connected(_currentSocket, _currentSocket.LocalEndPoint, _currentSocket.RemoteEndPoint);


IsEnabled check?

Ditto. Is there something about these particular calls that suggests to you IsEnabled would make a measurable difference?

No, not really. As above, it's more of a "what's the general principle?" sort of issue. I'll try to elaborate better in a subsequent comment.

what's the general principle?

For me the general principle was: is there non-trivial work happening at the call site... if yes, it must be guarded, if no, it doesn't need to be.

We could say that every single call site to any tracing method in NetEventSource needs to have an IsEnabled check before it. It'll lead to clunkier code, but it'll be easy to explain. Is that what you'd prefer?

We could say that every single call site to any tracing method in NetEventSource needs to have an IsEnabled check before it. It'll lead to clunkier code, but it'll be easy to explain. Is that what you'd prefer?

I think I'm leaning that way, yeah.

I'm used to using macros to do this in native code, which is pretty easy to get right, and not very crufty. Unfortunately this isn't possible in managed code.

The problem is the "non-trivial work happening" part. This seems fairly easy to get wrong. The "no value types" rule in particular made me pause a few times when reviewing the code, since you need to think about the types of the arguments you're passing.

I realize that the value types mistake will get caught at your checks, but it seems like it would be better to have the code be correct at a glance rather than having to think about it.

And other potential issues, like "NetEventSource.Info(this, SomeExpensiveFunction())" can't be caught by your checks.

I realize that that makes the code cruftier, and I don't like that, but overall I think I prefer the more verbose code.

One additional thought:

There's little potential confusion in trivial cases like Enter(this) or Error(this, e). So perhaps a middle path would be to keep the convenience methods for these cases and not require wrapping them in if(enabled) tests, while requiring anything more involved than this to do an explicit if(enabled) test.

For clarity, though, I think I'd name these convenience methods something other than just "Enter" and "Exit". Like"CheckEnabledAndEnter". That's a terrible name of course, but you get the idea.

Ok. While I was hoping to avoid all those checks, I get that having them makes it easier to always do the right thing and avoid mistakes. I'll add a commit that adds guards everywhere.

geoffkizer · 2016-10-28T22:20:57Z

src/System.Net.Sockets/src/System/Net/Sockets/UDPClient.cs

-                    GlobalLog.Print("UdpClient::Dispose()");
-                }
-
+                NetEventSource.Info(this);


Shouldn't there be some info text here?

There wasn't in the original, so I didn't add any. The original only traced out the name of the caller, which happens implicitly in this revised version.

Makes sense. I probably wouldn't want to do this in new code, but I understand that's beyond the scope here.

Priya91 · 2016-10-28T22:32:46Z

src/Common/src/Interop/Windows/SChannel/UnmanagedCertificateContext.cs

-                            "UnmanagedCertificateContext::GetRemoteCertificatesFromStoreContext " +
-                            "adding remote certificate:" + cert.Subject + cert.Thumbprint);
-                    }
+                    if (NetEventSource.IsEnabled) NetEventSource.Info(certContext, $"Adding remote certificate:{cert}");


IsEnabled is checked in NetEventSource.Info no? There at other places as well.

Yes, but the call site is using string interpolation, and will end up allocating a FormattableString along with an object[] to hold the arguments (cert). The IsEnabled check guards those costs.

geoffkizer · 2016-10-28T22:38:42Z

src/System.Net.Requests/src/System/Net/FtpWebRequest.cs

-
-                        Debug.Fail("Find out why we are getting an unexpected exception.");
-                    }
+                    NetEventSource.Error(this, exception);


It seems like this should be a Fail call.

This is one place where I didn't strictly follow what was there previously, because what was there previously was bogus. This is a valid exception path... in fact, that Ftp test that an issue got opened for around Dns resolution repros for me locally and causes the Debug.Fail that was here to fire. We should not have asserts fire for anything that's legitimately hittable, so I changed it to the Error that should have been.

Priya91 · 2016-10-28T22:38:54Z

src/System.Net.Mail/src/System/Net/Mail/SmtpClient.cs

-            {
-                MailEventSource.Log.Send(nameof(pickupDirectory), pickupDirectory);
-            }
+            if (NetEventSource.IsEnabled) NetEventSource.Info(this, $"pickupDirectornameof={pickupDirectory}");


$"{nameof(pickupDirectory)}={pickupDirectory}" ..

Thanks. Will fix.

Priya91 · 2016-10-28T22:48:56Z

Looked at the Mail changes, LGTM !

stephentoub · 2016-10-28T23:36:29Z

@dotnet-bot Test Innerloop Windows_NT Debug Build and Test please (#13025)

stephentoub · 2016-10-28T23:36:45Z

Looked at the Mail changes, LGTM !

Thanks, @Priya91 😄

geoffkizer · 2016-10-28T23:57:29Z

This stuff looks fantastic. Thanks much for tackling this!

davidsh

LGTM

Thank you for cleaning this up. I like this overall, clean, design.

This PR overhauls how logging is accomplished in the System.Net libraries. There are a few problems with the existing solution: 1. In the original sources, logging was split across a GlobalLog class and a Logging class. Global-related tracing went to GlobalLog, and library-specific tracing went to Logging. When the sources were moved to corefx, this evolved into multiple EventSource-based types, most of which were in Common and were then src-included into each System.Net assembly. This has a significant problem, though, which is that the name/Guid identity of an EventSource can only exist once in a process, and subsequent EventSource instances with the same identity fail. This means that whichever assembly ends up logging something first wins, and all other tracing from other System.Net assemblies ends up being lost. 2. The split also may have made sense when all components shared the same GlobalLog, but now that GlobalLog is included as a unique EventSource instance in each assembly, it's no longer global. This means that we're not only loading multiple EventSources unnecessarily (each has its own costs), but we're also duplicating logging, as some logging is done to one or the other, but some logging is done to both. 3. Due to sharing the Logging class, which evolved into an EventSource-based type, many logging calls had to include which component they came from as an enum value, leading to bloating at the call sites. 4. Additionally, many logging operations included the name of the method in which the calling was being done, which meant both longer call sites as well as stale values as members got renamed over time. 5. Call sites ended up getting fairly long due to lots of string concatenation operations and other manipulations in the logging call. This also then led to most call sites needing to be guarded behind checks for IsEnabled, leading to lots of code at every call site and obscuring the actual information being traced. 6. An additional requirement that got added with the move to corefx was that certain failure conditions should both Debug.Fail and do tracing, leading to some tracing operations taking upwards of 10 lines of code. All of these issues are addressed by this PR. - Each System.Net assembly that needs to log now loads a single NetEventSource type for doing logging, each with its own identity. - That implementation is split across a common, partial NetEventSource.Common.cs file used by all such assemblies, and then a per-assembly partial NetEventSource.AssemblyName.cs file used by the individual assembly. The former file contains most of the logging implementation. The latter files typically contain just the partial class declaration along with the [EventSource] attribute that gives that assembly's NetEventSource its unique identity / name. - The logging operations in NetEventSource.Common.cs use a variety of relatively-recent language enhancements to improve the logging. Each operation uses a [CallerMemberName] so that the call sites don't need to specify the caller. FormattableString is used so that call sites can use string interpolation while still providing the logging implementation the flexibility to format the individual components in a desired fashion (e.g. IntPtrs are formatted as hex, SafeHandles are rendered in a way that includes their name+hashcode+handle, etc.), along with a mechanism that allows individual libraries to add additional formatting rules. Each operation allows the this reference (or another this-like context object) to be passed in, to simply identify the source of the call (in concert with the caller member name). A debug-only mechanism is included in the tracing to help find potential perf issues due to misuse of the logger. Etc. - A Fail logging operation is included that both traces and does a Debug.Fail. With these changes, most call sites are greatly simplified. Some examples: Before: ```C# if (GlobalLog.IsEnabled) { GlobalLog.AssertFormat("ContextAwareResult#{0}::ContextCopy|Called on completed result.", LoggingHash.HashString(this)); } Debug.Fail("ContextAwareResult#" + LoggingHash.HashString(this) + "::ContextCopy |Called on completed result."); ``` After: ```C# NetEventSource.Fail(this, "Called on completed result."); ``` Before: ```C# if (NetEventSource.Log.IsEnabled()) NetEventSource.Enter(NetEventSource.ComponentType.Http, this, ".ctor", handler); ``` After: ```C# NetEventSource.Enter(this, handler); ``` Etc. In addition to fixing up all of the event sources, I of course also fixed up all usage in order to clean up the call sites. However, this was a rather mechanical change: I did not remove existing logging (other than duplication) nor did I add new logging. At some point, we should review the actual logging being done to determine what's missing and what's not needed. I also did not revise the additional existing events on the sockets and security event sources, but we may want to remove those (or some of those) in favor of just using the shared events.

- Fix a few copy-and-paste errors - Rename DumpArray to DumpBuffer - Prefix every NetEventSource usage with IsEnabled check (except for Fail, which represents a critical programming error)

- Add tests that actually exercise the event sources, making sure that no event source errors are produced and that events are actually generated. - Fix a few src issues found from these tests.

stephentoub · 2016-10-29T17:40:32Z

@dotnet-bot Test Outerloop Windows_NT Debug please
@dotnet-bot Test Outerloop Ubuntu14.04 Debug please

dnfclas added the cla-already-signed label Oct 28, 2016

stephentoub changed the title ~~This PR overhauls how logging is accomplished in the System.Net libraries~~ Overhaul System.Net library logging Oct 28, 2016

davidsh added the area-System.Net label Oct 28, 2016

davidsh added this to the 1.2.0 milestone Oct 28, 2016

stephentoub force-pushed the overhaul_net_logging branch from 6ed2c53 to 2e26f29 Compare October 28, 2016 20:35

geoffkizer reviewed Oct 28, 2016

View reviewed changes

Priya91 reviewed Oct 28, 2016

View reviewed changes

geoffkizer reviewed Oct 28, 2016

View reviewed changes

Priya91 reviewed Oct 28, 2016

View reviewed changes

davidsh approved these changes Oct 29, 2016

View reviewed changes

stephentoub force-pushed the overhaul_net_logging branch from 2e26f29 to 6e4a0a0 Compare October 29, 2016 03:41

stephentoub added 2 commits October 29, 2016 10:01

Address PR feedback

3b251eb

- Fix a few copy-and-paste errors - Rename DumpArray to DumpBuffer - Prefix every NetEventSource usage with IsEnabled check (except for Fail, which represents a critical programming error)

stephentoub force-pushed the overhaul_net_logging branch 2 times, most recently from 390af1e to e486c35 Compare October 29, 2016 15:28

Add additional System.Net tracing tests

2284f73

- Add tests that actually exercise the event sources, making sure that no event source errors are produced and that events are actually generated. - Fix a few src issues found from these tests.

stephentoub force-pushed the overhaul_net_logging branch from e486c35 to 2284f73 Compare October 29, 2016 15:54

Fix build for versions without Assembly.Location

3070c9d

karelz assigned stephentoub Oct 29, 2016

stephentoub merged commit 6b5ef12 into dotnet:master Oct 29, 2016

stephentoub deleted the overhaul_net_logging branch October 29, 2016 18:48

stephentoub mentioned this pull request Oct 31, 2016

Add tracing for System.Transactions #13185

Merged

Overhaul System.Net library logging #13134

Overhaul System.Net library logging #13134

Uh oh!

Conversation

stephentoub commented Oct 28, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

geoffkizer Oct 28, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Priya91 Oct 28, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

geoffkizer Oct 28, 2016 •

edited

Loading

Priya91 Oct 28, 2016 •

edited

Loading

Priya91 Oct 28, 2016 •

edited

Loading