Conversation

@mpeyrotc
Member

When a MemoryGraph contains many addresses, the internal Dictionary it uses to track them can run out of space because of the size limit on its internal array. This PR therefore adds a new SegmentedDictionary class that avoids the issue. The original implementation of the class lives in the dotnet/roslyn repo and can be found here.

The new class should perform similarly to the Dictionary class it replaces. However, we will continue to use the old class unless we detect that we might hit the memory issue.

This PR also adds the generic Dictionary tests from the dotnet/runtime repo to verify that the new implementation behaves as expected.
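To illustrate how the segmented approach sidesteps the array-size limit, here is a minimal sketch (the type and member names below are hypothetical, not the actual dotnet/roslyn implementation): instead of one contiguous backing array, storage is split into fixed-size segments, so no single allocation has to grow past the runtime's maximum array length.

```csharp
using System;

// Illustrative sketch only; the real SegmentedDictionary in dotnet/roslyn
// applies this idea to its internal bucket and entry arrays.
public sealed class SegmentedArray<T>
{
    private const int SegmentShift = 16;                  // 65,536 elements per segment
    private const int SegmentSize = 1 << SegmentShift;
    private const int SegmentMask = SegmentSize - 1;

    private readonly T[][] _segments;

    public SegmentedArray(int length)
    {
        Length = length;
        int segmentCount = (length + SegmentMask) >> SegmentShift;
        _segments = new T[segmentCount][];
        for (int i = 0; i < segmentCount; i++)
        {
            // The last segment may be partial; every allocation stays small.
            int remaining = length - (i << SegmentShift);
            _segments[i] = new T[Math.Min(remaining, SegmentSize)];
        }
    }

    public int Length { get; }

    // Map a logical index to (segment, offset) with cheap bit operations.
    public ref T this[int index] =>
        ref _segments[index >> SegmentShift][index & SegmentMask];
}
```

A full SegmentedDictionary layers the usual hash-bucket logic on top of a store like this; the key point is that growing the collection allocates additional segments instead of reallocating one ever-larger contiguous array.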

@brianrob
Member

Overall, looks good. A couple of requests:

  1. Can you move the new files into a new SegmentedDictionary subdirectory of FastSerialization? This will help to keep track of what files were copied.
  2. Can you move the tests from PerfView.Tests to TraceEvent.Tests? This is where the other FastSerialization tests live. They should live in a SegmentedDictionary subdirectory as well.

@brianrob brianrob changed the base branch from main to feature/large-gcdump September 17, 2021 02:47
Member

@brianrob brianrob left a comment


A couple of questions here, but I'll go ahead and merge this into the feature branch to unblock further progress.

}
}

public int Capacity => this.capacity;
Member


I'd like to understand the need for these two new public APIs. Are they consumed internally by SegmentedDictionary?

m_addressToNodeIndex = new Dictionary<Address, NodeIndex>(expectedSize);
// If we have too many addresses we will reach the Dictionary's internal array's size limit and throw.
// Therefore use a new implementation of it that is similar in performance but that can handle the extra load.
if (expectedSize > 200_000)
Member


Can you share where this number comes from? I realize that we have to pick something if we want to only impact "large" dumps, but just want to understand the math behind this.

@brianrob brianrob merged commit 938aef8 into microsoft:feature/large-gcdump Sep 22, 2021