Conversation

@mpeyrotc
Member

When a MemoryGraph contains many addresses, the internal Dictionary it uses to track them can run out of space because of the size limit on its internal array. This PR therefore adds a new SegmentedDictionary class that avoids the issue. The original implementation of the class lives in the dotnet/roslyn repo and can be found here.

The new class should perform similarly to the Dictionary class it replaces. However, we will continue to use the old class unless we detect that we might hit the memory issue.

This PR also adds the generic Dictionary tests from the dotnet/runtime repo to verify that the new implementation behaves as expected.
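To illustrate how the segmented approach sidesteps the array-size limit, here is a minimal sketch (the type and member names below are hypothetical, not the actual dotnet/roslyn implementation): instead of one contiguous backing array, storage is split into fixed-size segments, so no single allocation has to grow past the runtime's maximum array length.

```csharp
using System;

// Illustrative sketch only; the real SegmentedDictionary in dotnet/roslyn
// applies this idea to its internal bucket and entry arrays.
public sealed class SegmentedArray<T>
{
    private const int SegmentShift = 16;                  // 65,536 elements per segment
    private const int SegmentSize = 1 << SegmentShift;
    private const int SegmentMask = SegmentSize - 1;

    private readonly T[][] _segments;

    public SegmentedArray(int length)
    {
        Length = length;
        int segmentCount = (length + SegmentMask) >> SegmentShift;
        _segments = new T[segmentCount][];
        for (int i = 0; i < segmentCount; i++)
        {
            // The last segment may be partial; every allocation stays small.
            int remaining = length - (i << SegmentShift);
            _segments[i] = new T[Math.Min(remaining, SegmentSize)];
        }
    }

    public int Length { get; }

    // Map a logical index to (segment, offset) with cheap bit operations.
    public ref T this[int index] =>
        ref _segments[index >> SegmentShift][index & SegmentMask];
}
```

A full SegmentedDictionary layers the usual hash-bucket logic on top of a store like this; the key point is that growing the collection allocates additional segments instead of reallocating one ever-larger contiguous array.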

@brianrob
Member

Overall, looks good. A couple of requests:

  1. Can you move the new files into a new SegmentedDictionary subdirectory of FastSerialization? This will help to keep track of what files were copied.
  2. Can you move the tests from PerfView.Tests to TraceEvent.Tests? This is where the other FastSerialization tests live. They should live in a SegmentedDictionary subdirectory as well.

@brianrob brianrob changed the base branch from main to feature/large-gcdump September 17, 2021 02:47
Member

@brianrob brianrob left a comment


A couple of questions here, but I'll go ahead and merge this into the feature branch to unblock further progress.

}
}

public int Capacity => this.capacity;
Member


I'd like to understand the need for these two new public APIs. Are they consumed internally by SegmentedDictionary?

m_addressToNodeIndex = new Dictionary<Address, NodeIndex>(expectedSize);
// If we have too many addresses we will reach the Dictionary's internal array's size limit and throw.
// Therefore use a new implementation of it that is similar in performance but that can handle the extra load.
if (expectedSize > 200_000)
Member


Can you share where this number comes from? I realize that we have to pick something if we want to only impact "large" dumps, but just want to understand the math behind this.

@brianrob brianrob merged commit 938aef8 into microsoft:feature/large-gcdump Sep 22, 2021