Skip to content
This repository was archived by the owner on Nov 1, 2020. It is now read-only.

Conversation

@kouvel
Copy link
Contributor

@kouvel kouvel commented Feb 21, 2019

@kouvel
Copy link
Contributor Author

kouvel commented Feb 21, 2019

There is a slightly noticeable decrease in throughput of local work items queued to the thread pool in raw overhead per work item. It is very small and when a very short delay is added to the work item the difference is almost not measurable, so it should be acceptably small.

@filipnavara
Copy link
Member

Fixed Timer implementation on Unixes. Previously there was only ever one timer request from the upper-level implementation and that is not the case anymore, so the lower-level "app domain timer" implementation needed to handle multiple timer requests.

I am not sure I understand your motivation for the change. I believe the whole point was to split the queues and process them separately in separate threads. Now you are trying to undo that at lower level. Am I missing something?

@kouvel
Copy link
Contributor Author

kouvel commented Feb 22, 2019

I am not sure I understand your motivation for the change. I believe the whole point was to split the queues and process them separately in separate threads. Now you are trying to undo that at lower level. Am I missing something?

Only one thread was created and the TimerQueue ID was being ignored. There isn't much benefit to using multiple threads, the goal of the original change was to split the queues such that each queue gets a separate callback on the thread pool and each queue has its own lock so they can operate in parallel, that benefit is still there. CoreCLR also uses only one timer thread to serve all timer requests from TimerQueue.

@filipnavara
Copy link
Member

Only one thread was created and the TimerQueue ID was being ignored.

The ID was not ignored, it just was not necessary since the instance variables could be used in the pure managed code implementation. The TimerQueue objects were instantiated here:

private static TimerQueue[] CreateTimerQueues()
{
var queues = new TimerQueue[Environment.ProcessorCount];
for (int i = 0; i < queues.Length; i++)
{
queues[i] = new TimerQueue(i);
}
return queues;
}

Each TimerQueue instance then already had a reference to it's own queue in the instance variables and SetTimer in Timer.Unix.cs started the thread for that queue.

There isn't much benefit to using multiple threads, the goal of the original change was to split the queues such that each queue gets a separate callback on the thread pool and each queue has its own lock so they can operate in parallel, that benefit is still there.

Ok, for reference: dotnet/coreclr#14527.

CoreCLR also uses only one timer thread to serve all timer requests from TimerQueue.

This is the part I missed. I saw the code in AppDomainTimerNative::CreateAppDomainTimer which lead me to believe that more threads are actually created. However, ultimately it's all folded back into one thread in
https://github.com/dotnet/coreclr/blob/eb1aa094dbd32f98fd4ce5dead9f27f9c78a32c5/src/vm/win32threadpool.cpp#L4429-L4521.

I am not sure if the change back to one thread is actually beneficial, but if it matches CoreCLR then it's probably ok. Would be nice to have a benchmark, there is a simple one in dotnet/coreclr#20302.

@kouvel
Copy link
Contributor Author

kouvel commented Feb 22, 2019

SetTimer in Timer.Unix.cs started the thread for that queue

It only started one thread for all TimerQueues because s_timerEvent is static, actually the starting of the thread also had race issues (which were introduced after multiple queues were introduced) that may cause multiple threads to be started, possibly multiple threads for the same queue, and very possibility not starting threads for other queues. See the test case I added in Threading.cs, which fails reliably before the fix.

It could be fixed with fewer changes and more threads (changing some static variables to instance variables, and some such), and it would still be unfortunate to have 32 timer threads on a 32-core machine all for just the purpose of a very small improvement at best. Timers can only request to tick once per millisecond, which is already long, and the resolution is actually about 15+ ms on Windows, any gain with having multiple threads serving those timer requests would be squashed just by the timer resolution.

@filipnavara
Copy link
Member

It only started one thread for all TimerQueues because s_timerEvent is static

Ah, that is a bug I missed. It should have been changed to an instance variable. It is called from under lock (FireNextTimers -> lock (this) -> EnsureTimerFiresBy -> SetTimer), so other race conditions were not really a concern there.

it would still be unfortunate to have 32 timer threads on a 32-core machine all for just the purpose of a very small improvement at best.

I'm not convinced that the improvement is so small for a case where the timers are heavily used. However, I do believe that for the common case it will likely be overkill to have multiple threads.

There are couple other small concerns I have with the change so it would have been nice if it was split off into separate PR.

@kouvel kouvel changed the title Implement APIs for some threading metrics (CoreRT), fix Timer on Unixes Implement APIs for some threading metrics (CoreRT) Mar 6, 2019
@kouvel
Copy link
Contributor Author

kouvel commented Mar 7, 2019

The RT implementation in latest commit is broken, will fix and retest

@kouvel
Copy link
Contributor Author

kouvel commented Mar 7, 2019

Updated perf numbers are here

@kouvel
Copy link
Contributor Author

kouvel commented Mar 18, 2019

Ping for review please

@kouvel
Copy link
Contributor Author

kouvel commented Mar 18, 2019

@dotnet-bot test OSX10.12 Debug and CoreFX tests

@kouvel
Copy link
Contributor Author

kouvel commented Mar 20, 2019

The API review was approved. Couple of pending items is:

  • Whether to expose PendingLocalWorkItemCount and PendingGlobalWorkItemCount. I have asked for more info on whether they would be useful, by default I'll go ahead and remove them
  • Whether we want to expose CompletedWorkItemCount at all.
    • I think that's the most expensive one to track of the lot, as it involves a bit of extra work per work item
    • It is possible to mostly eliminate the extra cost for managed work items in CoreCLR and CoreRT. Work items that may have some extra cost are timer callbacks in CoreRT Windows and IO completion work items in CoreCLR Windows.
    • The info would enable easily determining work item throughput and whether the thread pool is stalled

jkotas pushed a commit that referenced this pull request Mar 29, 2019
…ad (#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from #7066

* Address feedback from #7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/coreclr that referenced this pull request Mar 29, 2019
…ad (dotnet#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request Mar 29, 2019
…ad (mono#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
marek-safar pushed a commit to mono/mono that referenced this pull request Mar 29, 2019
…ad (#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit to dotnet/coreclr that referenced this pull request Mar 29, 2019
…ad (#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Mar 29, 2019
…ad (dotnet#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Mar 30, 2019
…ad (#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
buyaa-n pushed a commit to buyaa-n/coreclr that referenced this pull request Apr 1, 2019
…ad (dotnet#7071)

* Change Timer implementation on Unixes to use only one scheduling thread

- Separated from dotnet/corert#7066

* Address feedback from dotnet/corert#7066

* Remove reference to s_lock

* Reduce work inside lock

* Move _id

* Fix duplicate timers in scheduled timer list, move info to TimerQueue

Signed-off-by: dotnet-bot <[email protected]>
@kouvel
Copy link
Contributor Author

kouvel commented Apr 2, 2019

The perf difference from adding CompletedWorkItemCount seems to be very small, so for now I'll consider that it can reasonably be added

@kouvel
Copy link
Contributor Author

kouvel commented Apr 2, 2019

I have updated to remove local/global versions of PendingWorkItemCount. They can be added in the future if they would be useful. I think all three PRs (CoreCLR, CoreRT, and CoreFX) are now ready for review. @jkotas / @stephentoub, would you be able to take a look?

{
get
{
// Make sure up-to-date thread-local node state is visible to this thread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be needed.

{
get
{
// Make sure up-to-date thread-local node state is visible to this thread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be needed.

LinkedSlot linkedSlot = _linkedSlot;
int id = ~_idComplement;
if (id == -1)
if (id == -1 || linkedSlot == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this fixing a bug when _linkedSlot? Do we need a test for it?

Copy link
Contributor Author

@kouvel kouvel Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a possibility that after Dispose is called and another thread calls this method, because there are no restrictions on the loads here, the other thread could read an old value for _idComplement (where id != 0) and a new value for _linkedSlot (== null), then it would throw NullReferenceException instead of ObjectDisposedException. I didn't see anything that would prevent that, and thought it would be difficult to repro but it seems to be fairly easy, will add a test to the CoreFX PR.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@kouvel
Copy link
Contributor Author

kouvel commented Apr 23, 2019

Rebased

@jkotas
Copy link
Member

jkotas commented Apr 23, 2019

shared\System\Threading\ThreadLocal.cs(457,37): error CS8600: Converting null literal or possible null value to non-nullable type.
shared\System\Threading\ThreadLocal.cs(466,31): error CS8600: Converting null literal or possible null value to non-nullable type.
shared\System\Threading\ThreadLocal.cs(466,82): error CS8600: Converting null literal or possible null value to non-nullable type.
shared\System\Threading\ThreadLocal.cs(485,41): error CS8600: Converting null literal or possible null value to non-nullable type.
shared\System\Threading\ThreadLocal.cs(493,35): error CS8600: Converting null literal or possible null value to non-nullable type.
shared\System\Threading\ThreadLocal.cs(493,86): error CS8600: Converting null literal or possible null value to non-nullable type.

@jkotas jkotas merged commit d068307 into dotnet:master Apr 23, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/coreclr that referenced this pull request Apr 23, 2019
* Implement APIs for some threading metrics (CoreRT)

- API review: https://github.com/dotnet/corefx/issues/35500
- May depend on dotnet#22754

* Use thread-locals for counting, use finalizer instead of runtime to detect thread exit

* Don't let the count properties throw OOM

* Remove some flushes

Signed-off-by: dotnet-bot <[email protected]>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Apr 23, 2019
* Implement APIs for some threading metrics (CoreRT)

- API review: https://github.com/dotnet/corefx/issues/35500
- May depend on dotnet/coreclr#22754

* Use thread-locals for counting, use finalizer instead of runtime to detect thread exit

* Don't let the count properties throw OOM

* Remove some flushes

Signed-off-by: dotnet-bot <[email protected]>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Apr 23, 2019
* Implement APIs for some threading metrics (CoreRT)

- API review: https://github.com/dotnet/corefx/issues/35500
- May depend on dotnet/coreclr#22754

* Use thread-locals for counting, use finalizer instead of runtime to detect thread exit

* Don't let the count properties throw OOM

* Remove some flushes

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit to dotnet/coreclr that referenced this pull request Apr 23, 2019
* Implement APIs for some threading metrics (CoreRT)

- API review: https://github.com/dotnet/corefx/issues/35500
- May depend on #22754

* Use thread-locals for counting, use finalizer instead of runtime to detect thread exit

* Don't let the count properties throw OOM

* Remove some flushes

Signed-off-by: dotnet-bot <[email protected]>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request May 9, 2019
* Expose and test APIs for some threading metrics (CoreFX)

- API review: https://github.com/dotnet/corefx/issues/35500
- Depends on dotnet/coreclr#22754, dotnet/corert#7066

* Separate and expose pending local vs global work item count

* Remove local/global variants of PendingWorkItemCount

* Remove unrelated test

* Add test for a fix to ThreadLocal.Values property throwing NullReferenceException when disposed

Fix is in dotnet/corert#7066

* Fix build

* Fix test

* Add API compat baselines for uapaot

* Fix test

* Use RemoteExecutor for MetricsTest

* Address feedback
@kouvel kouvel deleted the ThreadMetricsRt branch April 3, 2023 21:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants