Always bounce through the threadpool when switching from STA#856
Merged
kennykerr merged 1 commit intomicrosoft:masterfrom Feb 11, 2021
oldnewthing:sta-to-sta
Merged
Always bounce through the threadpool when switching from STA#856kennykerr merged 1 commit intomicrosoft:masterfrom oldnewthing:sta-to-sta
kennykerr merged 1 commit intomicrosoft:masterfrom
oldnewthing:sta-to-sta
Conversation
Previously, we changed the way we switch from STA to NA by bouncing through the threadpool so that we don't run the NA on an STA thread. We extend this behavior to switches from STA to STA, to avoid RO_E_BLOCKED_CROSS_ASTA_CALL exceptions. Interestingly, we add this support by deleting code. We would normally add tests for STA and MAINSTA to the NA test, but since MTA was ruled out by the previous test, and NA+STA+MAINSTA completely cover all the non-MTA possibilities, there's no need to perform any test at all. The fact that we got here means that the destination is NA/STA/MAINSTA. Note also that the first test in this block checks if we are resuming in the same apartment that we are currently in. That test is important, because it avoids us bouncing through the threadpool just to return to our own thread, which would otherwise burn a threadpool thread. Added a new unit test to verify that STA-to-STA transfers release the source STA. The test failed before the change, and passes after. Note however that this introduces a possibly undesirable behavior change: Suppose STA thread "A" hangs and STA thread "B" has many tasks, some of which want to resume on STA thread "A". Under the old behavior, STA thread "B" would run the first task, and then that task will try to switch to STA thread "A" and hang. STA thread "B" would remain hung and be unavailable to process tasks. When STA thread "A" finally wakes up, STA thread "B" also wakes up and resumes doing work. The number of hung threads is capped at the number of STA threads. Under the new behavior, STA thread "B" would run the first task, and then ask a threadpool to resume on STA thread "A". The threadpool thread hangs instead of STA thread "B". STA thread "B" is now available to do other work (yay), but at a cost of a hung threadpool thread. One of the things STA thread "B" might do is create more work items that want to switch to STA thread "A", so we slowly (or maybe quickly) exhaust the threadpool. Mitigation for this worst-case scenario is that we sort of had that problem anyway: If any task on a threadpool thread wants to switch to STA thread "A", it will hang with or without these changes. Since it is common to pass through a threadpool thread at some point, you are likely to exhaust the threadpool even under the old algorithm.
dmachaj
reviewed
Feb 11, 2021
| // processes work, it will unstick the first STA thread. | ||
| // This test verifies that the second STA thread does become available | ||
| // to do work and is not hung waiting for the first STA thread. | ||
| co_await context1; |
Contributor
There was a problem hiding this comment.
So if this were broken (aka before your changes) this co_await would block forever if not for the 1000 timeout on line 85. Is that right?
Member
Author
There was a problem hiding this comment.
Right. My initial version had INFINITE, and the test hung. We don't like it when tests hang, so I had it detect that it had been stuck for 1 second (should be plenty) and declare failure if so.
dmachaj
approved these changes
Feb 11, 2021
kennykerr
approved these changes
Feb 11, 2021
Collaborator
kennykerr
left a comment
There was a problem hiding this comment.
Looks good - thanks Raymond!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In PR #662, we changed the way we switch from STA to NA by bouncing through the threadpool so that we don't run the NA on an STA thread.
We extend this behavior to switches from STA to STA, to avoid RO_E_BLOCKED_CROSS_ASTA_CALL exceptions, which has been identified as the source of some crashes.
Interestingly, we add this support by deleting code. We would normally add tests for STA and MAINSTA to the NA test, but since MTA was ruled out by the previous test, and NA+STA+MAINSTA completely cover all the non-MTA possibilities, there's no need to perform any test at all. The fact that we got here means that the destination is NA/STA/MAINSTA.
Note also that the first test in this block checks if we are resuming in the same apartment that we are currently in. That test is important, because it avoids us bouncing through the threadpool just to return to our own thread, which would otherwise burn a threadpool thread.
Added a new unit test to verify that STA-to-STA transfers release the source STA. The test failed before the change, and passes after.
Note however that this introduces a possibly undesirable behavior change: Suppose STA thread "A" hangs and STA thread "B" has many tasks, some of which want to resume on STA thread "A".
Under the old behavior, STA thread "B" would run the first task, and then that task will try to switch to STA thread "A" and hang. STA thread "B" would remain hung and be unavailable to process tasks. When STA thread "A" finally wakes up, STA thread "B" also wakes up and resumes doing work. The number of hung threads is capped at the number of STA threads.
Under the new behavior, STA thread "B" would run the first task, and then ask a threadpool to resume on STA thread "A". The threadpool thread hangs instead of STA thread "B". STA thread "B" is now available to do other work (yay), but at a cost of a hung threadpool thread. One of the things STA thread "B" might do is create more work items that want to switch to STA thread "A", so we slowly (or maybe quickly) exhaust the threadpool.
Mitigation for this worst-case scenario is that we sort of had that problem anyway: If any task on a threadpool thread wants to switch to STA thread "A", it will hang with or without these changes. Since it is common to pass through a threadpool thread at some point, you are likely to exhaust the threadpool even under the old algorithm.