-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[Storage] [WebJobs] LogScan new blobs between the start of polling and the LMT of the last LogScan #53767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
basically the same change as was proposed at Azure/azure-webjobs-sdk#3014 :)
if it is possible for storage to return blob 3 but not 2, then as written it is correct, we can't advance timepoint past 1. |
|
TODO: unit tests, integrated tests, changelog |
Ah nice! Sorry that I didn't see that PR. Looks like it's in the Azure/azure-webjobs-sdk repo and should have be in this repo! Looks like that didn't fall under my radar cause I don't work in that repo. But thanks for the contribution! |
| ### Breaking Changes | ||
|
|
||
| ### Bugs Fixed | ||
| - Fixed a bug where polling for new blobs fails to detect newly created blobs when the List Blobs/Get Blobs operation requires multiple requests. If a blob is added or modified after LogScan begins, and is listed during the later portion of the listing, any blobs changed between the listing start time and the last modified timestamp of blobs found later will not be flagged as new in the subsequent LogScan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not my favorite changelog message, any rewording suggestions is welcomed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in blob polling logic where newly created or modified blobs could be missed during multi-page blob listing operations. The fix introduces a PollingStartTime timestamp to ensure only blobs modified before polling began are considered in the current scan cycle, preventing gaps in blob detection when listings span multiple pages.
Key Changes:
- Added
PollingStartTimeproperty to track when each polling cycle begins - Modified blob detection logic to filter blobs based on the polling start time
- Added comprehensive test coverage for the multi-page continuation token scenario
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
ContainerScanInfo.cs |
Adds PollingStartTime property to track when polling began |
ScanBlobScanLogHybridPollingStrategy.cs |
Updates polling logic to set PollingStartTime and filter blobs based on this timestamp |
ScanBlobScanLogHybridPollingStrategyTests.cs |
Adds test for continuation token scenario and updates existing test timestamp type |
TestAsyncPageableWithContinuationToken.cs |
New test helper class for simulating paged blob listing with continuation tokens |
CHANGELOG.md |
Documents the bug fix |
| RunExecuteWithMultiPollingInterval(expectedNames, product, executor, blobItems.Count); | ||
|
|
||
| // Wait 5 seconds to ensure that the blob with LMT after polling started is detected as a new blob. | ||
| Thread.Sleep(5000); |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Thread.Sleep in tests makes them slower and potentially flaky. Since you're setting up blob timestamps in the future (line 449 uses UtcNow.AddSeconds(5)), the test is relying on actual time passing to validate the behavior. Consider redesigning the test to use deterministic timestamps that don't require waiting, or use a time provider abstraction that can be mocked. For example, you could set blob timestamps relative to a fixed point in time rather than DateTimeOffset.UtcNow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might actually look into this, we may have to use Reflection(?) to set the exact start polling time. It will make this test more resilient. If not, we'd have to refactor the strategy itself to make it more testable.
...ebJobs.Extensions.Storage.Blobs/tests/Listeners/ScanBlobScanLogHybridPollingStrategyTests.cs
Outdated
Show resolved
Hide resolved
| ### Breaking Changes | ||
|
|
||
| ### Bugs Fixed | ||
| - Fixed a bug where polling for new blobs fails to detect newly created blobs when the List Blobs/Get Blobs operation requires multiple requests. If a blob is added or modified after LogScan begins, and is listed during the later portion of the listing, any blobs changed between the listing start time and the last modified timestamp of blobs found later will not be flagged as new in the subsequent LogScan. |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The bug description is verbose and could be simplified for better readability. Consider rephrasing to: "Fixed a bug where polling for new blobs fails to detect blobs modified between the start of a multi-page blob listing and the last modified timestamp of blobs found in later pages."
| - Fixed a bug where polling for new blobs fails to detect newly created blobs when the List Blobs/Get Blobs operation requires multiple requests. If a blob is added or modified after LogScan begins, and is listed during the later portion of the listing, any blobs changed between the listing start time and the last modified timestamp of blobs found later will not be flagged as new in the subsequent LogScan. | |
| - Fixed a bug where polling for new blobs fails to detect blobs modified between the start of a multi-page blob listing and the last modified timestamp of blobs found in later pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this fix details best how this issue is exposed. But if others would rather see it shortened to this, I can do that.
Attempt to resolve issue #51030
Where LogScan fails to detect newly created blobs when the List Blobs/Get Blobs operation requires multiple requests. If a blob is added or modified after LogScan begins and during the later portion of the listing, any blobs changed between the listing start time and the last modified timestamp of blobs found later will not be flagged as new in the subsequent LogScan.