@AartBluestoke

When a container scan spans several pages (continuations), the later pages should not advance the timestamp past the time at which the first page of the scan was fetched.

Without this change, an update to a file on the first page that occurs while the later API calls are in progress might be missed.

old process:
0. old threshold

  1. write file v1.
  2. begin scan.
  3. read file v1.
  4. write file v2.
  5. a write occurs to another file ('another').
  6. fetch next batch of results from the container scan.
  7. find 'another, t5'.
  8. highlighted line of code sets the 'high water mark' to t5, skipping over file v2, which was written at t4.
  9. container scan ends.
  10. next scan begins at t5, missing the update at t4.

The correct 'high water mark' is the minimum of t5 and t2, the time when the scan began (tN denotes the time of step N).
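The rule above can be sketched as follows. This is a minimal illustration only: `ScanState`, `Begin`, `Observe`, and `HighWaterMark` are hypothetical names, not the SDK's actual types, and the timestamps are offsets in seconds chosen to match the step numbers.

```csharp
using System;

// Hypothetical illustration of clamping the high water mark; not the SDK's code.
public sealed class ScanState
{
    public DateTime ScanBeginTime { get; private set; }
    public DateTime LatestSeen { get; private set; } = DateTime.MinValue;

    // Called once when a new scan starts (no continuation token yet).
    public void Begin(DateTime now)
    {
        ScanBeginTime = now;
        LatestSeen = DateTime.MinValue;
    }

    // Called for every blob returned by any page of the scan.
    public void Observe(DateTime lastModified)
    {
        if (lastModified > LatestSeen) LatestSeen = lastModified;
    }

    // The mark never moves past the moment the scan began.
    public DateTime HighWaterMark =>
        LatestSeen < ScanBeginTime ? LatestSeen : ScanBeginTime;
}

public static class Demo
{
    public static void Main()
    {
        var t0 = new DateTime(2023, 8, 25, 0, 0, 0, DateTimeKind.Utc);
        var state = new ScanState();

        state.Begin(t0.AddSeconds(2));    // step 2: scan begins at t2
        state.Observe(t0.AddSeconds(1));  // first page: 'file v1, t1'
        state.Observe(t0.AddSeconds(5));  // later page: 'another, t5'

        // min(t5, t2) = t2, so this prints 2
        Console.WriteLine((state.HighWaterMark - t0).TotalSeconds);
    }
}
```

With the mark held at t2 or earlier, the write at t4 still falls after the threshold used by the next scan.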

new process:
0. old threshold

  1. write file v1.
  2. begin scan.
  3. find 'file v1, t1', set high water mark to t1.
  4. write file v2.
  5. a write occurs to another file ('another').
  6. fetch next batch of results from the container scan.
  7. find 'another, t5'.
  8. the high water mark IS NOT updated.
  9. container scan ends.
  10. next scan begins at t1, finding the file written at t4.
  11. the scan finds the file written at t5 again, but rejects it as a duplicate based on its ETag.
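Steps 10 and 11 can be simulated with a small sketch. Everything here is an assumption made for illustration: `RescanDemo.Scan`, the tuple shape, the integer timestamps, and the ETag strings are hypothetical, and the real trigger tracks processed blobs differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simulation of a re-scan with ETag de-duplication; not the SDK's code.
public static class RescanDemo
{
    // Returns the names of blobs that would newly trigger in this scan.
    public static List<string> Scan(
        IEnumerable<(string Name, string ETag, int ModifiedAt)> blobs,
        int threshold,
        HashSet<string> processedETags)
    {
        var triggered = new List<string>();
        foreach (var blob in blobs)
        {
            if (blob.ModifiedAt < threshold) continue;       // older than the high water mark
            if (!processedETags.Add(blob.ETag)) continue;    // this version was already processed
            triggered.Add(blob.Name);
        }
        return triggered;
    }

    public static void Main()
    {
        var processed = new HashSet<string>();

        // What the first scan saw: file v1 at t1 (first page), 'another' at t5 (later page).
        var firstScan = new[] { ("file", "etag-v1", 1), ("another", "etag-a1", 5) };
        Console.WriteLine(string.Join(", ", Scan(firstScan, 0, processed)));   // prints "file, another"

        // The next scan starts from t1: file v2 (t4) is found, 'another' is rejected by ETag.
        var secondScan = new[] { ("file", "etag-v2", 4), ("another", "etag-a1", 5) };
        Console.WriteLine(string.Join(", ", Scan(secondScan, 1, processed)));  // prints "file"
    }
}
```

The cost of the lower threshold is some repeated listing work, which the ETag check keeps from turning into repeated triggers.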

AartBluestoke and others added 2 commits August 25, 2023 09:27
@AartBluestoke
Author

@microsoft-github-policy-service agree

@AartBluestoke
Author

PR raised in response to a bug discussed in Azure support ticket TrackingID#2308210030000318

@AartBluestoke changed the title from "Bugs/no scan from future" to "Functions - BlobTrigger container scan should not skip past changes made during the scan" on Aug 25, 2023

// if starting the cycle, reset the sweep time
if (continuationToken == null)
{
    containerScanInfo.CurrentScanBeginTime = DateTime.UtcNow;
Author

Is the better name here "Scan" or "Sweep"? (Both words are used interchangeably in this code.)
