HADOOP-19543. [ABFS][FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across Iterations #7632
…t Listing Across Iterations (apache#7614) Contributed by Anuj Modi Reviewed by Anmol Asrani, Manish Bhatt, Manika Joshi Signed off by Anuj Modi<[email protected]>
🎊 +1 overall
This message was automatically generated.
PR in trunk: #7614
Commit CP'd: 810c42f
JIRA: https://issues.apache.org/jira/browse/HADOOP-19543
Description of PR
On FNS-Blob, the List Blobs API is known to return duplicate entries for non-empty explicit directories: one entry corresponds to the directory itself, and the other to the marker blob that the driver internally creates and maintains to mark that path as a directory. This behaviour was already known, and the driver handled it by removing such duplicates from the set of entries returned within a single list iteration.
However, because of a possible partition split, the two duplicate entries can be returned in separate iterations. That case was not handled, so the caller could get back a result containing duplicate entries, as happened in this case. The de-duplication logic was designed before partition splits were taken into account.
This PR fixes this bug by removing duplicates across list iterations as well.
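To illustrate the idea of de-duplicating across iterations rather than within one, here is a minimal sketch. It is not the code from this patch: the class `CrossIterationDedup`, the method `filterPage`, and the use of plain path strings are all hypothetical, and it assumes both duplicate entries normalize to the same path string. It simply remembers every path already returned, which costs more memory than a production fix might accept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the ABFS driver.
class CrossIterationDedup {
    // Paths already handed back to the caller in earlier iterations.
    // Keeping this state outside a single page is what lets us catch
    // duplicates that a partition split pushed into different pages.
    private final Set<String> seenPaths = new HashSet<>();

    /**
     * Filters one page (one list iteration) of results.
     * Assumption: the directory entry and its marker blob have
     * already been normalized to the same path string.
     */
    List<String> filterPage(List<String> page) {
        List<String> result = new ArrayList<>();
        for (String path : page) {
            // add() returns false if the path was seen before,
            // whether in this page or in a previous one.
            if (seenPaths.add(path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

With one instance reused for the whole listing, a page containing `dir` followed by a later page that again starts with `dir` yields `dir` only once. A per-page filter would have let the second copy through.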
How was this patch tested?
A new test for the failing scenario was added, and the existing test suite was run to validate the changes across all combinations.