@jerryshao
Contributor

Currently, the write-ahead log (WAL) for received data is not deleted after the timeout, so the data accumulates in HDFS. This PR adds an Akka message that notifies the ReceiverSupervisorImpl to clean up the old files accordingly.
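
To make the idea concrete, here is a minimal sketch of message-driven cleanup. It is not the code from this PR: the names `CleanupOldWal`, `LogSegment`, and `SupervisorSketch` are hypothetical, plain method dispatch stands in for the Akka actor, and an in-memory list stands in for the files on HDFS.

```scala
// Hypothetical names throughout; this is an illustration, not the Spark source.
sealed trait ReceiverMessage
case object StopReceiver extends ReceiverMessage
// Asks the receiver side to drop WAL data older than `threshTimeMs`.
final case class CleanupOldWal(threshTimeMs: Long) extends ReceiverMessage

// One WAL segment file covering the half-open time range [startMs, endMs).
final case class LogSegment(path: String, startMs: Long, endMs: Long)

// Stand-in for the receiver supervisor: it owns the segments and reacts to messages.
final class SupervisorSketch(initial: Seq[LogSegment]) {
  private var segments: Vector[LogSegment] = initial.toVector
  private var stopped = false

  def liveSegments: Seq[LogSegment] = segments
  def isStopped: Boolean = stopped

  // Returns the segments removed by this message (empty for non-cleanup messages).
  def receive(msg: ReceiverMessage): Seq[LogSegment] = msg match {
    case CleanupOldWal(thresh) =>
      // A segment is safe to delete only if all of its data predates the threshold.
      val (expired, kept) = segments.partition(_.endMs <= thresh)
      segments = kept
      expired // a real implementation would delete these files from HDFS here
    case StopReceiver =>
      stopped = true
      Seq.empty
  }
}
```

A single cleanup message can remove several segments at once, since every segment that ends at or before the threshold qualifies.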

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25517 has started for PR 4037 at commit 2736fd1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25517 has finished for PR 4037 at commit 2736fd1.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25517/
Test FAILed.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25531 has started for PR 4037 at commit 2736fd1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25531 has finished for PR 4037 at commit 2736fd1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25531/
Test PASSed.

@huitseeker
Contributor

I'll just link to my comment on the JIRA for this PR.

Edit: Not tested, but the code looks good to me.

Contributor

Since each such attempt can delete multiple batches (any batches older than the threshold time), it's better to name this DeleteOldBatches.

@tdas
Contributor

tdas commented Jan 22, 2015

I took your PR and augmented it to submit a new PR, #4149. It fixes a very subtle bug in this PR and adds a unit test.

asfgit pushed a commit that referenced this pull request Jan 22, 2015
This is a refactored fix based on jerryshao's PR #4037.
This enables deletion of old WAL files containing the received block data.
Improvements over #4037:
- Respects the rememberDuration of all receiver streams. In #4037, if there were two receiver streams with different remember durations, the deletion would have been based on the shortest remember duration, thus prematurely deleting data for the receiver stream with the longer remember duration.
- Adds a unit test covering creation of the receiver WAL, automatic deletion, and respect for the remember duration.
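
The first improvement can be sketched as follows. This is an illustration under assumptions, not the code merged in #4149: `ReceiverStream`, `WalCleanup`, and `cleanupThreshold` are hypothetical names, and times are plain milliseconds. The point is that the cleanup threshold is derived from the longest remember duration across all receiver streams, so no stream loses data it still needs.

```scala
// Hypothetical model; names are illustrative, not Spark's API.
final case class ReceiverStream(name: String, rememberDurationMs: Long)

object WalCleanup {
  // Data older than the returned time is no longer needed by any stream.
  // Returns None when there are no receiver streams, so nothing is cleaned up.
  def cleanupThreshold(batchTimeMs: Long, streams: Seq[ReceiverStream]): Option[Long] =
    if (streams.isEmpty) None
    else Some(batchTimeMs - streams.map(_.rememberDurationMs).max)
}
```

With remember durations of 10 s and 60 s at batch time 100 000 ms, this threshold is 40 000 ms. Using the shorter duration instead would give 90 000 ms and delete data between 40 000 ms and 90 000 ms that the second stream still expects to find.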

jerryshao, I am going to merge this ASAP to get it into 1.2.1. Thanks for the initial draft of this PR; it made my job much easier.

Author: Tathagata Das <[email protected]>
Author: jerryshao <[email protected]>

Closes #4149 from tdas/SPARK-5147 and squashes the following commits:

730798b [Tathagata Das] Added comments.
c4cf067 [Tathagata Das] Minor fixes
2579b27 [Tathagata Das] Refactored the fix to make sure that the cleanup respects the remember duration of all the receiver streams
2736fd1 [jerryshao] Delete the old WAL log periodically

(cherry picked from commit 3027f06)
Signed-off-by: Tathagata Das <[email protected]>
@tdas
Contributor

tdas commented Jan 22, 2015

Mind closing this PR?

@jerryshao
Contributor Author

OK.

@jerryshao closed this Jan 22, 2015