Backup repo cache design #9148
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests. Coverage on main and on this PR (#9148) is unchanged at 59.66% (382 files, 43900 lines: 26194 hits, 16161 misses, 1545 partials). View the full report in Codecov by Sentry. |
Overall, I think this design will be effective. Another reason to add under Background: on nodes where ephemeral-storage is shared with the OS volume, a full OS volume causes the node to undergo behavioral issues such as Pod removal and creation failures, and kubelet operation degrades significantly if it doesn't fail outright. One area I would like to see this design handle is default storage class labeling. But this conflicts with golang's omitempty handling, which collapses a missing field into an empty string; we need some way to tell the difference between cacheStorageClass: "" and the key being missing altogether. Will CacheVolumeInfo.Limit be allowed to be set through the maintenance configmap as well? This seems to be a valuable addition, given the algorithmic size calculation, to prevent excessively sized volumes. I didn't see much explaining its purpose in the design, but it could be that this already exists in Velero and I just can't find a reference to it in the code. With no mention of AccessModes, I imagine one of the requirements will be for the StorageClass to support ReadWriteOnce? Though I don't know of any StorageClass with ReadWriteMany support that doesn't also support ReadWriteOnce.
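As a minimal sketch of the distinction being asked for here (illustrative names only, not Velero's actual config types): a pointer-typed field stays nil when the key is absent from the configmap, which separates a missing cacheStorageClass from an explicit empty string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheConfig is an illustrative stand-in for the repo cache config struct;
// a *string field stays nil when the key is absent, but points to "" when
// the user explicitly writes cacheStorageClass: "".
type cacheConfig struct {
	CacheStorageClass *string `json:"cacheStorageClass,omitempty"`
}

func describe(raw string) {
	var cfg cacheConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	switch {
	case cfg.CacheStorageClass == nil:
		fmt.Println(raw, "=> key missing, e.g. fall back to default behavior")
	case *cfg.CacheStorageClass == "":
		fmt.Println(raw, "=> explicitly empty, e.g. could mean the default storage class")
	default:
		fmt.Println(raw, "=> explicit storage class:", *cfg.CacheStorageClass)
	}
}

func main() {
	describe(`{}`)
	describe(`{"cacheStorageClass": ""}`)
	describe(`{"cacheStorageClass": "fast-ssd"}`)
}
```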
I know this is a nitpick, but it should be clarified: this should be in GiB, not GB. A number of CSI drivers round up to the nearest Gi when the 'G' suffix is used in a resource request (even if this results in the silliness of 1G -> ~954Mi being rounded up). That can result in the allocated size being larger than requested. |
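For reference on the Gi-vs-G point, a small example using the standard apimachinery resource package (nothing here is specific to this design; the values are just the decimal and binary definitions):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// "1G" is decimal (10^9 bytes); "1Gi" is binary (2^30 bytes).
	g := resource.MustParse("1G")
	gi := resource.MustParse("1Gi")
	fmt.Println("1G  =", g.Value(), "bytes")  // 1000000000
	fmt.Println("1Gi =", gi.Value(), "bytes") // 1073741824
	// A provisioner that rounds requests up to whole GiB would turn a 1G (~954Mi)
	// request into a 1Gi volume, so the allocation can exceed what was asked for.
}
```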
Yes, this one is the existing |
The configurations in the backup repo configmap are converted to golang structs. As per the current design, if the storage class is not configured, the cache volume will NOT be created: |
Yes, I have added the details into the background section. The cache volume is indeed meant to solve the problem of running out of the node's root file system; the pod's root file system problem is primarily solved by the existing |
Yes, cache PVC will be |
Yes, it should be in GiB. Modified the design. |
It was not a strong consideration. We use scripting to configure Velero, so our needs don't correspond to the average user's; it was raised in consideration of general ease-of-use. If you prefer an explicit storage class, that is not a problem for us. |
@Lyndon-Li I went back over our applications and found one I had missed that requires a design change to accommodate; the application versions that cause this problem are older than the current release. One of the applications we support shows significant variation in behavior depending on workload: sometimes it generates hundreds of PVCs (the most we have seen was 386 in a single namespace), and in the case that impacts this design, over a hundred million files on a single CephFS ReadWriteMany volume, which behaves similarly to NFS. The vast majority of these files were small, below the kopia deduplication threshold. Any system that needed to access or iterate over all the files, rather than a small subset at a time as the application itself does, became strained, including backup and restore, OpenShift SELinux relabeling, and Ceph itself. Newer versions of the application corrected this problem by moving the files into a database. This file count causes the kopia index cache to grow enormously, up to ~25-35GB, which is not covered by the kopia cache limit options. As we found out, the cache limit options affect the size of the data and metadata caches, but not the index cache. The small file size meant the total data backed up was relatively small, and the snapshot size was also small relative to what would normally be expected for such a high file count. The consequence is that the formula in the design breaks down in this unusual situation. The design allows for forcing a PVC cache, which is good for this case, but not for sizing the PVC to deal with the unusually large kopia index cache. We believe this can be addressed by exposing the InflationPercentage as a configurable value, though the default is fine for most situations. Some quick math and a look at our existing application maintenance Jobs showed that only this one situation falls outside the formula. I am completely open to other options, and I understand there may be no interest in addressing such an unusual situation. |
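Purely to illustrate what exposing such a knob could look like, a hypothetical sketch (the names cacheLimitBytes and inflationPercentage, the formula, and the GiB rounding are all assumptions for illustration, not the design's actual calculation):

```go
package main

import "fmt"

const gib = int64(1) << 30

// cacheVolumeSize is a hypothetical sketch: inflate a base cache limit by a
// user-configurable percentage and round the result up to whole GiB, so that
// an unusually large index cache could be accommodated by raising the
// percentage rather than changing the formula itself.
func cacheVolumeSize(cacheLimitBytes int64, inflationPercentage int64) int64 {
	inflated := cacheLimitBytes * (100 + inflationPercentage) / 100
	// Round up to a whole GiB (see the GiB-vs-GB discussion above).
	return (inflated + gib - 1) / gib * gib
}

func main() {
	limit := int64(5) * gib
	fmt.Println(cacheVolumeSize(limit, 50)/gib, "GiB")  // modest inflation: 8 GiB
	fmt.Println(cacheVolumeSize(limit, 600)/gib, "GiB") // cranked up for a huge index cache: 35 GiB
}
```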
I think the index cache is also constrained by the cacheLimit, see https://github.com/kopia/kopia/blob/3273e2f69ca94495a6231dbf1c085d36d5ccdbb7/repo/content/committed_read_manager.go#L440. For Velero, the default cacheLimit is 5GB, so the cache size should not exceed that size by much, even in the large-index case you mentioned. Previously there was a bug on the Velero side: the cache's hard limit was not set appropriately, and as a result the cache size could vastly exceed the configured |
Yes, Kopia always keeps the full set of indexes in memory and refreshes them periodically, e.g., every 15 minutes, from the cache. Previously, I intended to avoid elaborating on the index cache data in this design because: 1. keeping a full index cache on the client side is not normal for backup repositories; 2. even for Kopia, when this case happens (a huge number of small files resulting in a huge number of indexes), users probably don't have to worry about disk occupation for the cache but about memory usage --- they need to provision a huge amount of memory for the data mover pod, and they most probably cannot run concurrent data mover pods because they don't have enough memory, so the precondition/requirement for this design is gone. But since we've come this far, let me add the following to the design doc:
|
It may not be a good choice to calculate the cacheVolumeSize by |
@msfrucht |
Thanks for the additional effort and revision. This formula is more flexible without loss of expressive power, in my opinion.
Even if we did run into an issue due to the unusual nature of the backup size, it could be dealt with by adjusting the cacheLimit either globally or per affected namespace. The cacheLimit might be over-reserved for data and metadata in the extreme case I found in our history, but that wasn't the limiting factor, so it becomes irrelevant. As far as I am aware, this particular case was the only one like it. In situations like that we have previously recommended backing up Ceph rather than the application itself, since CephFS is not eligible for Block volumeMode data transfer, the preferred method for volumes with excessive file counts. Said user had the memory to spare, didn't mind, and was quite adamant on the matter: their focus was on reliability, execution time, and management flexibility, with resource use way down the list of priorities. |
Fix issue #7725, add design for backup repo cache configuration