
Conversation

Lyndon-Li
Contributor

Fix issue #7725, add design for backup repo cache configuration

@github-actions github-actions bot added the Area/Design Design Documents label Aug 5, 2025
@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch from 3f93316 to f10e021 Compare August 5, 2025 11:02
@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch 2 times, most recently from fec1dc8 to 349f35b Compare August 5, 2025 11:08

codecov bot commented Aug 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.66%. Comparing base (1a52994) to head (2de5a5c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9148   +/-   ##
=======================================
  Coverage   59.66%   59.66%           
=======================================
  Files         382      382           
  Lines       43900    43900           
=======================================
  Hits        26194    26194           
  Misses      16161    16161           
  Partials     1545     1545           


@shubham-pampattiwar
Collaborator

/cc @mpryc @msfrucht

@msfrucht
Contributor

msfrucht commented Aug 5, 2025

Overall, I think this design will be effective.

Another reason to add under Background:

On nodes where ephemeral-storage is shared with the OS volume, a full OS volume causes the node to start undergoing behavioral issues such as Pod removal and creation failures. Kubelet operation degrades significantly, if it does not outright fail completely.

One area of this design I would like to see addressed is handling of default storage class labeling. But this also conflicts with Go's conversion of omitempty fields to the empty string. We need some way to tell the difference between cacheStorageClass: "" and the key missing altogether.

Will CacheVolumeInfo.Limit be allowed to be set through the maintenance configmap as well? This seems to be a valuable addition, given the algorithmic size calculation, to prevent excessively sized volumes. I didn't see much explaining its purpose in the design. But it could also be that this already exists in Velero and I just can't find a reference to it in the code.

With no mention of AccessModes, I imagine one of the requirements will be for the StorageClass to support ReadWriteOnce? Though I don't know of any StorageClass with ReadWriteMany support that doesn't also support ReadWriteOnce.

Finally, the `cacheVolumeSize` will be rounded up to GB considering the UX friendliness, storage friendliness and management friendliness.

I know this is a nitpick, but it should be clarified: this should be in GiB, not GB. A number of CSI drivers round up to the nearest Gi when the suffix 'G' is used in a resource request. (Even if this results in the silliness of 1G -> 954Mi.) That would result in the allocated size being larger than requested.

@Lyndon-Li
Contributor Author

Will CacheVolumeInfo.Limit be allowed to be set through the maintenance configmap as well? This seems to be a valuable addition, given the algorithmic size calculation, to prevent excessively sized volumes. I didn't see much explaining its purpose in the design. But it could also be that this already exists in Velero and I just can't find a reference to it in the code.

Yes, this one is the existing cacheLimitMB in the backup repo configmap. See this statement in the doc:
**Limit**: The limit of the cache data, which is represented by `cacheLimitMB`

@Lyndon-Li
Contributor Author

Lyndon-Li commented Aug 6, 2025

One area of this design I would like to see addressed is handling of default storage class labeling. But this also conflicts with Go's conversion of omitempty fields to the empty string. We need some way to tell the difference between cacheStorageClass: "" and the key missing altogether.

The configurations in the backup repo configmap are converted to a Go map, so we cannot and don't need to set the omitempty tag. That is, cacheStorageClass: "" is the same as cacheStorageClass being absent.
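
For illustration, a minimal sketch of that behavior, assuming the configmap values land in a plain map[string]string (the names here are illustrative, not Velero's actual code):

```go
package main

import "fmt"

func main() {
	// Parsed backup repo configmap entries; illustrative only.
	data := map[string]string{"cacheStorageClass": ""}

	// A plain index expression yields the zero value ("") for a missing key,
	// so cacheStorageClass: "" and an absent key are indistinguishable here.
	sc := data["cacheStorageClass"]
	if sc == "" {
		fmt.Println("no storage class configured: skip creating a cache volume")
	}
}
```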

As per the current design, if the storage class is not configured, a cache volume will NOT be created:
**repository-type-3**: This is not a valid configuration because the storage class is absent. Velero gives up creating cache volumes for all restores of this backup repository.
I'm inclined to keep the current design: we need users to explicitly set a storage class, since the default one may not be the expected place for the cache volumes. But let me know if you have other strong considerations.

@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch from 349f35b to 0171e1e Compare August 6, 2025 03:32
@Lyndon-Li
Contributor Author

On nodes where ephemeral-storage is shared with the OS volume, a full OS volume causes the node to start undergoing behavioral issues such as Pod removal and creation failures. Kubelet operation degrades significantly, if it does not outright fail completely.

Yes, I have added the details into the background section. The cache volume is exactly meant to solve the problem of running out of the node's root file system; the pod's root file system problem is primarily solved by the existing cacheLimitMB.

@Lyndon-Li
Contributor Author

With no mention of AccessModes, I imagine one of the requirements will be for the StorageClass to support ReadWriteOnce? Though I don't know of any StorageClass with ReadWriteMany support that doesn't also support ReadWriteOnce.

Yes, the cache PVC will be ReadWriteOnce and in Filesystem mode, so the storage class should support that. Added the same into the design doc.
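
As a hedged sketch, this is roughly what such a claim could look like, assuming a recent k8s.io/api; the helper and its values are illustrative, not the design's actual code:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cachePVC builds a ReadWriteOnce, Filesystem-mode claim for the cache volume.
func cachePVC(name, namespace, storageClass string, sizeGiB int64) *corev1.PersistentVolumeClaim {
	fsMode := corev1.PersistentVolumeFilesystem
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			VolumeMode:       &fsMode,
			StorageClassName: &storageClass,
			Resources: corev1.VolumeResourceRequirements{
				Requests: corev1.ResourceList{
					// Binary SI so the request renders as "<n>Gi", not "<n>G".
					corev1.ResourceStorage: *resource.NewQuantity(sizeGiB<<30, resource.BinarySI),
				},
			},
		},
	}
}
```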

@Lyndon-Li
Contributor Author

This should be in GiB, not GB. A number of CSI drivers round up to the nearest Gi when the suffix 'G' is used in a resource request. (Even if this results in the silliness of 1G -> 954Mi.) That would result in the allocated size being larger than requested.

Yes, it should be in GiB. Modified the design.
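
A minimal sketch of the GiB round-up, assuming sizes are computed in bytes and expressed via k8s.io/apimachinery's resource.Quantity in binary SI (the helper name is hypothetical):

```go
package example

import "k8s.io/apimachinery/pkg/api/resource"

const gib = int64(1) << 30

// roundUpToGiB rounds a byte count up to a whole number of GiB and returns
// it as a binary-SI quantity, e.g. 1500000000 bytes -> "2Gi".
func roundUpToGiB(bytes int64) *resource.Quantity {
	n := (bytes + gib - 1) / gib // ceiling division
	return resource.NewQuantity(n*gib, resource.BinarySI)
}
```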

@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch from 0171e1e to e12db73 Compare August 6, 2025 03:42
@Lyndon-Li Lyndon-Li marked this pull request as ready for review August 6, 2025 03:45
@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch from e12db73 to c8b9ec0 Compare August 6, 2025 07:10
@msfrucht
Contributor

msfrucht commented Aug 6, 2025

One area of this design I would like to see addressed is handling of default storage class labeling. But this also conflicts with Go's conversion of omitempty fields to the empty string. We need some way to tell the difference between cacheStorageClass: "" and the key missing altogether.

The configurations in the backup repo configmap are converted to a Go map, so we cannot and don't need to set the omitempty tag. That is, cacheStorageClass: "" is the same as cacheStorageClass being absent.

As per the current design, if the storage class is not configured, a cache volume will NOT be created: **repository-type-3**: This is not a valid configuration because the storage class is absent. Velero gives up creating cache volumes for all restores of this backup repository. I'm inclined to keep the current design: we need users to explicitly set a storage class, since the default one may not be the expected place for the cache volumes. But let me know if you have other strong considerations.

It was not a strong consideration. We use scripting to configure Velero, so our needs don't correspond to the average user's. The suggestion was made in consideration of general ease of use.

If you prefer explicit storage class that is not a problem for us.

@Lyndon-Li Lyndon-Li requested a review from kaovilai August 7, 2025 08:45
kaovilai
kaovilai previously approved these changes Aug 7, 2025
@msfrucht
Contributor

msfrucht commented Aug 7, 2025

@Lyndon-Li Went back over our applications and found one I had missed that requires a design change to work with, because the application version that causes this problem is older than the current one.

One of the applications we support shows significant variation in behavior depending on workload: sometimes generating hundreds of PVCs (our max seen was 386 in a single namespace), and, in the case that impacts this design, over a hundred million files on a single CephFS ReadWriteMany volume, which is similar in behavior to NFS.

The vast majority of these files were small, below the kopia deduplication threshold. Any system that needed to access or iterate over all the files, rather than the minority the application touches at a time, became strained, including backup and restore, OpenShift SELinux relabeling, and Ceph. Newer versions corrected this problem by shoving the files into a database.

This causes the kopia index cache to grow enormously in size, up to ~25-35GB, which is not covered by the kopia cache limit options. As we found out, the cache limit options affect the size of the data and metadata caches, but not the index cache.

The small file size meant the total data backed up was relatively small. And the snapshot size was small as well, relative to what would normally be expected of such a high file count.

The consequence is that the formula in the design breaks down in this unusual situation:
cacheVolumeSize = (min(backupSize, cacheLimit) >= residentThreshold ? min(backupSize, cacheLimit) : 0) * inflationPercentage
Specifically, this part: min(backupSize, cacheLimit)

The design allows for forcing a PVC cache, which is good for this situation, but not for sizing the PVC to deal with the unusually large kopia index cache, as min(backupSize, cacheLimit) will start from an unusually small size compared to most applications.

We believe this can be addressed by exposing the InflationPercentage as a configurable value, though the default is OK for most situations. Some quick math and a look at existing application maintenance Jobs showed that only this one situation falls outside the formula. I am completely open to other options, and I understand there may be no interest in addressing such an unusual situation.
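
To make the breakdown concrete, here is a minimal sketch of the formula as quoted above, with assumed example numbers (identifiers mirror the design's pseudocode, not actual Velero code):

```go
package main

import (
	"fmt"
	"math"
)

// cacheVolumeSizeGiB mirrors the design's pseudocode; all sizes are in GiB.
func cacheVolumeSizeGiB(backupSize, cacheLimit, residentThreshold, inflationPercentage float64) float64 {
	base := math.Min(backupSize, cacheLimit)
	if base < residentThreshold {
		return 0 // below the threshold, no cache volume is created
	}
	return base * inflationPercentage
}

func main() {
	// Typical case: 100 GiB backup, 5 GiB cacheLimit, 1 GiB threshold, 110% inflation.
	fmt.Println(cacheVolumeSizeGiB(100, 5, 1, 1.1)) // 5.5
	// The case described above: a small backup of many tiny files gets no cache
	// volume at all, regardless of a ~25-35GB index cache, because only
	// backupSize is measured.
	fmt.Println(cacheVolumeSizeGiB(0.5, 5, 1, 1.1)) // 0
}
```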

@Lyndon-Li
Contributor Author

Lyndon-Li commented Aug 8, 2025

As we found out, the cache limit options affect the size of the data and metadata caches, but not the index cache

I think the index cache is also constrained by the cacheLimit, see https://github.com/kopia/kopia/blob/3273e2f69ca94495a6231dbf1c085d36d5ccdbb7/repo/content/committed_read_manager.go#L440.

For Velero, the default cacheLimit is 5GB, so the cache size should not exceed that size by much, even in the large-index case you mentioned.

Previously there was a bug on the Velero side: the cache's hard limit was not appropriately set, and as a result the cache size could far exceed the configured cacheLimitMB or the default value of 5GB.

@Lyndon-Li
Contributor Author

Lyndon-Li commented Aug 8, 2025

@msfrucht

The vast majority of these files were small, below the kopia deduplication threshold. Any system that needed to access or iterate over all the files, rather than the minority the application touches at a time, became strained, including backup and restore
The design allows for forcing a PVC cache, which is good for this situation, but not for sizing the PVC to deal with the unusually large kopia index

Yes, Kopia always keeps a full version of the indexes in memory and refreshes them periodically, e.g., every 15 min, from the cache.

Previously, I intended to avoid elaborating on the index cache data in this design, because: 1, keeping a full index cache on the client side is not normal for backup repositories; 2, even for Kopia, when this case happens (a huge number of small files results in a huge number of indexes), users probably don't have to worry about the disk occupation of the cache, but about the memory usage: they need to provision huge memory for the data mover pod, and they most probably cannot run concurrent data mover pods because they don't have enough memory, so the precondition/requirement for this design is gone.
For 2, take what you mentioned as an example: if the index blobs are 25-35GB, much more memory is required, because the indexes persisted in the index blobs are in a compact binary format, but in memory they are represented by complex Go structs; what is worse, the indexes may be duplicated in memory during execution.

But since we've come this far, let me add the things below into the design doc:

  1. Explain that backup repository metadata (a general name for non-payload data, including Kopia's index data) may take the majority of the cache data, so calculating the cache size from the backup size is not always accurate.
  2. State that the current design doesn't consider the case where repository metadata takes the majority of the cache data, leaving it for a future enhancement when the requirement is clear, for which we need to support cache volumes for both backup and restore.

@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch 3 times, most recently from 73d04a2 to 65e7182 Compare August 8, 2025 10:42
@Lyndon-Li
Contributor Author

The design allows for forcing a PVC cache, which is good for this situation, but not for sizing the PVC to deal with the unusually large kopia index cache, as min(backupSize, cacheLimit) will start from an unusually small size

It may not be a good choice to calculate the cacheVolumeSize by cacheVolumeSize = (min(backupSize, cacheLimit) >= residentThreshold ? min(backupSize, cacheLimit) : 0) * inflationPercentage, even in normal cases.
Let me refactor the design and get back to you.

@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch 3 times, most recently from b481725 to b36b86d Compare August 12, 2025 07:24
@Lyndon-Li
Contributor Author

@msfrucht
I have updated the doc, please continue the review.

@Lyndon-Li Lyndon-Li requested a review from kaovilai August 12, 2025 07:30
@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch from 1b1d966 to 5d2dad2 Compare August 13, 2025 03:20
@Lyndon-Li Lyndon-Li requested a review from kaovilai August 13, 2025 03:21
kaovilai
kaovilai previously approved these changes Aug 13, 2025
@msfrucht
Contributor

@msfrucht I have updated the doc, please continue the review.

Thanks for the additional effort and revision.

This formula is more flexible, without loss of expressive power, in my opinion.

cacheVolumeSize = (backupSize > residentThreshold ? limit : 0) * inflationPercentage

Even if we had some issue with the unusual nature of the backup size, it could be dealt with by adjusting the cacheLimit either globally or per affected namespace. The cacheLimit might be over-reserved for data and metadata in the extreme case I found in our history, but that wasn't the limiting factor there, so it becomes irrelevant. As far as I am aware, this particular case was the only one like that.

We have previously recommended, in situations like that, backing up Ceph rather than the application itself, as CephFS is not eligible for Block volumeMode data transfer, the preferred method for excessive file count volumes.

Said user had the memory to spare, didn't mind, and was quite adamant on the matter. Their focus was on reliability, execution time, and management flexibility, with resource use way down the list of priorities.
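
For comparison, a hedged sketch of the revised formula quoted above (identifiers are illustrative, not Velero's actual code); because the size is driven by the configured limit rather than the backup size, the extreme case can be accommodated by raising cacheLimit:

```go
package example

// revisedCacheVolumeSizeGiB mirrors the revised pseudocode; sizes are in GiB.
func revisedCacheVolumeSizeGiB(backupSize, residentThreshold, limit, inflationPercentage float64) float64 {
	if backupSize <= residentThreshold {
		return 0 // too small to benefit from a cache volume
	}
	// Sized from the configured limit, so a small backup with a huge index
	// cache can still receive an adequate volume by raising cacheLimit.
	return limit * inflationPercentage
}
```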

kaovilai
kaovilai previously approved these changes Aug 14, 2025
Signed-off-by: Lyndon-Li <[email protected]>
@Lyndon-Li Lyndon-Li force-pushed the backup-repo-cache-design branch from 7f2d5f2 to 133db85 Compare August 15, 2025 09:55