28 commits
c0c90d2
Run centos and debian workflows on push and PR
igchor Nov 2, 2021
dbe3fda
Adds createPutToken and switches findEviction
byrnedj Feb 4, 2023
9afcd64
Add memory usage statistics for allocation classes
igchor Jul 6, 2022
eca7d8c
Initial multi-tier support implementation
igchor Sep 28, 2021
664da8d
AC stats multi-tier
byrnedj Jan 17, 2023
3b7bb0c
Tests and fix tier sizing
byrnedj Feb 8, 2023
58e825b
This is the additional multi-tier support needed
guptask Nov 14, 2022
9fc705f
Rolling average alloc latency
guptask Jul 21, 2022
ce0e38a
Rolling average class latency
guptask Jul 21, 2022
e0a8006
MM2Q promotion iterator
byrnedj Aug 9, 2022
bcb2ae2
Multi-tier allocator patch
byrnedj Feb 7, 2023
d4cf1d4
basic multi-tier test based on numa bindings
igchor Dec 30, 2021
6d2fbef
Aadding new configs to hit_ratio/graph_cache_leader_fobj
vinser52 Jan 27, 2022
5bfa1ff
Background data movement for the tiers
byrnedj Oct 21, 2022
1593291
dummy change to trigger container image rebuild
guptask Mar 28, 2023
a171f38
Updated the docker gcc version to 12 (#83)
guptask May 9, 2023
35a17e4
NUMA bindigs support for private memory (#82)
vinser52 May 17, 2023
46d168c
Do not run cachelib-centos-8-5 on PRs (#85)
igchor Jun 6, 2023
7d06531
Add option to insert items to first free tier (#87)
igchor Jun 8, 2023
1521efe
Chained item movement between tiers - sync on the parent item (#84)
byrnedj Jun 28, 2023
3328e4e
edit dockerfile
byrnedj Jul 24, 2023
3c87c49
Track latency of per item eviction/promotion between memory tiers
guptask Jul 28, 2023
795f85b
Update dependencies (#95)
igchor Aug 23, 2023
96d948f
enable DTO build without memcpy changes to cachebench
byrnedj Feb 28, 2024
47d5034
Bckground eviction for multi-tier
byrnedj Feb 28, 2024
efea480
no online eviction option patch
byrnedj Feb 28, 2024
ebfca17
fixes cmake in latest test removal (upstream test build fails - need …
byrnedj May 20, 2024
52618b5
fixes commit for now (should drop once https://github.com/facebook/Ca…
byrnedj May 28, 2024
Background data movement for the tiers
Part 1.
--------------------------------------
This adds the following:
1. tryPromoteToNextTier. This could go with multi-tier part 2
2. Promotion iterators. This could go with MM2Q promotion iterators patch.

It also enables background workers in the
cache config.

Future changes to the background workers can
be merged with this patch.

Background evictors multi-tier
Part 2.
--------------------------------
This should be rolled into background evictors part 1.

Improved the background stats structure and cachebench output.
Adds the following:
 - approximate usage stat
 - evictions / attempts per class

Background evictors multi-tier
Part 3.
--------------------------------
use approximate usage fraction
byrnedj committed May 20, 2024
commit 5bfa1ff515e5faf2fea8688fbc281514c3342b66
90 changes: 90 additions & 0 deletions MultiTierDataMovement.md
@@ -0,0 +1,90 @@
# Background Data Movement

In order to reduce the number of online evictions and to support asynchronous
promotion, we have added two periodic workers to handle eviction and promotion.

The diagram below shows a simplified version of how the background evictor
thread (green) is integrated into the CacheLib architecture.

<p align="center">
<img width="640" height="360" alt="BackgroundEvictor" src="cachelib-background-evictor.png">
</p>

## Background Evictors

The background evictors scan each class to see if there are objects to move to the next (lower)
tier using a given strategy. Here we document the general parameters and the parameters for the
different strategies.

- `backgroundEvictorIntervalMilSec`: The interval at which this thread runs. By default
the background evictor threads will wake up every 10 ms to scan the AllocationClasses. In addition,
a background evictor thread will be woken up every time an allocation fails (in
a request handling thread) and the current percentage of free memory for the
AllocationClass is lower than `lowEvictionAcWatermark`. This may make the interval parameter
less important when many allocations occur from request handling threads.

- `evictorThreads`: The number of background evictors to run. Each thread is assigned
a set of AllocationClasses to scan and evict objects from. Currently, each thread gets
an equal number of classes to scan, but since the object size distribution may be unequal, future
versions will attempt to balance the classes among threads. The range is `1` to the number of
AllocationClasses. The default is `1`.

- `maxEvictionBatch`: The number of objects to remove in a given eviction call. The
default is `40`; the lower bound is `10` and the upper bound is `1000`. Too low and we might not
remove objects at a reasonable rate; too high and it might increase contention with user threads.

- `minEvictionBatch`: Minimum number of items to evict at any time (if there are any
candidates).

- `maxEvictionPromotionHotness`: Maximum number of candidates to consider for eviction. This is similar to `maxEvictionBatch`,
but it specifies how many candidates will be taken into consideration, not the actual number of items to evict.
This option can be used to configure the duration of the critical section on the LRU lock.


### FreeThresholdStrategy (default)

- `lowEvictionAcWatermark`: Triggers the background eviction thread to run
when this percentage of the AllocationClass is free.
The default is `2.0`; to avoid wasting capacity we don't set this above `10.0`.

- `highEvictionAcWatermark`: Stops evictions from an AllocationClass when this
percentage of the AllocationClass is free. The default is `5.0`; to avoid wasting capacity we
don't set this above `10.0`.


## Background Promoters

The background promoters scan each class to see if there are objects to move to an upper (faster)
tier using a given strategy. Here we document the general parameters and the parameters for the
different strategies.

- `backgroundPromoterIntervalMilSec`: The interval at which this thread runs. By default
the background promoter threads will wake up every 10 ms to scan the AllocationClasses for
objects to promote.

- `promoterThreads`: The number of background promoters to run. Each thread is assigned
a set of AllocationClasses to scan and promote objects from. Currently, each thread gets
an equal number of classes to scan, but since the object size distribution may be unequal, future
versions will attempt to balance the classes among threads. The range is `1` to the number of AllocationClasses. The default is `1`.

- `maxPromotionBatch`: The number of objects to promote in a given promotion call. The
default is `40`; the lower bound is `10` and the upper bound is `1000`. Too low and we might not
promote objects at a reasonable rate; too high and it might increase contention with user threads.

- `minPromotionBatch`: Minimum number of items to promote at any time (if there are any
candidates).

- `numDuplicateElements`: This allows us to promote items that have existing (read-only) handles, since
we won't need to modify the data while a user still holds a handle. As a result, for a short time
the data could reside in both tiers until it is evicted from its original tier. The default is to
not allow this (`0`). Setting the value to `100` will enable duplicate elements across tiers.

### Background Promotion Strategy (only one currently)

- `promotionAcWatermark`: Promote items if at least this
percentage of the AllocationClass is free. The promotion thread will attempt to move `maxPromotionBatch` objects
to that tier. The objects are chosen from the head of the LRU. The default is `4.0`.
This value should correlate with `lowEvictionAcWatermark`, `highEvictionAcWatermark`, `minAcAllocationWatermark`, and `maxAcAllocationWatermark`.
- `maxPromotionBatch`: The number of objects to promote in a batch during background promotion. Analogous to
`maxEvictionBatch`. Its value should be lower to decrease contention on hot items.

115 changes: 81 additions & 34 deletions cachelib/allocator/BackgroundMover.h
@@ -18,7 +18,6 @@

#include "cachelib/allocator/BackgroundMoverStrategy.h"
#include "cachelib/allocator/CacheStats.h"
#include "cachelib/common/AtomicCounter.h"
#include "cachelib/common/PeriodicWorker.h"

namespace facebook::cachelib {
@@ -51,6 +50,7 @@ enum class MoverDir { Evict = 0, Promote };
template <typename CacheT>
class BackgroundMover : public PeriodicWorker {
public:
using ClassBgStatsType = std::map<MemoryDescriptorType,uint64_t>;
using Cache = CacheT;
// @param cache the cache interface
// @param strategy the stragey class that defines how objects are
@@ -62,8 +62,9 @@ class BackgroundMover : public PeriodicWorker {
~BackgroundMover() override;

BackgroundMoverStats getStats() const noexcept;
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
getClassStats() const noexcept;
ClassBgStatsType getClassStats() const noexcept {
return movesPerClass_;
}

void setAssignedMemory(std::vector<MemoryDescriptorType>&& assignedMemory);

@@ -72,8 +73,27 @@
static size_t workerId(TierId tid, PoolId pid, ClassId cid, size_t numWorkers);

private:
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
movesPerClass_;
ClassBgStatsType movesPerClass_;

struct TraversalStats {
// record a traversal and its time taken
void recordTraversalTime(uint64_t nsTaken);

uint64_t getAvgTraversalTimeNs(uint64_t numTraversals) const;
uint64_t getMinTraversalTimeNs() const { return minTraversalTimeNs_; }
uint64_t getMaxTraversalTimeNs() const { return maxTraversalTimeNs_; }
uint64_t getLastTraversalTimeNs() const { return lastTraversalTimeNs_; }

private:
// time it took us the last time to traverse the cache.
uint64_t lastTraversalTimeNs_{0};
uint64_t minTraversalTimeNs_{
std::numeric_limits<uint64_t>::max()};
uint64_t maxTraversalTimeNs_{0};
uint64_t totalTraversalTimeNs_{0};
};

TraversalStats traversalStats_;
// cache allocator's interface for evicting
using Item = typename Cache::Item;

@@ -89,9 +109,10 @@
void work() override final;
void checkAndRun();

AtomicCounter numMovedItems_{0};
AtomicCounter numTraversals_{0};
AtomicCounter totalBytesMoved_{0};
uint64_t numMovedItems{0};
uint64_t numTraversals{0};
uint64_t totalClasses{0};
uint64_t totalBytesMoved{0};

std::vector<MemoryDescriptorType> assignedMemory_;
folly::DistributedMutex mutex_;
@@ -111,6 +132,20 @@ BackgroundMover<CacheT>::BackgroundMover(
}
}

template <typename CacheT>
void BackgroundMover<CacheT>::TraversalStats::recordTraversalTime(uint64_t nsTaken) {
lastTraversalTimeNs_ = nsTaken;
minTraversalTimeNs_ = std::min(minTraversalTimeNs_, nsTaken);
maxTraversalTimeNs_ = std::max(maxTraversalTimeNs_, nsTaken);
totalTraversalTimeNs_ += nsTaken;
}

template <typename CacheT>
uint64_t BackgroundMover<CacheT>::TraversalStats::getAvgTraversalTimeNs(
uint64_t numTraversals) const {
return numTraversals ? totalTraversalTimeNs_ / numTraversals : 0;
}

template <typename CacheT>
BackgroundMover<CacheT>::~BackgroundMover() {
stop(std::chrono::seconds(0));
@@ -144,44 +179,56 @@ template <typename CacheT>
void BackgroundMover<CacheT>::checkAndRun() {
auto assignedMemory = mutex_.lock_combine([this] { return assignedMemory_; });

unsigned int moves = 0;
auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);

for (size_t i = 0; i < batches.size(); i++) {
const auto [tid, pid, cid] = assignedMemory[i];
const auto batch = batches[i];
while (true) {
unsigned int moves = 0;
std::set<ClassId> classes{};
auto batches = strategy_->calculateBatchSizes(cache_, assignedMemory);

const auto begin = util::getCurrentTimeNs();
for (size_t i = 0; i < batches.size(); i++) {
const auto [tid, pid, cid] = assignedMemory[i];
const auto batch = batches[i];
if (!batch) {
continue;
}

// try moving BATCH items from the class in order to reach free target
auto moved = moverFunc(cache_, tid, pid, cid, batch);
moves += moved;
movesPerClass_[assignedMemory[i]] += moved;
}
auto end = util::getCurrentTimeNs();
if (moves > 0) {
traversalStats_.recordTraversalTime(end > begin ? end - begin : 0);
numMovedItems += moves;
numTraversals++;
}

if (batch == 0) {
continue;
// we didn't move any objects; done with this run
if (moves == 0 || shouldStopWork()) {
break;
}
const auto& mpStats = cache_.getPoolByTid(pid, tid).getStats();
// try moving BATCH items from the class in order to reach free target
auto moved = moverFunc(cache_, tid, pid, cid, batch);
moves += moved;
movesPerClass_[tid][pid][cid] += moved;
totalBytesMoved_.add(moved * mpStats.acStats.at(cid).allocSize );
}

numTraversals_.inc();
numMovedItems_.add(moves);
}

template <typename CacheT>
BackgroundMoverStats BackgroundMover<CacheT>::getStats() const noexcept {
BackgroundMoverStats stats;
stats.numMovedItems = numMovedItems_.get();
stats.runCount = numTraversals_.get();
stats.totalBytesMoved = totalBytesMoved_.get();
stats.numMovedItems = numMovedItems;
stats.totalBytesMoved = totalBytesMoved;
stats.totalClasses = totalClasses;
auto runCount = getRunCount();
stats.runCount = runCount;
stats.numTraversals = numTraversals;
stats.avgItemsMoved = (double) stats.numMovedItems / (double)runCount;
stats.lastTraversalTimeNs = traversalStats_.getLastTraversalTimeNs();
stats.avgTraversalTimeNs = traversalStats_.getAvgTraversalTimeNs(numTraversals);
stats.minTraversalTimeNs = traversalStats_.getMinTraversalTimeNs();
stats.maxTraversalTimeNs = traversalStats_.getMaxTraversalTimeNs();

return stats;
}

template <typename CacheT>
std::map<TierId, std::map<PoolId, std::map<ClassId, uint64_t>>>
BackgroundMover<CacheT>::getClassStats() const noexcept {
return movesPerClass_;
}

template <typename CacheT>
size_t BackgroundMover<CacheT>::workerId(TierId tid,
PoolId pid,
37 changes: 29 additions & 8 deletions cachelib/allocator/BackgroundMoverStrategy.h
@@ -21,14 +21,6 @@
namespace facebook {
namespace cachelib {

struct MemoryDescriptorType {
MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid) :
tid_(tid), pid_(pid), cid_(cid) {}
TierId tid_;
PoolId pid_;
ClassId cid_;
};

// Base class for background eviction strategy.
class BackgroundMoverStrategy {
public:
@@ -46,5 +38,34 @@ class BackgroundMoverStrategy {
virtual ~BackgroundMoverStrategy() = default;
};

class DefaultBackgroundMoverStrategy : public BackgroundMoverStrategy {
public:
DefaultBackgroundMoverStrategy(uint64_t batchSize, double targetFree)
: batchSize_(batchSize), targetFree_((double)targetFree/100.0) {}
~DefaultBackgroundMoverStrategy() {}

std::vector<size_t> calculateBatchSizes(
const CacheBase& cache,
std::vector<MemoryDescriptorType> acVec) {
std::vector<size_t> batches{};
for (auto [tid, pid, cid] : acVec) {
double usage = cache.getPoolByTid(pid, tid).getApproxUsage(cid);
uint32_t perSlab = cache.getPoolByTid(pid, tid).getPerSlab(cid);
if (usage >= (1.0-targetFree_)) {
uint32_t batch = batchSize_ > perSlab ? perSlab : batchSize_;
batches.push_back(batch);
} else {
//no work to be done since there is already
//at least targetFree remaining in the class
batches.push_back(0);
}
}
return batches;
}
private:
uint64_t batchSize_{100};
double targetFree_{0.05};
};

} // namespace cachelib
} // namespace facebook
16 changes: 16 additions & 0 deletions cachelib/allocator/Cache.h
@@ -73,6 +73,22 @@ enum class DestructorContext {
kRemovedFromNVM
};

struct MemoryDescriptorType {
MemoryDescriptorType(TierId tid, PoolId pid, ClassId cid) :
tid_(tid), pid_(pid), cid_(cid) {}
TierId tid_;
PoolId pid_;
ClassId cid_;

bool operator<(const MemoryDescriptorType& rhs) const {
return std::make_tuple(tid_, pid_, cid_) < std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
}

bool operator==(const MemoryDescriptorType& rhs) const {
return std::make_tuple(tid_, pid_, cid_) == std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
}
};

// A base class of cache exposing members and status agnostic of template type.
class CacheBase {
public: