unify caching across workers to maximize cache hits#977
Conversation
| private final long maxAgeMillis; | ||
| private final KarapaceEventContext event; | ||
| private final long catalogId; | ||
| private final ConcurrentHashMap<Integer, CompletableFuture<String>> cache; |
There was a problem hiding this comment.
| private final ConcurrentHashMap<Integer, CompletableFuture<String>> cache; | |
| private final ConcurrentMap<Integer, CompletableFuture<String>> cachedSchemas; |
| EngineContext context, | ||
| long catalogId) | ||
| long catalogId, | ||
| ConcurrentHashMap<Integer, CompletableFuture<String>> cache) |
There was a problem hiding this comment.
| ConcurrentHashMap<Integer, CompletableFuture<String>> cache) | |
| ConcurrentMap<Integer, CompletableFuture<String>> cachedSchemas) |
| this.maxAgeMillis = config.maxAge.toMillis(); | ||
| this.event = new KarapaceEventContext(context); | ||
| this.catalogId = catalogId; | ||
| this.cache = cache; |
There was a problem hiding this comment.
| this.cache = cache; | |
| this.cachedSchemas = cachedSchemas; |
| CompletableFuture<String> future = cache.get(schemaId); | ||
| if (future == null) | ||
| { | ||
| schemas.put(schemaId, schema); | ||
| future = CompletableFuture.supplyAsync(() -> | ||
| { | ||
| String response = sendHttpRequest(MessageFormat.format(SCHEMA_PATH, schemaId)); | ||
| return response != null ? request.resolveSchemaResponse(response) : null; | ||
| }); | ||
| } | ||
| try | ||
| { | ||
| schema = future.get(); | ||
| if (schema != null) | ||
| { | ||
| cache.put(schemaId, future); | ||
| schemas.put(schemaId, schema); | ||
| } | ||
| } | ||
| catch (ExecutionException | InterruptedException e) | ||
| { | ||
| // TODO: log an event |
There was a problem hiding this comment.
There is still a race here, between cache.get(schemaId) and cache.put(schemaId, future).
To solve the race, we need to use one of the methods on ConcurrentMap that is not available on Map, such as:
CompletableFuture<> future = cache.get(schemaId);
if (future == null)
{
CompletableFuture<> newFuture = new CompletableFuture<String>();
future = cache.putIfAbsent(schemaId, newFuture);
if (future == null)
{
// send request and complete newFuture
future = newFuture;
}
}
assert future != null;
schema = future.get();
if (schema != null)
{
schemas.put(schemaId, schema);
}
There was a problem hiding this comment.
Note, we also need to prevent unnecessary concurrent requests to resolve the schemaIds, such as latest, same as for resolving the schemas themselves from schemaIds.
| { | ||
| schemaIds.put(checkSum, new CachedSchemaId(System.currentTimeMillis(), schemaId)); | ||
| } | ||
| else if (schemaIds.containsKey(checkSum)) |
There was a problem hiding this comment.
| else if (schemaIds.containsKey(checkSum)) | |
| else if (schemaIds.containsKey(schemaKey)) |
| { | ||
| schemaId = schemaIds.get(checkSum).id; | ||
| } |
There was a problem hiding this comment.
Is this needed to handle the case where remote access fails, but we can serve from (stale) cache instead?
| "\"name\":\"Event\",\"namespace\":\"io.aklivity.example\",\"type\":\"record\"}"; | ||
|
|
||
| KarapaceCatalogHandler catalog = new KarapaceCatalogHandler(config, context, 0L); | ||
| KarapaceCatalogHandler catalog = new KarapaceCatalogHandler(config, context, 0L, new ConcurrentHashMap<>()); |
There was a problem hiding this comment.
Suggest adding a convenience constructor that calls this(config, context, catalogId, new ConcurrentHashMap<>()) instead of requiring the test to handle the complexity.
| newFuture = CompletableFuture.supplyAsync(() -> | ||
| { | ||
| String response = sendHttpRequest(MessageFormat.format(SCHEMA_PATH, schemaId)); | ||
| return response != null ? request.resolveSchemaResponse(response) : null; | ||
| }); | ||
| future = newFuture; |
There was a problem hiding this comment.
I don't think we can reassign newFuture here, as the value being lost is the one that already made it into the concurrent map, so if we reassign it then the one in the map would not be completed, agree?
| newFuture = CompletableFuture.supplyAsync(() -> | ||
| { | ||
| String response = sendHttpRequest(MessageFormat.format(SUBJECT_VERSION_PATH, subject, version)); | ||
| return new CachedSchemaId(System.currentTimeMillis(), | ||
| response != null ? request.resolveResponse(response) : NO_SCHEMA_ID); | ||
| }); | ||
| future = newFuture; |
There was a problem hiding this comment.
Same issue here as above, agree?
| CompletableFuture<CachedSchemaId> newFuture = | ||
| cachedSchemaIds.computeIfPresent(schemaKey, (key, existingFuture) -> | ||
| { | ||
| if (existingFuture.isDone() || existingFuture.isCompletedExceptionally()) | ||
| { | ||
| CompletableFuture<CachedSchemaId> id = CompletableFuture.supplyAsync(() -> | ||
| { | ||
| String response = sendHttpRequest(MessageFormat.format(SUBJECT_VERSION_PATH, | ||
| subject, version)); | ||
| return new CachedSchemaId(System.currentTimeMillis(), | ||
| response != null ? request.resolveResponse(response) : NO_SCHEMA_ID); | ||
| }); | ||
| return id; | ||
| } | ||
| else | ||
| { | ||
| return existingFuture; | ||
| } | ||
| }); |
There was a problem hiding this comment.
I think we can perhaps simplify the logic by creating the candidate newFuture and calling:
CompletableFuture<> newFuture = new CompletableFuture<CachedSchemaId>();
CompletableFuture<> future = cachedSchemaIds.merge(schemaKey, newFuture, (v1, v2) -> v1 != null && !v1.isDone() ? v1 : v2);
if (future == newFuture)
{
// send request
// complete future
}
cachedSchemaId = future.get();
| { | ||
| private final EngineContext context; | ||
| private final ConcurrentHashMap<Integer, CompletableFuture<String>> cache; | ||
| private final ConcurrentMap<Long, KarapaceCache> cache; |
There was a problem hiding this comment.
| private final ConcurrentMap<Long, KarapaceCache> cache; | |
| private final ConcurrentMap<Long, KarapaceCache> cachesById; |
| KarapaceCache karapaceCache = cache.computeIfAbsent(catalog.id, id -> new KarapaceCache()); | ||
| return new KarapaceCatalogHandler(KarapaceOptionsConfig.class.cast(catalog.options), context, catalog.id, karapaceCache); |
There was a problem hiding this comment.
| KarapaceCache karapaceCache = cache.computeIfAbsent(catalog.id, id -> new KarapaceCache()); | |
| return new KarapaceCatalogHandler(KarapaceOptionsConfig.class.cast(catalog.options), context, catalog.id, karapaceCache); | |
| KarapaceCache cache = cachesById.computeIfAbsent(catalog.id, id -> new KarapaceCache()); | |
| return new KarapaceCatalogHandler(KarapaceOptionsConfig.class.cast(catalog.options), context, catalog.id, cache); |
| public final ConcurrentMap<Integer, CompletableFuture<String>> cachedSchemas; | ||
| public final ConcurrentMap<Integer, CompletableFuture<CachedSchemaId>> cachedSchemaIds; |
There was a problem hiding this comment.
| public final ConcurrentMap<Integer, CompletableFuture<String>> cachedSchemas; | |
| public final ConcurrentMap<Integer, CompletableFuture<CachedSchemaId>> cachedSchemaIds; | |
| public final ConcurrentMap<Integer, CompletableFuture<String>> schemas; | |
| public final ConcurrentMap<Integer, CompletableFuture<CachedSchemaId>> schemaIds; |
Probably don't need the cached prefix now that these are fields of KarapaceCache.
| { | ||
| this.karapaceTypeId = context.supplyTypeId(KarapaceCatalog.NAME); | ||
| this.remoteAccessRejectedEventId = context.supplyEventId("catalog.karapace.remote.access.rejected"); | ||
| this.futureCompletedExceptionallyId = context.supplyEventId("catalog.karapace.future.completed.exceptionally"); |
There was a problem hiding this comment.
The concept of a future is internal implementation detail. Let's change this to still make sense from an external perspective for someone observing the logs with no understanding of the implementation internals.
| result = String.format(FUTURE_COMPLETED_EXCEPTIONALLY, asString(ex.error())); | ||
| break; | ||
| } | ||
| case STALE_SCHEMA_SERVED: |
There was a problem hiding this comment.
How many times is this going to be logged while Karapace remains unreachable?
Perhaps we need to reconsider what events would convey the information necessary without overwhelming the logs when things are not operating perfectly.
The state transition that has occurred is that Karapace has been unreachable for long enough that we are forced to serve stale schemas, and then the next state transition would be that we had reestablished connectivity to Karapace so that stale schemas are no longer being served. Do we think these transition log events are sufficient or do we need something more verbose?
| REMOTE_ACCESS_REJECTED (1), | ||
| FUTURE_COMPLETED_EXCEPTIONALLY (2), | ||
| STALE_SCHEMA_SERVED (3) |
There was a problem hiding this comment.
How do these differ in the information they convey to the log observer?
| public String schema; | ||
| public AtomicInteger event; | ||
|
|
There was a problem hiding this comment.
If these are both unmodifiable, then they should be marked final too.
| public class CachedSchema | ||
| { | ||
| public static final String SCHEMA_PLACEHOLDER = "schema"; | ||
| public static final CachedSchema IN_PROGRESS = new CachedSchema(SCHEMA_PLACEHOLDER, new AtomicInteger(0)); |
There was a problem hiding this comment.
No need to explicitly pass 0 to AtomicInteger constructor.
| public CachedSchema( | ||
| String schema, | ||
| AtomicInteger event) | ||
| { | ||
| this.schema = schema; | ||
| this.event = event; | ||
| } |
There was a problem hiding this comment.
Does it make sense to have a convenience constructor with just schema defauting to new AtomicInteger().
| public static final CachedSchema IN_PROGRESS = new CachedSchema(SCHEMA_PLACEHOLDER, new AtomicInteger(0)); | ||
|
|
||
| public String schema; | ||
| public AtomicInteger event; |
There was a problem hiding this comment.
Not sure why this is called event.
Isn't this just tracking the number of retries, which is normally 0 when everything is working normally?
| public long timestamp; | ||
| public int id; | ||
| public AtomicInteger event; | ||
| public long retry; |
There was a problem hiding this comment.
Should some or all of these be final?
| } | ||
|
|
||
| public void remoteAccessRejected( | ||
| public void unretrievableSchemaSubjectVersion( |
There was a problem hiding this comment.
I think it might be more readable if these methods were named onXxx instead of xxx.
So in the calling context it would be something like events.onUnretrievableSchemaSubjectVersion(...) (not event...), wdyt?
| public static final long RESET_RETRY_DELAY_MS_DEFAULT = 0L; | ||
| public static final long RETRY_INITIAL_DELAY_MS_DEFAULT = 1000L; | ||
| public static final int RETRY_MULTIPLER_DEFAULT = 2; |
| long current = cachedSchemaId.retry; | ||
| long update = current * RETRY_MULTIPLER_DEFAULT; | ||
| retry = update < maxAgeMillis ? update : current; |
There was a problem hiding this comment.
This logic is trying to prevent the value from increasing if it would end up larger than maxAgeMillis, but that means maxAgeMillis is okay too.
Suggest it is easier to follow if you change to max out at maxAgeMillis instead of current.
| long current = cachedSchemaId.retry; | |
| long update = current * RETRY_MULTIPLER_DEFAULT; | |
| retry = update < maxAgeMillis ? update : current; | |
| retryAfter = Math.min(cachedSchemaId.retryAfter << 1, maxAgeMillis); |
| EngineContext context = mock(EngineContext.class); | ||
| when(context.clock()).thenReturn(Clock.systemUTC()); | ||
| when(context.supplyEventWriter()).thenReturn(mock(MessageConsumer.class)); | ||
| KarapaceEventContext event = new KarapaceEventContext(context); |
There was a problem hiding this comment.
Suggest we call this events instead of event.
| if (schemaId != NO_SCHEMA_ID) | ||
| { | ||
| schemaIds.put(schemaKey, cachedSchemaId); | ||
| if (cachedSchemaId.retryAttempts.compareAndSet(1, 2)) |
There was a problem hiding this comment.
This doesn't look right to me, because the value of retryAttempts should be independent of serving stale schema, right?
| boolean success = httpResponse.statusCode() == 200; | ||
| String responseBody = success ? httpResponse.body() : null; | ||
| if (!success) | ||
| { | ||
| event.remoteAccessRejected(catalogId, httpRequest, httpResponse.statusCode()); | ||
| } | ||
| return responseBody; | ||
| responseBody = success ? httpResponse.body() : null; |
There was a problem hiding this comment.
We don't really need success any more, right?
responseBody = httpResponse.statusCode() == 200 ? httpResponse.body() : null;
|
|
||
| @Test | ||
| public void shouldLogRemoteRegistryEvents() | ||
| { | ||
| EngineContext context = mock(EngineContext.class); | ||
| when(context.clock()).thenReturn(Clock.systemUTC()); | ||
| when(context.supplyEventWriter()).thenReturn(mock(MessageConsumer.class)); | ||
| KarapaceEventContext events = new KarapaceEventContext(context); | ||
| events.onUnretrievableSchemaSubjectVersion(0L, "subject", "version"); | ||
| events.onRetrievableSchemaSubjectVersion(0L, "subject", "version"); | ||
| events.onUnretrievableSchemaSubjectVersionStaleSchema(0L, "subject", "version", 1); | ||
| events.onUnretrievableSchemaId(0L, 1); | ||
| events.onRetrievableSchemaId(0L, 1); | ||
| } | ||
|
|
There was a problem hiding this comment.
This should be done as an IT with scripts verifying each different possible behavior of the remote Karapace system.
Is anything missing from test binding that would let the karapace catalog be tested that way?
Note: please see filesystem catalog EventIT.
There was a problem hiding this comment.
Prefer this IT done via scripts not mocks.
Note: if using mocks then it is a unit test, not an integration test - unit tests with mocks are not stable after a refactor of the code as they also need to be changed, whereas integration tests remain stable, providing better confidence after changing implementation.
There was a problem hiding this comment.
There should be no need to drive any direct calls to KarapaceCatalogHandler via code and mocks.
Instead, this should be driven by test binding in zilla.yaml for the test. It may be necessary to add support for expected schema in test binding options so that it can verify the retrieved schema matches expectations.
Note: this would be for test binding only, as other bindings have no need to validate expectations.
There was a problem hiding this comment.
There should be no need to drive any direct calls to KarapaceCatalogHandler via code and mocks.
Instead, this should be driven by test binding in zilla.yaml for the test. It may be necessary to add support for expected schema in test binding options so that it can verify the retrieved schema matches expectations.
Note: this would be for test binding only, as other bindings have no need to validate expectations.
| else | ||
| { | ||
| handler.resolve(catalog.id); | ||
| } |
There was a problem hiding this comment.
This is excellent. As mentioned above, we may need to add support for verifying the expected schema here, then reject the initial begin with a reset if the schema doesn't match expectations, as that will cause the IT to fail.
The expected schemas may be a separate test binding option, independent of cataloged so as not to interfere with that generic definition used by other bindings that don't need to validate expectations.
| - catalog0 | ||
| catalog: | ||
| catalog0: | ||
| - id: 1 |
There was a problem hiding this comment.
It would seem more readable and descriptive of the intent to put subject: not-subject1 instead of id: 1 here, agree?
| catalog-assertion: | ||
| - schema: null | ||
| - schema: '{"fields":[{"name":"id","type":"string"},{"name":"status","type":"string"}],"name":"Event","namespace":"io.aklivity.example","type":"record"}' |
There was a problem hiding this comment.
This looks great!
Suggest we generalize the concept of assertions as we'll potentially need that later for vault, etc.
| catalog-assertion: | |
| - schema: null | |
| - schema: '{"fields":[{"name":"id","type":"string"},{"name":"status","type":"string"}],"name":"Event","namespace":"io.aklivity.example","type":"record"}' | |
| assertions: | |
| catalog: | |
| catalog0: | |
| - schema: null | |
| - schema: '{"fields":[{"name":"id","type":"string"},{"name":"status","type":"string"}],"name":"Event","namespace":"io.aklivity.example","type":"record"}' |
| { | ||
| "type": ["string", "null"] | ||
| }, | ||
| "interval": |
There was a problem hiding this comment.
Perhaps delay instead of interval since it is not repeating.
| catalog0: | ||
| - id: 9 | ||
| catalog-assertion: | ||
| - schema: '{"fields":[{"name":"id","type":"string"},{"name":"status","type":"string"}],"name":"Event","namespace":"io.aklivity.example","type":"record"}' |
There was a problem hiding this comment.
Is ti possible to use yaml syntax magic to let this be pretty printed json in the file, but compact format when parsed?
There was a problem hiding this comment.
I wasn't able to find a way to implement this feedback, tried using > (Folded Block Scalar) instead of |. But the result was still varying from the expected Schema string.
Fixes #937