Validator Interface Update & Converter Changes#533
Conversation
* feature/m1-docker-build-support - separate jammy and alpine - add zilla version as env var - add the docker platform to properties - don't need to use alpine for build * docker image tagging options separate alpine base image from the default image and add more tagging options * set the version env var in the alpine build * remove the suffix for local build * make version tagging more explicit for each profile * move the alpine specific builds into the docker-image module * reduce the folder complexity and add child pom placeholders * revert the docker-image pom to develop * Use buildx for multi-arch images, build alpine image for release only * Move inline assembly to descriptor file and reference from alpine image --------- Co-authored-by: John Fallows <john.r.fallows@gmail.com>
…r the group stream (aklivity#502)
) Bumps alpine from 3.18.3 to 3.18.4. --- updated-dependencies: - dependency-name: alpine dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
| int schemaId; | ||
| int progress = 0; | ||
| if (data.getByte(index) == MAGIC_BYTE) | ||
| { | ||
| progress += BitUtil.SIZE_OF_BYTE; | ||
| schemaId = data.getInt(index + progress); | ||
| progress += BitUtil.SIZE_OF_INT; | ||
| } | ||
| else | ||
| { | ||
| schemaId = catalog != null ? catalog.id : 0; | ||
| } |
There was a problem hiding this comment.
Let's define an int constant NO_SCHEMA_ID with value 0.
| int schemaId; | |
| int progress = 0; | |
| if (data.getByte(index) == MAGIC_BYTE) | |
| { | |
| progress += BitUtil.SIZE_OF_BYTE; | |
| schemaId = data.getInt(index + progress); | |
| progress += BitUtil.SIZE_OF_INT; | |
| } | |
| else | |
| { | |
| schemaId = catalog != null ? catalog.id : 0; | |
| } | |
| int schemaId = NO_SCHEMA_ID; | |
| int progress = 0; | |
| if (data.getByte(index) == MAGIC_BYTE) | |
| { | |
| progress += BitUtil.SIZE_OF_BYTE; | |
| schemaId = data.getInt(index + progress); | |
| progress += BitUtil.SIZE_OF_INT; | |
| } | |
| else if (catalog.id != NO_SCHEMA_ID) | |
| { | |
| schemaId = catalog.id; | |
| } |
| Schema schema = fetchSchema(schemaId); | ||
| if (schema != null) | ||
| { | ||
| if ("json".equals(format)) |
There was a problem hiding this comment.
Let's define String constant FORMAT_JSON with value "json" and use it here.
| return valLength; | ||
| } | ||
|
|
||
| private byte[] deserializeAvroRecord( |
There was a problem hiding this comment.
Rename to deserializeRecord since we are already in the AvroReadValidator.
| int offset, | ||
| int length) | ||
| { | ||
| ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); |
There was a problem hiding this comment.
Rename outputStream to encoded.
Can we pre-create this once in the constructor and call encoded.reset() here instead?
| protected final String format; | ||
| protected DatumReader<GenericRecord> reader; | ||
| protected DatumWriter<GenericRecord> writer; | ||
| protected final DirectBuffer valueRO = new UnsafeBuffer(); |
There was a problem hiding this comment.
Please move up just under statics as this is effectively a constant.
| this.supplyCache = supplyCache; | ||
| this.supplyCacheRoute = supplyCacheRoute; | ||
| this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer()); | ||
| this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer().capacity()); |
There was a problem hiding this comment.
Please confirm this change is still needed after we added convertedFile in cache segment.
| int writeCapacity) | ||
| { | ||
| this.writeBuffer = writeBuffer; | ||
| this.writeBuffer = new UnsafeBuffer(ByteBuffer.allocate(writeCapacity)); |
There was a problem hiding this comment.
Please confirm this change is still needed after we added convertedFile in cache segment.
| this.localIndex = context.index(); | ||
| this.cleanupDelay = config.cacheClientCleanupDelay(); | ||
| this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer()); | ||
| this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer().capacity()); |
There was a problem hiding this comment.
Please confirm this change is still needed after we added convertedFile in cache segment.
| private ValueValidator validateKey; | ||
| private FragmentValidator validateValue; |
| this.supplyCache = supplyCache; | ||
| this.supplyCacheRoute = supplyCacheRoute; | ||
| this.cursorFactory = new KafkaCacheCursorFactory(writeBuffer); | ||
| this.cursorFactory = new KafkaCacheCursorFactory(writeBuffer.capacity()); |
There was a problem hiding this comment.
Please confirm this change is still needed after we added convertedFile in cache segment.
| } | ||
| return status; | ||
|
|
||
| return padding; |
There was a problem hiding this comment.
Please move the private supporting methods to the end of the file, so that methods used by other classes are at the top.
| protected GenericDatumWriter<GenericRecord> supplyWriter( | ||
| int schemaId) | ||
| { | ||
| writer = null; |
There was a problem hiding this comment.
This should be locally defined instead of populating a member field as a side effect.
| protected GenericDatumReader<GenericRecord> supplyReader( | ||
| int schemaId) | ||
| { | ||
| reader = null; |
There was a problem hiding this comment.
This should be locally defined instead of populating a member field as a side effect.
| protected GenericDatumReader<GenericRecord> reader; | ||
| protected GenericDatumWriter<GenericRecord> writer; | ||
| protected GenericRecord record; |
There was a problem hiding this comment.
Let's try to remove these transient fields and use the return value from the corresponding supply methods instead.
| protected ByteArrayOutputStream encoded; | ||
| protected DirectBufferInputStream in; |
| String topic, | ||
| OctetsFW payload) | ||
| { | ||
| final ValueValidator contentValueValidator = supplyValidator.apply(topic); |
| OctetsFW payload) | ||
| { | ||
| final ValueValidator contentValueValidator = supplyValidator.apply(topic); | ||
| return contentValueValidator == null || |
There was a problem hiding this comment.
Should we introduce a ValueValidator.NOP to return from supplyValidator so we can avoid these null checks?
There was a problem hiding this comment.
we already have ValueValidator no op, null check here is redundant. removed it.
| int NO_SCHEMA_ID = 0; | ||
| String TEST = "test"; // Added for unit test & IT purpose | ||
| String SCHEMA_REGISTRY = "schema-registry"; | ||
| String INLINE = "inline"; |
There was a problem hiding this comment.
These cannot go here, caused by abstraction leak and this is a public API for engine.
| int padding = 0; | ||
| if (appendPrefix) | ||
| { | ||
| padding = MAX_PADDING_LEN; // TODO: fetch this from catalog |
There was a problem hiding this comment.
Let's defer this to the catalog handler, if present.
| .partition(0, 1, 2) | ||
| .build() | ||
| .build()} | ||
| write [0x00] [0x00 0x00 0x00 0x00 0x01] ${kafka:varint(3)} "id0" ${kafka:varint(8)} "positive" |
There was a problem hiding this comment.
I thought it was 1 byte (magic) followed by int32 (schemaId) then payload, no?
This has 6-byte prefix, not 5 bytes.
There was a problem hiding this comment.
that's true, but I have added this script to mimic the scenario when the prefix can be more than five byte. So that the padding can be dynamic.
| { | ||
| GenericRecord record = supplyRecord(schemaId); | ||
| in.wrap(buffer, index, length); | ||
| GenericDatumReader<GenericRecord> reader = readers.computeIfAbsent(schemaId, this::supplyReader); |
There was a problem hiding this comment.
| GenericDatumReader<GenericRecord> reader = readers.computeIfAbsent(schemaId, this::supplyReader); | |
| GenericDatumReader<GenericRecord> reader = supplyReader(schemaId); |
| Schema schema = supplySchema(schemaId); | ||
| GenericDatumReader<GenericRecord> reader = supplyReader(schemaId); | ||
| GenericDatumWriter<GenericRecord> writer = supplyWriter(schemaId); | ||
| GenericRecord record = supplyRecord(schemaId); |
There was a problem hiding this comment.
Can the json decoder below be precreated and reused instead of creating afresh each time?
There was a problem hiding this comment.
this doesn't seems possible as we don't have a method to take reuse JsonDecoder.
There was a problem hiding this comment.
but I was able to optimise BinaryEncoder & reuse the same instance.
| } | ||
| } | ||
| bytesIndex += numBytes; | ||
| index += charByteCount; |
There was a problem hiding this comment.
We are calculating index + j more than once here and also j is not descriptive.
final int charByteLimit =. index + charByteCount;
for (int charByteIndex = index + 1; charByteIndex < charByteLimit; charByteIndex++)
{
if (charByteIndex >= limit || (data.getByte(charByteIndex) & 0b11000000) != 0b10000000)
{
break validate;
}
}
I think the break above needs to break from the outer while loop, not just the for loop, so we'll need to add a validate: label to the while loop, agree?
| private JsonProvider supplyProvider( | ||
| int schemaId) | ||
| { | ||
| return providers.computeIfAbsent(schemaId, id -> createProvider(supplySchema(id))); |
There was a problem hiding this comment.
Suggest we simplify by making createProvider take schemaId and calling supplySchema from createProvider implementation.
|
|
||
| default int maxPadding() | ||
| { | ||
| return ZERO_PADDING; |
There was a problem hiding this comment.
We can return 0 here instead of a constant.
| { | ||
| prefixRO.putByte(0, MAGIC_BYTE); | ||
| prefixRO.putInt(1, schemaId, ByteOrder.BIG_ENDIAN); | ||
| next.accept(prefixRO, 0, 5); |
There was a problem hiding this comment.
| next.accept(prefixRO, 0, 5); | |
| next.accept(prefixRO, 0, PREFIX_LENGTH); |
|
|
||
| public class TestCatalogHandler implements CatalogHandler | ||
| { | ||
| private static final int MAX_PADDING_LEN = 10; |
There was a problem hiding this comment.
Suggest renaming to MAX_PADDING_LENGTH for consistency of naming.
| @Override | ||
| public int enrich( | ||
| int schemaId, | ||
| ValueConsumer next) |
There was a problem hiding this comment.
What is the intention of the return value of enrich?
It seems like it is trying to say how much of the input should be ignored, while also potentially writing something to next.
So in the case of embed where we are adding the prefix bytes, the PREFIX_LENGTH would already be passed to next, so perhaps this case should return 0?
And when we are doing exclude, meaning we are stripping off the PREFIX_LENGTH bytes, we would not call next so we need to return PREFIX_LENGTH to indicate how many bytes we skipped over.
The implementation below is returning PREFIX_LENGTH for both cases, is it intentional?
| { | ||
| length = ENRICHED_LENGTH; | ||
| } | ||
| return length; |
There was a problem hiding this comment.
Does it make sense to have booleans for embed and exclude or is this more a funciton of whether the validator is read vs write?
If it is a property of the validator, then perhaps we need to pass this context from the validator to the catalog handler, either via a parameter or by having 2 methods on catalog handler?
| public String type() | ||
| { | ||
| return SCHEMA_REGISTRY; | ||
| this.prefixRO = new UnsafeBuffer(new byte[5]); |
There was a problem hiding this comment.
We should probably have this structure generated from an internal idl instead for better readability, much like we do for the kafka cache entry descriptions and other protocol codec flyweights.
| DirectBuffer data, | ||
| int payloadIndex, | ||
| int payloadLength); | ||
| } |
There was a problem hiding this comment.
Use simpler names for parameters, such as index length instead of payloadIndex payloadLength.
| default int maxPadding() | ||
| { | ||
| return ZERO_PADDING; | ||
| return 0; |
There was a problem hiding this comment.
Let's rename to encodePadding to indicate this is needed only for encode case.
Also let's rename Validator.maxPadding to Validator.padding since padding concept is already an upper bound in zilla.
| SchemaConfig catalog, | ||
| String subject) | ||
| { | ||
| int schemaId = 0; |
There was a problem hiding this comment.
int schemaId = NO_SCHEMA_ID;
| SchemaConfig catalog, | ||
| String subject, |
There was a problem hiding this comment.
Let's aim to remove these parameters from decode.
| int ZERO_PADDING = 0; | ||
|
|
||
| @FunctionalInterface | ||
| interface Read |
There was a problem hiding this comment.
Suggest renaming to Decoder, with IDENTITY constant that defaults to next.accept(data, index, length).
|
|
||
| GenericDatumReader<GenericRecord> reader = supplyReader(schemaId); | ||
| if (reader != null) | ||
| private int validatePayload( |
There was a problem hiding this comment.
Suggest renaming this to decodePayload
| int length, | ||
| ValueConsumer next, | ||
| int schemaId) | ||
| { |
There was a problem hiding this comment.
Receive schemaId as NO_SCHEMA_ID if not found in catalog prefix, so can default here using catalog and subject if needed.
| protected final String subject; | ||
| protected final String format; | ||
| protected final ByteArrayOutputStream encoded; | ||
| protected final ExpandableDirectBufferOutputStream encoded; |
There was a problem hiding this comment.
Suggest renaming this to expandable as it is used for both encode and decode.
| schemaId = resolve(subject, catalog.version); | ||
| } | ||
| return schemaId; | ||
| } |
There was a problem hiding this comment.
Let's see if we can simplify this to not require embedding magic byte and schema id in payload.
| } | ||
|
|
||
| @Test | ||
| public void shouldVerifyEnrichedData() |
For previous review comments refer: #499