Validator Interface Update & Converter Changes by ankitk-me · Pull Request #533 · aklivity/zilla

ankitk-me · 2023-10-25T05:59:15Z

For previous review comments, refer to #499.

* feature/m1-docker-build-support
  - separate jammy and alpine
  - add zilla version as env var
  - add the docker platform to properties
  - don't need to use alpine for build
* docker image tagging options separate alpine base image from the default image and add more tagging options
* set the version env var in the alpine build
* remove the suffix for local build
* make version tagging more explicit for each profile
* move the alpine specific builds into the docker-image module
* reduce the folder complexity and add child pom placeholders
* revert the docker-image pom to develop
* Use buildx for multi-arch images, build alpine image for release only
* Move inline assembly to descriptor file and reference from alpine image

---------

Co-authored-by: John Fallows <john.r.fallows@gmail.com>

…r the group stream (aklivity#502)

) Bumps alpine from 3.18.3 to 3.18.4. --- updated-dependencies: - dependency-name: alpine dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

jfallows · 2023-11-27T03:22:16Z

+        int schemaId;
+        int progress = 0;
+        if (data.getByte(index) == MAGIC_BYTE)
+        {
+            progress += BitUtil.SIZE_OF_BYTE;
+            schemaId = data.getInt(index + progress);
+            progress += BitUtil.SIZE_OF_INT;
+        }
+        else
+        {
+            schemaId = catalog != null ? catalog.id : 0;
+        }


Let's define an int constant NO_SCHEMA_ID with value 0.

Suggested change

-        int schemaId;
-        int progress = 0;
-        if (data.getByte(index) == MAGIC_BYTE)
-        {
-            progress += BitUtil.SIZE_OF_BYTE;
-            schemaId = data.getInt(index + progress);
-            progress += BitUtil.SIZE_OF_INT;
-        }
-        else
-        {
-            schemaId = catalog != null ? catalog.id : 0;
-        }
+        int schemaId = NO_SCHEMA_ID;
+        int progress = 0;
+        if (data.getByte(index) == MAGIC_BYTE)
+        {
+            progress += BitUtil.SIZE_OF_BYTE;
+            schemaId = data.getInt(index + progress);
+            progress += BitUtil.SIZE_OF_INT;
+        }
+        else if (catalog.id != NO_SCHEMA_ID)
+        {
+            schemaId = catalog.id;
+        }
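
A minimal sketch of the prefix handling discussed in this thread, written against plain `java.nio.ByteBuffer` rather than the Agrona buffer used in the PR. The class name `SchemaPrefix`, the `fallbackId` parameter, the `MAGIC_BYTE` value of `0x00` and the length guard are assumptions for illustration; only `NO_SCHEMA_ID = 0` and the 1-byte magic plus int32 schema id layout come from the review comments.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SchemaPrefix
{
    static final int NO_SCHEMA_ID = 0;
    static final byte MAGIC_BYTE = 0x00; // assumed value, not confirmed by this thread
    static final int PREFIX_LENGTH = Byte.BYTES + Integer.BYTES;

    // Returns the schema id carried in the prefix when one is present,
    // otherwise the (hypothetical) catalog fallback, otherwise NO_SCHEMA_ID.
    static int resolveSchemaId(byte[] data, int index, int length, int fallbackId)
    {
        int schemaId = NO_SCHEMA_ID;
        if (length >= PREFIX_LENGTH && data[index] == MAGIC_BYTE)
        {
            // skip the magic byte, then read a big-endian int32
            schemaId = ByteBuffer.wrap(data, index + Byte.BYTES, Integer.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .getInt();
        }
        else if (fallbackId != NO_SCHEMA_ID)
        {
            schemaId = fallbackId;
        }
        return schemaId;
    }
}
```

Initialising `schemaId` to `NO_SCHEMA_ID` up front means the final `else` branch is only needed when a fallback actually exists.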

jfallows · 2023-11-27T03:23:16Z

+        Schema schema = fetchSchema(schemaId);
+        if (schema != null)
+        {
+            if ("json".equals(format))


Let's define String constant FORMAT_JSON with value "json" and use it here.

jfallows · 2023-11-27T03:23:40Z

+        return valLength;
+    }
+
+    private byte[] deserializeAvroRecord(


Rename to deserializeRecord since we are already in the AvroReadValidator.

jfallows · 2023-11-27T03:26:20Z

+        int offset,
+        int length)
+    {
+        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();


Rename outputStream to encoded.
Can we pre-create this once in the constructor and call encoded.reset() here instead?
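
The reuse pattern being asked for could look roughly like the sketch below. `RecordEncoder` and its `encode` method are hypothetical stand-ins; the real method serializes an Avro record, whereas this sketch only copies bytes so that it stays self-contained.

```java
import java.io.ByteArrayOutputStream;

final class RecordEncoder
{
    // created once, so the internal byte array is reused across calls
    private final ByteArrayOutputStream encoded = new ByteArrayOutputStream();

    byte[] encode(byte[] buffer, int offset, int length)
    {
        // reset() discards previously written bytes but keeps the allocated capacity
        encoded.reset();
        encoded.write(buffer, offset, length);
        // toByteArray() still copies; only the stream itself is reused
        return encoded.toByteArray();
    }
}
```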

jfallows · 2023-11-27T03:31:00Z

+    protected final String format;
+    protected DatumReader<GenericRecord> reader;
+    protected DatumWriter<GenericRecord> writer;
+    protected final DirectBuffer valueRO = new UnsafeBuffer();


Please move up just under statics as this is effectively a constant.

jfallows · 2023-11-27T04:30:51Z

        this.supplyCache = supplyCache;
        this.supplyCacheRoute = supplyCacheRoute;
-        this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer());
+        this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer().capacity());


Please confirm this change is still needed after we added convertedFile in cache segment.

jfallows · 2023-11-27T04:31:05Z

+        int writeCapacity)
    {
-        this.writeBuffer = writeBuffer;
+        this.writeBuffer = new UnsafeBuffer(ByteBuffer.allocate(writeCapacity));


Please confirm this change is still needed after we added convertedFile in cache segment.

jfallows · 2023-11-27T04:31:21Z

        this.localIndex = context.index();
        this.cleanupDelay = config.cacheClientCleanupDelay();
-        this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer());
+        this.cursorFactory = new KafkaCacheCursorFactory(context.writeBuffer().capacity());


Please confirm this change is still needed after we added convertedFile in cache segment.

jfallows · 2023-11-27T04:31:56Z

+        private ValueValidator validateKey;
+        private FragmentValidator validateValue;


jfallows · 2023-11-27T04:32:15Z

        this.supplyCache = supplyCache;
        this.supplyCacheRoute = supplyCacheRoute;
-        this.cursorFactory = new KafkaCacheCursorFactory(writeBuffer);
+        this.cursorFactory = new KafkaCacheCursorFactory(writeBuffer.capacity());


Please confirm this change is still needed after we added convertedFile in cache segment.

jfallows · 2023-12-14T21:35:30Z

        }
-        return status;
+
+        return padding;


Please move the private supporting methods to the end of the file, so that methods used by other classes are at the top.

jfallows · 2023-12-14T21:36:17Z

+    protected GenericDatumWriter<GenericRecord> supplyWriter(
+        int schemaId)
+    {
+        writer = null;


This should be locally defined instead of populating a member field as a side effect.

jfallows · 2023-12-14T21:36:34Z

+    protected GenericDatumReader<GenericRecord> supplyReader(
+        int schemaId)
+    {
+        reader = null;


This should be locally defined instead of populating a member field as a side effect.

jfallows · 2023-12-14T22:01:43Z

+    protected GenericDatumReader<GenericRecord> reader;
+    protected GenericDatumWriter<GenericRecord> writer;
+    protected GenericRecord record;


Let's try to remove these transient fields and use the return value from the corresponding supply methods instead.

jfallows · 2023-12-14T22:01:54Z

+    protected ByteArrayOutputStream encoded;
+    protected DirectBufferInputStream in;


Should these be final?

jfallows · 2023-12-15T00:13:41Z

+            String topic,
+            OctetsFW payload)
+        {
+            final ValueValidator contentValueValidator = supplyValidator.apply(topic);


Just validator.

jfallows · 2023-12-15T00:14:30Z

+            OctetsFW payload)
+        {
+            final ValueValidator contentValueValidator = supplyValidator.apply(topic);
+            return contentValueValidator == null ||


Should we introduce a ValueValidator.NOP to return from supplyValidator so we can avoid these null checks?

We already have a no-op ValueValidator, so the null check here is redundant. Removed it.
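
For readers unfamiliar with the pattern, a no-op constant can be sketched as follows. The nested `ValueValidator` here is a simplified, hypothetical stand-in with a `byte[]` signature and is not the engine interface; `supplier` and `validContent` are likewise illustrative names.

```java
import java.util.Map;
import java.util.function.Function;

final class NopValidatorSketch
{
    @FunctionalInterface
    interface ValueValidator
    {
        // accepts everything, so callers never need a null check
        ValueValidator NOP = (data, index, length) -> true;

        boolean validate(byte[] data, int index, int length);
    }

    // Falls back to NOP when no validator is configured for the topic.
    static Function<String, ValueValidator> supplier(Map<String, ValueValidator> byTopic)
    {
        return topic -> byTopic.getOrDefault(topic, ValueValidator.NOP);
    }

    static boolean validContent(
        Function<String, ValueValidator> supplyValidator,
        String topic,
        byte[] payload)
    {
        final ValueValidator validator = supplyValidator.apply(topic);
        return validator.validate(payload, 0, payload.length);
    }
}
```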

jfallows · 2023-12-15T00:21:34Z

    int NO_SCHEMA_ID = 0;
+    String TEST = "test"; // Added for unit test & IT purpose
+    String SCHEMA_REGISTRY = "schema-registry";
+    String INLINE = "inline";


These cannot go here: this is a public API for the engine, and adding them leaks the abstraction.

jfallows · 2023-12-15T00:22:36Z

+        int padding = 0;
+        if (appendPrefix)
+        {
+            padding = MAX_PADDING_LEN; // TODO: fetch this from catalog


Let's defer this to the catalog handler, if present.

jfallows · 2023-12-15T00:28:28Z

+                                   .partition(0, 1, 2)
+                                   .build()
+                              .build()}
+write [0x00] [0x00 0x00 0x00 0x00 0x01] ${kafka:varint(3)} "id0" ${kafka:varint(8)} "positive"


I thought it was 1 byte (magic) followed by int32 (schemaId) then payload, no?
This has 6-byte prefix, not 5 bytes.

That's true, but I added this script to mimic the scenario where the prefix can be more than five bytes, so that the padding can be dynamic.

jfallows · 2023-12-19T01:48:18Z

+        {
+            GenericRecord record = supplyRecord(schemaId);
+            in.wrap(buffer, index, length);
+            GenericDatumReader<GenericRecord> reader = readers.computeIfAbsent(schemaId, this::supplyReader);


Suggested change

-            GenericDatumReader<GenericRecord> reader = readers.computeIfAbsent(schemaId, this::supplyReader);
+            GenericDatumReader<GenericRecord> reader = supplyReader(schemaId);
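
One way to read this suggestion, together with the earlier comments about avoiding member fields populated as a side effect, is that the cache lookup moves inside `supplyReader` and the caller keeps the result in a local. A rough sketch, assuming a generic `ReaderCache` class and a `createReader` factory that are both hypothetical and stand in for the Avro types:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class ReaderCache<R>
{
    private final Map<Integer, R> readers = new HashMap<>();
    private final IntFunction<R> createReader; // hypothetical factory, e.g. builds a reader from a schema

    ReaderCache(IntFunction<R> createReader)
    {
        this.createReader = createReader;
    }

    // Returns the cached reader for schemaId, creating it on first use.
    // No member field is mutated other than the cache itself.
    R supplyReader(int schemaId)
    {
        return readers.computeIfAbsent(schemaId, createReader::apply);
    }
}
```

Callers then write `R reader = cache.supplyReader(schemaId);` and never touch the map directly.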

jfallows · 2023-12-19T02:01:04Z

+            Schema schema = supplySchema(schemaId);
+            GenericDatumReader<GenericRecord> reader = supplyReader(schemaId);
+            GenericDatumWriter<GenericRecord> writer = supplyWriter(schemaId);
+            GenericRecord record = supplyRecord(schemaId);


Can the json decoder below be precreated and reused instead of creating afresh each time?

This doesn't seem possible, as we don't have a method that lets us reuse a JsonDecoder.

However, I was able to optimise BinaryEncoder and reuse the same instance.

jfallows · 2023-12-19T02:06:59Z

                    }
                }
-                bytesIndex += numBytes;
+                index += charByteCount;


We are calculating index + j more than once here, and j is not descriptive.

    final int charByteLimit = index + charByteCount;
    for (int charByteIndex = index + 1; charByteIndex < charByteLimit; charByteIndex++)
    {
        if (charByteIndex >= limit || (data.getByte(charByteIndex) & 0b11000000) != 0b10000000)
        {
            break validate;
        }
    }

I think the break above needs to break from the outer while loop, not just the for loop, so we'll need to add a validate: label to the while loop, agree?
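
To make the labeled break concrete, here is a self-contained sketch of the loop shape described above. It uses a plain `byte[]` instead of the PR's buffer type, the class and method names are hypothetical, and it checks only lead-byte and continuation-byte structure (it does not reject overlong encodings or surrogates).

```java
final class Utf8Check
{
    // Returns true when every character in data[index, index + length)
    // has a valid lead byte followed by the expected continuation bytes.
    static boolean isValid(byte[] data, int index, int length)
    {
        final int limit = index + length;
        int progress = index;

        validate:
        while (progress < limit)
        {
            final int lead = data[progress] & 0xff;
            final int charByteCount;
            if ((lead & 0b1000_0000) == 0)
            {
                charByteCount = 1;
            }
            else if ((lead & 0b1110_0000) == 0b1100_0000)
            {
                charByteCount = 2;
            }
            else if ((lead & 0b1111_0000) == 0b1110_0000)
            {
                charByteCount = 3;
            }
            else if ((lead & 0b1111_1000) == 0b1111_0000)
            {
                charByteCount = 4;
            }
            else
            {
                break validate; // not a valid lead byte
            }

            final int charByteLimit = progress + charByteCount;
            for (int charByteIndex = progress + 1; charByteIndex < charByteLimit; charByteIndex++)
            {
                if (charByteIndex >= limit || (data[charByteIndex] & 0b1100_0000) != 0b1000_0000)
                {
                    break validate; // leaves the outer while loop, not just the for loop
                }
            }
            progress = charByteLimit;
        }

        // progress only reaches limit when no break occurred
        return progress == limit;
    }
}
```

Without the label, the inner `break` would only exit the `for` loop and the outer loop would continue past the malformed character.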

jfallows · 2023-12-19T02:08:37Z

+    private JsonProvider supplyProvider(
+        int schemaId)
+    {
+        return providers.computeIfAbsent(schemaId, id -> createProvider(supplySchema(id)));


Suggest we simplify by making createProvider take schemaId and calling supplySchema from createProvider implementation.

jfallows · 2023-12-19T02:11:43Z

+
+    default int maxPadding()
+    {
+        return ZERO_PADDING;


We can return 0 here instead of a constant.

jfallows · 2023-12-19T02:20:19Z

+        {
+            prefixRO.putByte(0, MAGIC_BYTE);
+            prefixRO.putInt(1, schemaId, ByteOrder.BIG_ENDIAN);
+            next.accept(prefixRO, 0, 5);


Suggested change

-            next.accept(prefixRO, 0, 5);
+            next.accept(prefixRO, 0, PREFIX_LENGTH);
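
As an illustration of why a named constant reads better than the literal `5`, the sketch below derives `PREFIX_LENGTH` from the field sizes and writes the prefix with `java.nio.ByteBuffer`. The class name `SchemaPrefixEncoder` and the `MAGIC_BYTE` value of `0x00` are assumptions; the PR itself writes into an Agrona buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SchemaPrefixEncoder
{
    static final byte MAGIC_BYTE = 0x00; // assumed value
    static final int PREFIX_LENGTH = Byte.BYTES + Integer.BYTES; // 1-byte magic + int32 schema id

    private final ByteBuffer prefix = ByteBuffer.allocate(PREFIX_LENGTH).order(ByteOrder.BIG_ENDIAN);

    // Returns a copy of the encoded prefix for the given schema id.
    byte[] encode(int schemaId)
    {
        prefix.put(0, MAGIC_BYTE);
        prefix.putInt(Byte.BYTES, schemaId);
        return prefix.array().clone();
    }
}
```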

jfallows · 2023-12-19T02:21:02Z


 public class TestCatalogHandler implements CatalogHandler
 {
+    private static final int MAX_PADDING_LEN = 10;


Suggest renaming to MAX_PADDING_LENGTH for consistency of naming.

jfallows · 2023-12-19T02:34:45Z

+    @Override
+    public int enrich(
+        int schemaId,
+        ValueConsumer next)


What is the intention of the return value of enrich?

It seems like it is trying to say how much of the input should be ignored, while also potentially writing something to next.

So in the case of embed where we are adding the prefix bytes, the PREFIX_LENGTH would already be passed to next, so perhaps this case should return 0?

And when we are doing exclude, meaning we are stripping off the PREFIX_LENGTH bytes, we would not call next so we need to return PREFIX_LENGTH to indicate how many bytes we skipped over.

The implementation below is returning PREFIX_LENGTH for both cases, is it intentional?

jfallows · 2023-12-19T02:36:24Z

+        {
+            length = ENRICHED_LENGTH;
+        }
+        return length;


Does it make sense to have booleans for embed and exclude, or is this more a function of whether the validator is read vs write?

If it is a property of the validator, then perhaps we need to pass this context from the validator to the catalog handler, either via a parameter or by having 2 methods on catalog handler?

jfallows · 2023-12-19T02:37:26Z

-    public String type()
-    {
-        return SCHEMA_REGISTRY;
+        this.prefixRO = new UnsafeBuffer(new byte[5]);


We should probably have this structure generated from an internal idl instead for better readability, much like we do for the kafka cache entry descriptions and other protocol codec flyweights.

jfallows · 2023-12-21T05:42:17Z

+            DirectBuffer data,
+            int payloadIndex,
+            int payloadLength);
+    }


Use simpler names for parameters, such as index and length instead of payloadIndex and payloadLength.

jfallows · 2023-12-21T05:55:52Z

    default int maxPadding()
    {
-        return ZERO_PADDING;
+        return 0;


Let's rename to encodePadding to indicate this is needed only for encode case.

Also let's rename Validator.maxPadding to Validator.padding since padding concept is already an upper bound in zilla.
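
Combined with the earlier comment about deferring padding to the catalog handler when present, the two renames might fit together as in the sketch below. The nested interfaces are simplified, hypothetical versions of `CatalogHandler` and `Validator`, and `WriteValidator` is an illustrative implementation, not the PR's class.

```java
final class PaddingSketch
{
    interface CatalogHandler
    {
        // extra bytes the handler may prepend when encoding; none by default
        default int encodePadding()
        {
            return 0;
        }
    }

    interface Validator
    {
        // upper bound on extra bytes this validator may add
        default int padding()
        {
            return 0;
        }
    }

    static final class WriteValidator implements Validator
    {
        private final CatalogHandler handler; // may be null when no catalog is configured

        WriteValidator(CatalogHandler handler)
        {
            this.handler = handler;
        }

        @Override
        public int padding()
        {
            // defer to the catalog handler, if present
            return handler != null ? handler.encodePadding() : 0;
        }
    }
}
```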

jfallows · 2023-12-21T05:58:59Z

+        SchemaConfig catalog,
+        String subject)
+    {
+        int schemaId = 0;


int schemaId = NO_SCHEMA_ID;

jfallows · 2023-12-21T06:07:19Z

+        SchemaConfig catalog,
+        String subject,


Let's aim to remove these parameters from decode.

jfallows · 2023-12-21T06:31:46Z

-    int ZERO_PADDING = 0;
+
+    @FunctionalInterface
+    interface Read


Suggest renaming to Decoder, with IDENTITY constant that defaults to next.accept(data, index, length).
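
A sketch of what such a `Decoder` with an `IDENTITY` constant could look like. The `byte[]` parameter, the nested `ValueConsumer` shape and the `int` return value (taken here to mean the number of bytes forwarded) are assumptions; the comment above only specifies that `IDENTITY` defaults to `next.accept(data, index, length)`.

```java
final class DecoderSketch
{
    @FunctionalInterface
    interface ValueConsumer
    {
        void accept(byte[] data, int index, int length);
    }

    @FunctionalInterface
    interface Decoder
    {
        // passes the input through unchanged
        Decoder IDENTITY = (data, index, length, next) ->
        {
            next.accept(data, index, length);
            return length; // assumed meaning: bytes forwarded
        };

        int accept(byte[] data, int index, int length, ValueConsumer next);
    }
}
```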

jfallows · 2023-12-21T06:53:02Z


-        GenericDatumReader<GenericRecord> reader = supplyReader(schemaId);
-        if (reader != null)
+    private int validatePayload(


Suggest renaming this to decodePayload

jfallows · 2023-12-21T06:57:02Z

+        int length,
+        ValueConsumer next,
+        int schemaId)
+    {


Receive schemaId as NO_SCHEMA_ID if it is not found in the catalog prefix, so that we can default here using catalog and subject if needed.

jfallows · 2023-12-21T07:00:09Z

    protected final String subject;
    protected final String format;
-    protected final ByteArrayOutputStream encoded;
+    protected final ExpandableDirectBufferOutputStream encoded;


Suggest renaming this to expandable as it is used for both encode and decode.

jfallows · 2023-12-21T07:11:42Z

+            schemaId = resolve(subject, catalog.version);
+        }
+        return schemaId;
+    }


Let's see if we can simplify this to not require embedding magic byte and schema id in payload.

jfallows · 2023-12-21T07:15:44Z

+    }
+
+    @Test
+    public void shouldVerifyEnrichedData()


enriched -> encoded

ankitk-me and others added 21 commits September 28, 2023 11:38

Schema Config Update

41a8899

converter implementation

f26457c

Merge branch 'aklivity:develop' into convertor

fd69f8b

test coverage update

fdb9d98

schema update

dcecf18

Interface update to support Streaming validation

b12c609

refactoring

3621172

bug fix

8435d0e

IT fix

71451a7

0 for no mqtt session expiry should be mapped to max integer value fo…

c177894

…r the group stream (aklivity#502)

Better handle request with same group id (aklivity#498)

366daff

Prepare release 0.9.55

e362fe9

Update CHANGELOG.md

66680f9

Fix flow control bug in mqtt-kakfa publish (aklivity#524)

6829614

Add extraEnv to Deployment in the helm chart (aklivity#511)

bb9fb56

Sporadic github action build failure fix (aklivity#522)

a1235ce

Merge branch 'feature/schema-registry' into convertor

6eb8a43

pom fix

f7bf292

updating Varint32FW initialisation

fd59bbb

ankitk-me self-assigned this Oct 25, 2023

ankitk-me added 2 commits October 27, 2023 15:48

Fragment Validator Interface & Schema Update

4c264d7

String & Test Fragment Validator implementation

1e1e73d

jfallows requested changes Oct 31, 2023

ankitk-me added 2 commits November 1, 2023 16:26

Addressing review feedback

65f12f0

Addressing review comments

705f7d4

ankitk-me marked this pull request as ready for review November 6, 2023 07:38

ankitk-me added 2 commits November 6, 2023 23:02

avro validator.yaml update

02b3b7a

Schema patch issue fix

6191c15

ankitk-me added 2 commits November 17, 2023 20:39

checkstyle fix

ce59fb0

IT & implementation to support fetch message without Schema ID prefix

2d6a9e3

jfallows requested changes Nov 27, 2023

ankitk-me added 9 commits November 27, 2023 23:57

addressing review feedback

664cece

fetch message without Schema ID prefix implementation

b4b02c8

Avro & Json Read Validator fix

f4e90e9

fix checkstyle

8075684

ITs for convertor & updating Test Validator

4dec79f

dynamic message size after conversion implementation

11d3747

Merge branch 'feature/schema-registry' into convertor

7281c2f

updating latest changes with Value & Fragment Validator interface.

08cc929

Converter bug fix

3cf0185

jfallows requested changes Dec 15, 2023

Addressing review feedback

a8f9c08

jfallows requested changes Dec 19, 2023

ankitk-me added 4 commits December 19, 2023 12:00

addressing review comments

1d82d23

using ExpandableDirectByteBuffer with valid index & length

8fb1016

review feedback & adding functional interface to CatalogHandler

9f3e434

encoded bug fix: position reset to 0

c48b2d7

jfallows requested changes Dec 21, 2023

ankitk-me added 3 commits December 21, 2023 22:35

Addressing review comments

9b77963

Avro unit test fix

49f2963

return -1 and ignore prefix.sizeof() in case of validation failure

ee1e231

jfallows approved these changes Dec 22, 2023

jfallows merged commit 0ffe279 into aklivity:feature/schema-registry Dec 22, 2023

ankitk-me deleted the convertor branch December 27, 2023 06:41

This was linked to issues Dec 28, 2023

Support inbound message transformation from json to avro #313

Closed

Support outbound message transformation from protobuf to json #458

Closed

jfallows removed a link to an issue Dec 28, 2023

Support outbound message transformation from protobuf to json #458

Closed

jfallows linked an issue Dec 28, 2023 that may be closed by this pull request

Support outbound message transformation from avro to json #315

Closed
