
Conversation

Contributor

@rkhachatryan rkhachatryan commented Aug 6, 2025

@rkhachatryan rkhachatryan requested a review from pnowojski August 6, 2025 14:29
Comment on lines 141 to 142
@Param("100")
public int payloadSize;
Contributor

For micro-benchmark purposes, like optimising code or guarding against regressions, the lower the payload, the better, as you get more contrasting effects from any performance-related changes: optimisations/changes to the related code will be more clearly visible instead of being watered down by byte copying.

So I would shrink it down, maybe to 10 or 20.

Contributor Author

I think both cases can be problematic; I'll change it to a smaller value and think about whether to keep 100 (or increase it).

Contributor Author

@rkhachatryan rkhachatryan Aug 7, 2025

Changed to

        @Param({"10", "250"})

Contributor Author

Hmm.. it now takes 2+ hours for the whole benchmark to run locally.
I'm going to change to a single value of 20.

Contributor

Can we reduce the number of records per invocation? Or pick a different number of records depending on the benchmark case? Slower benchmarks probably don't need as many records per invocation as faster ones.

Contributor Author

@rkhachatryan rkhachatryan Aug 7, 2025

It takes so long because of the number of combinations plus the benchmark setup, not because of the number of records:

  • Warmup: 10 iterations, 10 s each
  • Measurement: 10 iterations, 10 s each
  • forks: 3

So it's 3 forks * 2 phases * 10 iterations * 10 s = 600 s = 10 minutes for one combination.

And here are the benchmark parameters:

  • upsert key: true / false
  • state backend: rocksdb / heap
  • retract delay: 1 / 1000
  • payload size, retract percentage: fixed, so they don't add combinations

meaning there are 2^3 = 8 combinations in total.

So the total time is 8 * 10 = 80 minutes.


I don't see which combinations we could exclude; all of them seem important.

OTOH, I think the benchmark setup is probably excessive for this particular benchmark. I haven't noticed a significant difference when running with a lower duration/forkCount/measurementCount. So I'd rather reduce those, for example to:

  • Warmup: 1 iteration, 5 s
  • Measurement: 5 iterations, 5 s each
  • forks: 2

So it would be 2 * (1 * 5 + 5 * 5) s * 8 = 480 s = 8 minutes.
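
For illustration, that configuration could be expressed with JMH annotations roughly as in the sketch below; the class, field, and method names are made up for the example, not taken from the actual PR:

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;

    @State(Scope.Thread)
    @Warmup(iterations = 1, time = 5, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
    @Fork(2)
    public class RetractionBenchmark { // illustrative name

        // The three two-valued parameters below give the 2^3 = 8 combinations.
        @Param({"true", "false"})
        public boolean hasUpsertKey;

        @Param({"rocksdb", "heap"})
        public String stateBackend;

        @Param({"1", "1000"})
        public int retractDelay;

        // Fixed, so it does not multiply the combination count.
        @Param("20")
        public int payloadSize;

        @Benchmark
        public void processRecords() {
            // actual benchmark body omitted
        }
    }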

WDYT?

Contributor

I wouldn't touch the fork counts.

The recommended target for a single invocation is < 1 s. Within a single iteration, JMH works by invoking the benchmark method repeatedly for ~1 s. If a single invocation takes more than 1 s, then the iteration ends after that single invocation, which is usually not desired.

Given that we are not spawning a whole Flink job on a mini cluster here, I'm pretty sure we could reduce the invocation duration.

I guess maybe part of the problem is that your single invocation is also responsible for building up the state size, which arguably should/could be moved out to a setup method? You could:

  • build up the desired state size/length of the values list in the setup method
  • in the invocation, just run a single process-record call, or maybe a single insertion and a single retraction, after which the state size/list length remains unchanged? (see the sketch below)

That would not only be fine (as there would be no setup code in the benchmarked method), it is actually the by-the-book intended way to use JMH.

Each iteration would then take about 1 s, so roughly 1 minute per parameter combination, 8 minutes in total.
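
A minimal self-contained sketch of that shape, using a plain HashMap as a stand-in for the real state backend (all names here are illustrative, not from the PR):

    import java.util.HashMap;
    import java.util.Map;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class SingleRecordBenchmark { // illustrative name

        // Stand-in for the operator state; the real benchmark would use the
        // configured state backend (rocksdb/heap) instead.
        private Map<Integer, byte[]> state;
        private byte[] payload;
        private int nextKey;

        @Setup(Level.Iteration)
        public void buildUpState() {
            state = new HashMap<>();
            payload = new byte[20];
            // Pre-populate the state to the desired size outside the measured code.
            for (nextKey = 0; nextKey < 100_000; nextKey++) {
                state.put(nextKey, payload);
            }
        }

        @Benchmark
        public void insertAndRetract() {
            // One insertion plus one matching retraction per invocation,
            // so the state size stays constant and each invocation is cheap.
            state.put(nextKey, payload);
            state.remove(nextKey - 100_000);
            nextKey++;
        }
    }

With per-invocation work that small, each iteration stays close to the configured ~1 s regardless of how the warmup/measurement counts are tuned.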

@rkhachatryan rkhachatryan marked this pull request as ready for review August 7, 2025 07:38
@rkhachatryan rkhachatryan requested a review from pnowojski August 7, 2025 07:38
Contributor

@pnowojski pnowojski left a comment

mostly LGTM % opened comments

@rkhachatryan rkhachatryan requested a review from pnowojski August 7, 2025 09:50
@rkhachatryan
Contributor Author

@flinkbot run azure
