Skip to content

Conversation

@bom-d-van
Copy link
Member

@bom-d-van bom-d-van commented May 12, 2019

Early works, more stuff to come.

This PR aims at introducing a new aggregation policy: mix. With mix policy, you could aggregated data from base/first archive into multiple lower archives with the same precision.

This means that you could have more than one aggregation policy comparing to the classic whisper files.

Take the policy bellow as an example:

retention: 1s:2d,1m:28d,1h:2y
aggregation: sum,last,min,max,average,p50,p90,p95

To use it in go-carbon, append the policy intended at the end of metric:

server.instance01.threads.busy@last
server.instance01.threads.busy@p95
server.*.threads.busy@p95

Drawback:

  • Data are only aggregated when there are enough for the last retention policy (take the example above, it takes an hour of data points to propagate for both 1m and 1h archives)
  • ?

Merits:

  • No longer need to choose policy for your metrics, you could include them all
  • File size is bigger
  • Along with mix, percentile is also supported now

Differences:

  • All data are aggregated from the base archive

Currently only compressed format is supported, but it could be extended to standard format.

@bom-d-van bom-d-van changed the base branch from master to backfill-spt May 12, 2019 20:50
@deniszh
Copy link
Member

deniszh commented May 13, 2019

Very cool idea, like it!
Will it work with normal whisper format, or only with compressed one, @bom-d-van ?

@azhiltsov
Copy link
Member

@deniszh uncompressed format will be very expensive to keep around with all aggregations enabled, but compressed shouldn't be that much a blow and makes a perfect sense.

@Civil
Copy link
Member

Civil commented May 13, 2019

@deniszh I think @bom-d-van already answered your question:
[quote]Currently only compressed format is supported, [b]but it could be extended to standard format.[b][/quote]

However the question is if it's feasible to do that? I would also assume that the change is not backward compatible and will require some conversion, at least upgrading metadata to follow the same format as compressed one, right? (sorry, I haven't read the code yet)

@deniszh
Copy link
Member

deniszh commented May 13, 2019

Nah, I'm totally fine with that, just overlooked in the initial description.

@bom-d-van
Copy link
Member Author

However the question is if it's feasible to do that? I would also assume that the change is not backward compatible and will require some conversion, at least upgrading metadata to follow the same format as compressed one, right? (sorry, I haven't read the code yet)

Off the top of my head, I think we could use the "mix" aggregation policy as a flag to differentiate new and old whisper files, but I could be wrong. However, conversion shouldn't be necessary, because unlike compressed format, it isn't attempting to convert historical data to mix policy.

Nice to see welcoming opinions of this feature, the original idea of including all aggregation policies comes from @azhiltsov and Anna, I was only thinking of percentiles. :P

@Civil
Copy link
Member

Civil commented May 13, 2019

If no conversion is required, then it need to be backward-compatible with Python whisper library (at least) and ideally with old go version. And I'm not sure how they will behave if it will spot extra archives.

@bom-d-van
Copy link
Member Author

then it need to be backward-compatible with Python whisper library (at least) and ideally with old go version

Yes, it is.

All changes made in go-whisper so far maintain backward compatibility. Changes could also be ported back to python library as well. I think these new designs/changes are beneficial in general. :)

* move most data structres in the closer places for readability.
* bug fixes for fetch, propagation, and backfilling for mix aggregation.
* fix tests for mix aggregation.
* when fetching from lower archives, if the time range overlapped data in base/1st
  archive, cwhipser would do a dynamic filling (for both mix and classic aggregations).
  this would slow some reads but for good reason.
* add cmd/read.go.
* cmd/dump.go: print date times along side with timestamps.
@bom-d-van bom-d-van changed the base branch from backfill-spt to master December 8, 2019 10:44
@piotr1212
Copy link

@deniszh uncompressed format will be very expensive to keep around with all aggregations enabled, but compressed shouldn't be that much a blow and makes a perfect sense.

@azhiltsov I've had people send the same metric multiple times with different postfixes so that they could have multiple aggregations. In this case it would actually be cheaper to have one metric with multiple aggregations.

That said, I don't know if there are any reasons to not switch to compressed whisper if you need many aggs.

…optimization

* bug fix for no read backfilling when fetching lower archive data from new files
@bom-d-van bom-d-van marked this pull request as ready for review December 30, 2019 12:05
@bom-d-van
Copy link
Member Author

Hi all, it takes a long time to complete but finally I feel the changes are ready to be reviewed. Hope it could be both a good ending for 2019 and a good opening for 2020. :D

As usual, it is designed and built for full backward compatibility and things should work as usual.

Copy link
Member

@Civil Civil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I've left 2 minor comments. Not really necessary to address any of them before merging this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants