
Conversation

@iamdanfox (Contributor)

Before this PR

In PDS-117063, a user of our internal atlas-replacement switched from c-j-r (conjure-java-runtime) to dialogue and saw server errors.

Looking at the pinuntilerror.nextNode metric, it seemed we switched channels 5 times during a supposedly transactional workflow. This meant that some requests landed on one node and others landed on a different node, which caused the second node to return a hard error.

cc @LucasIME and @jkozlowski

After this PR

==COMMIT_MSG==
PinUntilErrorChannel doesn't switch on 429, to unblock transactional workflows
==COMMIT_MSG==
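
Concretely, a 429 (QoS throttle) no longer counts as an "error" for pinning purposes; only hard server errors move the pin to the next node. A minimal sketch of the changed predicate, assuming a hypothetical shouldSwitchNode helper rather than dialogue's actual internals:

    // Sketch only, not dialogue's real code: a 429 means the pinned node is
    // merely shedding load, so we stay pinned and let the retry layer back
    // off; only 5xx responses advance the pin to the next node.
    private static boolean shouldSwitchNode(int statusCode) {
        if (statusCode == 429) {
            return false; // too many requests: keep the pin, retry same node
        }
        return statusCode / 100 == 5; // server error: rotate to the next node
    }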

Possible downsides?

@changelog-app

changelog-app bot commented Apr 17, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

PinUntilErrorChannel doesn't switch on 429, to unblock transactional workflows

Check the box to generate changelog(s)

  • Generate changelog entry

@policy-bot policy-bot bot requested a review from fawind April 17, 2020 11:40
@iamdanfox iamdanfox requested review from carterkozak and ferozco and removed request for fawind April 17, 2020 11:40
Simulation report excerpt:

live_reloading[UNLIMITED_ROUND_ROBIN].txt: success=60.2% client_mean=PT2.84698S server_cpu=PT1H58M37.45S client_received=2500/2500 server_resps=2500 codes={200=1504, 500=996}
one_big_spike[CONCURRENCY_LIMITER_BLACKLIST_ROUND_ROBIN].txt: success=79.0% client_mean=PT1.478050977S server_cpu=PT1M59.71393673S client_received=1000/1000 server_resps=790 codes={200=790, Failed to make a request=210}
one_big_spike[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT1.286733552S server_cpu=PT2M48.75S client_received=1000/1000 server_resps=1125 codes={200=1000}
one_big_spike[CONCURRENCY_LIMITER_PIN_UNTIL_ERROR].txt: success=100.0% client_mean=PT1.135007332S server_cpu=PT2M49.65S client_received=1000/1000 server_resps=1131 codes={200=1000}

Contributor


We may want to update this simulation to respond 429 instead of 503

@iamdanfox (Contributor Author) commented Apr 17, 2020


It's intended to be representative of this exact workflow, so it responds 429 above some threshold:

public void one_big_spike() {
    int capacity = 100;
    servers = servers(
            SimulationServer.builder()
                    .serverName("node1")
                    .simulation(simulation)
                    // 200s until `capacity` concurrent requests, 429s beyond
                    .handler(h -> h.respond200UntilCapacity(429, capacity).responseTime(Duration.ofMillis(150)))
                    .build(),
            SimulationServer.builder()
                    .serverName("node2")
                    .simulation(simulation)
                    .handler(h -> h.respond200UntilCapacity(429, capacity).responseTime(Duration.ofMillis(150)))
                    .build());
    // rest of the test elided in this excerpt
}

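With respond200UntilCapacity(429, capacity), each node serves 200s until it holds capacity concurrent requests and answers 429 beyond that, which mirrors the production incident. If I read the one_big_spike results above correctly, CONCURRENCY_LIMITER_PIN_UNTIL_ERROR now rides out the spike at 100% success instead of bouncing between nodes on the first 429.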
@bulldozer-bot bulldozer-bot bot merged commit e6ec9b2 into develop Apr 17, 2020
@bulldozer-bot bulldozer-bot bot deleted the dfox/pin-until-error-fix branch April 17, 2020 11:57
@svc-autorelease (Collaborator)

Released 1.23.1

