Timeout in get user with strong option makes subsequent weak get fail [JIRA: RCS-250]

Symptom: a single slow node may cause user fetch to fail.
- Getting a CS user takes two steps. The first step uses PR=all,
  so a single slow Riak node can cause a timeout error at the client.
- When a timeout occurs in `riakc_pb_socket`, it closes the TCP connection
  and goes into a wait-and-retry loop.
- The second phase of the CS user get then runs with the weak option, but the
  reconnect has most likely not happened yet, so it fails with a disconnected error.

If the "slow" node is completely frozen (it produces no response at all),
then after the health check timeout the strong get fails with "insufficient vnodes"
and the weak get should work. In this case, certain users cannot
access Riak CS for a finite period, 60 seconds by default.

---

Reproduction (or simulation)
- Create a 4-node cluster (`{get_user_timeout, 3000}` in advanced.config may help)
- Note the dev2 pid
  
  ```
  DEV2=`ps aux | grep riak_ee | grep dev2 | grep beam.smp | awk '{print $2;}' `
  ```
- Freeze it: `kill -s SIGSTOP $DEV2` (keep your fingers crossed; if you are unlucky, freeze another node :hear_no_evil:)
- Do any access,
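
The pid lookup and freeze steps above can be sketched as a small helper. This is only an illustration: `node_pid` is a hypothetical function name, it assumes the node name (for example `dev2`) appears somewhere in the `beam.smp` command line as in a devrel setup, and the `SIGCONT` resume step is an addition that the steps above do not mention.

```shell
# Hypothetical helper: print the pid of the first beam.smp process whose
# command line mentions the given node name. It reads `ps aux`-style lines
# on stdin, so it can be tried against canned input as well as a live system.
node_pid() {
  # $1 is the dev node name, e.g. dev2 (assumed to appear in the command line)
  awk -v node="$1" '/beam\.smp/ && index($0, node) { print $2; exit }'
}

# Assumed usage against a running devrel cluster:
#   DEV2=$(ps aux | grep riak_ee | node_pid dev2)
#   kill -s STOP "$DEV2"   # freeze the node
#   ... access Riak CS and observe the failing user fetch ...
#   kill -s CONT "$DEV2"   # resume the node afterwards
```

Matching on `beam\.smp` inside awk means the helper's own process does not show up as a false hit, since its command line contains a backslash between `beam` and `.smp`.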

