
[close #736] Select reachable store #737

Merged: 7 commits merged into tikv:master on Apr 27, 2023
Conversation

shiyuhang0 (Collaborator) commented Apr 10, 2023

What problem does this PR solve?

Issue Number: close #736

What is changed and how does it work?

Check the store's status before using it, and fall through to the next replica when the current replica is unreachable.
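In sketch form, the change scans the replica list in order and skips stores that are down, instead of always using the first replica. The following is a minimal illustration with stand-in types, not the exact patch; isReachable() is an assumed health check, not necessarily the real API.

import java.util.List;

// Stand-in for the client's store handle; the real code uses the
// region/store classes in org.tikv.common.region.
interface Store {
  boolean isReachable(); // assumed health check, named for illustration
}

final class SelectReachableExample {
  // Return the first reachable store in replica order, or null when every
  // replica is down (the caller can then back off or raise an error).
  static Store selectReachable(List<Store> replicas) {
    for (Store candidate : replicas) {
      if (candidate != null && candidate.isReachable()) {
        return candidate;
      }
    }
    return null;
  }
}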

Test

I tested TiSpark follower read with the first store killed; the client successfully requested the next store.

@shiyuhang0 changed the title from "[close #736]select reachable store" to "[close #736] Select reachable store" on Apr 10, 2023
codecov bot commented Apr 10, 2023

Codecov Report

Patch coverage: 61.53%; project coverage change: -0.22% ⚠️

Comparison is base (91b1439) 38.18% compared to head (ed84147) 37.96%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #737      +/-   ##
============================================
- Coverage     38.18%   37.96%   -0.22%     
  Complexity     1612     1612              
============================================
  Files           278      278              
  Lines         17482    17493      +11     
  Branches       1986     1989       +3     
============================================
- Hits           6675     6641      -34     
- Misses        10140    10191      +51     
+ Partials        667      661       -6     
Impacted Files Coverage Δ
...ain/java/org/tikv/common/region/RegionManager.java 80.11% <50.00%> (-2.03%) ⬇️
src/main/java/org/tikv/common/region/TiRegion.java 77.50% <100.00%> (+0.57%) ⬆️

... and 9 files with indirect coverage changes


zhangyangyu previously approved these changes Apr 25, 2023

ti-srebot (Collaborator) commented:
@zhangyangyu, Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments. get siglist: dial tcp 172.16.4.167:34000: connect: no route to host

zhangyangyu merged commit e8feb23 into tikv:master on Apr 27, 2023
ti-chi-bot (Member) commented:

In response to a cherrypick label: cannot checkout 3.3: error checking out 3.3: exit status 1. output: error: pathspec '3.3' did not match any file(s) known to git

shiyuhang0 (Collaborator, Author) commented:

/cherry-pick release-3.3

ti-chi-bot (Member) commented:

@shiyuhang0: new pull request created to branch release-3.3: #742.

In response to this:

/cherry-pick release-3.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot pushed a commit to ti-chi-bot/client-java that referenced this pull request Apr 27, 2023
close tikv#736

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
shiyuhang0 added a commit to ti-chi-bot/client-java that referenced this pull request Apr 27, 2023
close tikv#736

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: shiyuhang <1136742008@qq.com>
zhangyangyu pushed a commit that referenced this pull request Apr 27, 2023
Signed-off-by: shiyuhang <1136742008@qq.com>
List<Peer> replicaList = region.getReplicaList();
for (int i = 0; i < replicaList.size(); i++) {
  Peer peer = replicaList.get(i);
  store = getStoreById(peer.getStoreId(), backOffer);
  // ...
Contributor:

Is it possible that this makes replica selection unbalanced? We always try replica 0 first.

How about starting from getCurrentReplica(), and if the current replica is not available, getting the next one with getNextReplica() and retrying?
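A rough sketch of that proposal, as a stand-in class (the real accessors live on the region object; the reachable predicate is a placeholder check, and a non-empty replica list is assumed):

import java.util.List;
import java.util.function.Predicate;

// Illustrative rotation over replicas: start from the current index and
// advance on failure, mimicking getCurrentReplica()/getNextReplica().
// Note: as written, this spins forever when every store is down -- see
// the reply below.
final class RotatingSelectorExample<R> {
  private int replicaIdx = 0; // stands in for the region's internal index

  R select(List<R> replicas, Predicate<R> reachable) {
    while (true) {
      R candidate = replicas.get(replicaIdx);          // "getCurrentReplica()"
      if (reachable.test(candidate)) {
        return candidate;
      }
      replicaIdx = (replicaIdx + 1) % replicas.size(); // "getNextReplica()"
    }
  }
}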

Collaborator (Author):

There are two problems with getNextReplica:

  1. The code will fall into an endless loop when no store is available.
  2. Assume we set follower,leader: the replica list will begin with the followers and then the leaders. If we use getNextReplica, we may miss a recovered follower and request the leader instead.

Collaborator (Author):

Besides, the original code always selects the first replica, so it also has the balance issue.

Contributor:

  1. The code will fall into an endless loop when no store is available.

This is not hard to solve: loop no more than replicaList.size() times, as in the sketch below.
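For illustration, the bound turns the rotation sketched earlier in this thread into something like the method below (same stand-in names, added to the same illustrative class; a sketch, not the actual fix):

// Bounded variant: try at most replicas.size() candidates, then give up
// instead of spinning forever when every store is down.
R selectBounded(List<R> replicas, Predicate<R> reachable) {
  for (int tried = 0; tried < replicas.size(); tried++) {
    R candidate = replicas.get(replicaIdx);          // current replica
    replicaIdx = (replicaIdx + 1) % replicas.size(); // advance for the next try
    if (reachable.test(candidate)) {
      return candidate;
    }
  }
  return null; // no reachable replica; let the caller back off or fail
}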

  2. Assume we set follower,leader: the replica list will begin with the followers and then the leaders. If we use getNextReplica, we may miss a recovered follower and request the leader instead.

Why shouldn't we "miss the recovered follower and request the leader"? If we don't want to read from the leader, we can set the replica selector to "followers only". Otherwise, if "leader and follower" is used, it should be fine to read from the leader.

  Besides, the original code always selects the first replica, so it also has the balance issue.

I also find it strange that getNextReplica is unused, and that replicaIdx is not changed anywhere else. Any idea @iosmanthus?

Contributor:

The customer wants a "fallback" strategy: read from the followers first, to isolate the dump traffic, but if all the followers are down, fall back to the leaders as a safety net.

So if "follower and leader" is specified, it means: use followers first, then leaders. The order matters (see the sketch below).
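A tiny sketch of that ordering with generic stand-in code (not the client's actual selector API): preferred replicas first, leaders last, then take the first reachable one.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative "fallback" ordering: scan the preferred replicas
// (followers, or learners for the dump traffic) first, and only reach
// the leaders when every preferred replica is unreachable.
final class FallbackOrderExample {
  static <R> R pick(List<R> preferred, List<R> leaders, Predicate<R> reachable) {
    List<R> ordered = new ArrayList<>(preferred); // followers/learners first
    ordered.addAll(leaders);                      // leaders as the safety net
    for (R replica : ordered) {
      if (reachable.test(replica)) {
        return replica; // first reachable replica in preference order
      }
    }
    return null; // everything is down
  }
}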

Contributor:

Is stale read enabled? If not, when all the followers are down, the leader is also unavailable (neither readable nor writable; see https://www.pingcap.com/blog/lease-read/).

Contributor:

Ah, I think we got some terms wrong: it's not a follower, it's a learner, which does not take part in voting. And yes, this is for stale read. The normal TiKV nodes still hold the leaders and followers and handle the online traffic, and there are extra learner nodes dedicated to the offline traffic. When the learners are down, it falls back to the normal nodes.

Successfully merging this pull request may close these issues:

Missing fallback mechanism for replica selection (#736)