[close #736] Select reachable store #737
Conversation
Force-pushed: 3851b65 → 217f308
Codecov Report
Patch coverage:
Additional details and impacted files

@@             Coverage Diff              @@
##             master     #737      +/-   ##
============================================
- Coverage     38.18%   37.96%    -0.22%
  Complexity     1612     1612
============================================
  Files           278      278
  Lines         17482    17493       +11
  Branches       1986     1989        +3
============================================
- Hits           6675     6641       -34
- Misses        10140    10191       +51
+ Partials        667      661        -6

☔ View full report in Codecov by Sentry.
Force-pushed: c72ab64 → 6a64085
@zhangyangyu, Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments.

get siglist: dial tcp 172.16.4.167:34000: connect: no route to host
Force-pushed: 02ed862 → 96b8767
Signed-off-by: shiyuhang <1136742008@qq.com>
Force-pushed: e359dd2 → 63c51d3
In response to a cherrypick label: cannot checkout
/cherry-pick release-3.3
@shiyuhang0: new pull request created to branch release-3.3. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
close tikv#736
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: shiyuhang <1136742008@qq.com>
List<Peer> replicaList = region.getReplicaList();
for (int i = 0; i < replicaList.size(); i++) {
  Peer peer = replicaList.get(i);
  store = getStoreById(peer.getStoreId(), backOffer);
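
For context, here is a minimal sketch of how the quoted loop might continue so that it skips unreachable stores. The isReachable() status check is an assumption for illustration, not necessarily the exact code in this PR:

// Hedged sketch: pick the first replica whose store is reachable.
// TiStore.isReachable() is assumed; the real PR may check status differently.
List<Peer> replicaList = region.getReplicaList();
TiStore store = null;
for (int i = 0; i < replicaList.size(); i++) {
  Peer peer = replicaList.get(i);
  TiStore candidate = getStoreById(peer.getStoreId(), backOffer);
  if (candidate != null && candidate.isReachable()) {
    store = candidate; // first reachable replica wins
    break;
  }
}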
Is it possible that this makes replicas unbalanced, since we always try replica 0 first? How about starting from getCurrentReplica(), then, if the current replica is not available, getting the next one with getNextReplica() and retrying?
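
A minimal sketch of this suggestion, assuming getCurrentReplica()/getNextReplica() behave as described and that an isReachable() status check exists; the scan is bounded by replicaList.size() to address the endless-loop concern raised below:

// Sketch of the suggested round-robin selection (illustrative names).
Peer peer = region.getCurrentReplica();
TiStore store = null;
int replicaCount = region.getReplicaList().size();
for (int attempt = 0; attempt < replicaCount; attempt++) {
  TiStore candidate = getStoreById(peer.getStoreId(), backOffer);
  if (candidate != null && candidate.isReachable()) { // assumed check
    store = candidate;
    break;
  }
  peer = region.getNextReplica(); // rotate to the next replica and retry
}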
There are two problems if we use getNextReplica().
- The code will fall into an endless loop when there is no available store.
- Assume we set follower,leader: the replica list begins with the followers, then the leaders. If we use getNextReplica(), we may miss a recovered follower and request the leader instead.
Besides, the original code always selects the first replica, which has the same balance issue.
- The code will fall into an endless loop when there is no available store.

That is easy to solve: loop no more than replicaList.size() times.

- Assume we set follower,leader: the replica list begins with the followers, then the leaders. If we use getNextReplica(), we may miss a recovered follower and request the leader instead.

Why should we not "miss the recovered follower and request the leader"? If we don't want to read from the leader, we can set the replica selector to "followers only". Otherwise, if "leader and follower" is used, it should be fine to read from the leader.
- Besides, the original code always selects the first replica, which has the same balance issue.

I also find it strange that getNextReplica() is unused, and that, apart from it, replicaIdx is changed nowhere. Any idea @iosmanthus?
The customer wants a "fallback" strategy: read from followers first to isolate dump traffic, but if all the followers are down, fall back to the leaders as a cover.
So if "follower and leader" is specified, it means use followers first, then leaders. The order matters.
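
To make the ordering concrete, here is a hypothetical sketch of how a "follower,leader" candidate list could be built; getPeers() and getLeader() are assumed accessors here, not necessarily the actual client-java API:

// Hypothetical: order candidates followers-first, leader last.
List<Peer> ordered = new ArrayList<>();
for (Peer p : region.getPeers()) {       // assumed accessor
  if (!p.equals(region.getLeader())) {
    ordered.add(p);                      // followers first: isolate dump traffic
  }
}
ordered.add(region.getLeader());         // leader last: the fallback cover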
Is stale read enabled? If not, when all followers are down, the leader is also unavailable (neither readable nor writable; see https://www.pingcap.com/blog/lease-read/).
Ahh, I think we got some terms wrong: it's not follower, it's learner, which does not take part in voting. And yes, this is for stale read. The normal TiKV nodes still hold the leader and follower roles and handle online traffic, and there are extra learner nodes dedicated to offline traffic. When the learners are down, we go to the normal nodes.
What problem does this PR solve?
Issue Number: close #736
What is changed and how does it work?
Judge the store status before using it. Use the next replica when the current replica is unreachable.
Test
I have tested TiSpark follower read with the first store killed; it successfully requests the next store.