If the ZooKeeper leader process is killed, do all the followers also get an exception and restart?

Posted by polhcujo on 2022-12-09 in Apache

I'm working on a project using Zookeeper 3.4.6 and am doing some failure-mode testing. While doing so, I found what I think is unexpected behaviour.
Should followers restart if the leader Zookeeper process is killed?

Environment:

OS:        Windows Server 2008 R2 (hosted in a Tanuki Java service wrapper)
Zookeeper: 3.4.6
Java JDK:  1.7.0.210

Tests:

The test is to kill Zookeeper processes and make sure the cluster recovers.
If I kill a non-leader process, it restarts and rejoins the cluster without affecting other nodes.
If I kill the leader process, the leader and all of the followers restart. This doesn't seem right, as there's a period of time during which clients can't connect to any Zookeeper node.
I've tried both the TCP and UDP settings for server-to-server communication (electionAlg below), and both exhibit the same behaviour, although UDP recovers twice as quickly.
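To measure the window in which no node accepts clients, one option is to poll each server's mode during the test with the `srvr` four-letter-word command on the client port. The following is a minimal Python sketch, assuming the four-letter words are enabled (they are on by default in 3.4.x) and that a server which is not currently serving, as during leader election, replies without a `Mode:` line. The helper names `parse_mode` and `query_mode` are hypothetical.

```python
import socket


def parse_mode(srvr_output: str) -> str:
    """Return the value of the 'Mode:' line from `srvr` output.

    Assumption: a server that is not serving (e.g. while electing a
    leader) answers without a Mode line, reported here as 'not serving'.
    """
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "not serving"


def query_mode(host: str, port: int = 2181, timeout: float = 2.0) -> str:
    """Send the 'srvr' four-letter word and return the reported mode."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            # The server closes the connection after answering, so read to EOF.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Refused connections, DNS failures and timeouts all land here.
        return "unreachable"
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))
```

For example, calling `query_mode` for each server in a loop while killing the leader would be expected to show the followers going from `follower` to `not serving` and back; the host addresses are yours to fill in, since the real ones are blanked out.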

Zookeeper settings

tickTime=2000
initLimit=5
syncLimit=2
minSessionTimeout=5000
maxSessionTimeout=120000
dataDir=C:\\ProgramData\\Saab OneView\\ZooKeeper\\zoo-data
clientPort=2181
leaderServes=yes
autopurge.purgeInterval=24

# IP addresses blanked out here
server.1=0.0.0.1:2888:3888
server.2=0.0.0.2:2888:3888
server.3=0.0.0.3:2888:3888
server.4=0.0.0.4:2888:3888
server.5=0.0.0.5:2888:3888

# This is for zookeeper->zookeeper communication
# I've tried both settings, UDP has faster recovery time
# 0 = UDP 
# 3 = TCP (default)
electionAlg=3
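The `initLimit` and `syncLimit` values above are expressed in ticks, so their wall-clock meaning depends on `tickTime`. The Python sketch below (the function name `derived_settings` is hypothetical) converts them and also computes the majority a five-server ensemble needs. This is why killing one follower leaves the cluster serving: the leader plus three followers is still at least a quorum of three.

```python
def derived_settings(tick_time_ms: int, init_limit: int, sync_limit: int,
                     ensemble_size: int) -> dict:
    """Compute the wall-clock values implied by tick-based zoo.cfg settings."""
    return {
        # Time followers get to connect and sync to a (new) leader.
        "init_ms": init_limit * tick_time_ms,
        # How far a follower may lag before the leader drops it.
        "sync_ms": sync_limit * tick_time_ms,
        # Smallest majority needed to elect and keep a leader.
        "quorum": ensemble_size // 2 + 1,
        # Servers that can fail while a quorum still remains.
        "tolerated_failures": (ensemble_size - 1) // 2,
    }


settings = derived_settings(tick_time_ms=2000, init_limit=5, sync_limit=2,
                            ensemble_size=5)
print(settings)
# {'init_ms': 10000, 'sync_ms': 4000, 'quorum': 3, 'tolerated_failures': 2}
```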

Sample follower exception causing shutdown

20160309 05:35:51.958Z 20160309 05:35:51.958 [myid:3] - WARN  [RecvWorker:4:QuorumCnxManager$RecvWorker@780] - Connection broken for id 4, my id = 3, error = 
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN  [RecvWorker:4:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
20160309 05:35:51.960Z 20160309 05:35:51.960 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)

Answer #1 (by deyfvvtc)

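Based on ZOOKEEPER-3478, this is expected behaviour:
It is normal for all followers to shut down during leader election. Once the leader crashes there is no leader, so the servers that were followers are no longer followers. They therefore shut down and return to the LOOKING state in order to find a new leader.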
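The process the answer describes can be illustrated with a toy model. This is a simplified Python sketch, not ZooKeeper's FastLeaderElection code: every surviving server drops to LOOKING, and the survivor with the highest last zxid wins (ties broken by the higher server id), provided a majority of the ensemble is still up. The real algorithm also compares epochs first and exchanges votes over the election port (3888 in the configuration above); the `Server` class and `elect` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Server:
    sid: int                  # server id, as in the myid file
    zxid: int                 # last transaction id this server has seen
    state: str = "FOLLOWING"


def elect(servers: Dict[int, Server], ensemble_size: int) -> Optional[int]:
    """Toy re-election after the leader disappears (epochs ignored)."""
    # With no leader left, nobody can be FOLLOWING: everyone goes LOOKING.
    for s in servers.values():
        s.state = "LOOKING"
    quorum = ensemble_size // 2 + 1
    if len(servers) < quorum:
        return None  # no majority available, so everyone stays LOOKING
    # Highest zxid wins; equal zxids are broken by the higher server id.
    winner = max(servers.values(), key=lambda s: (s.zxid, s.sid))
    for s in servers.values():
        s.state = "LEADING" if s is winner else "FOLLOWING"
    return winner.sid


# Hypothetical scenario: the leader is gone and servers 1, 2, 3 and 5 survive.
survivors = {sid: Server(sid, zxid=0x100) for sid in (1, 2, 3, 5)}
survivors[2].zxid = 0x101  # server 2 saw one more transaction than the others
print(elect(survivors, ensemble_size=5))  # → 2
```

Because no server can be FOLLOWING until a new leader exists, even a fast election leaves a short window in which no node serves clients, which matches the behaviour described in the question.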
