zookeeper节点无法与Leader节点通信

js81xvg6  于 2023-02-01  发布在  Apache
关注(0)|答案(2)|浏览(175)

我有一个Zookeeper集群,其中包括3个节点。Zookeeper配置如下所述。当重新启动时,它显示成功消息,但它显示状态为失败。
zoo.cfg

dataDir=/ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=pr2-ligerp-lapp27.<domain.com>:2888:3888
server.2=pr2-ligerp-lapp28.<domain.com>:2889:3889
server.3=pr2-ligerp-lapp29.<domain.com>:2890:3890

请查看以下日志:
请参见zkServer.sh开始

JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
-bash-4.1$ 
-bash-4.1$ cat zookeeper.out 
2017-04-18 18:58:13,840 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
2017-04-18 18:58:13,843 [myid:] - INFO  [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2017-04-18 18:58:13,845 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-04-18 18:58:13,845 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-04-18 18:58:13,846 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-04-18 18:58:13,854 [myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2017-04-18 18:58:13,861 [myid:1] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@959] - tickTime set to 3000
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@979] - minSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@990] - maxSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@1005] - initLimit set to 5
2017-04-18 18:58:13,884 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1/version-2/snapshot.1300000032
2017-04-18 18:58:13,954 [myid:1] - INFO  [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: pr2-ligerp-lapp27.<domain>/10.136.145.38:3888
2017-04-18 18:58:13,960 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2017-04-18 18:58:13,961 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id =  1, proposed zxid=0x130000024b
2017-04-18 18:58:13,962 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x130000024b (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x13 (n.peerEpoch) LOOKING (my state)
2017-04-18 18:58:13,964 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:13,964 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,165 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
2017-04-18 18:58:15,566 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
2017-04-18 18:58:16,368 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
2017-04-18 18:58:17,969 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:17,969 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:17,970 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200

但在检查状态后,我们发现它没有运行,但我们仍然可以看到进程ID。有人能帮助解决这个问题吗?
shzkServer.sh状态

JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

这是zookeeper的leader节点中的例外:

2017-04-18 18:25:32,634 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:47916
2017-04-18 18:25:32,635 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@827] - Processing srvr command from /127.0.0.1:47916
2017-04-18 18:25:32,635 [myid:3] - INFO  [Thread-22:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:47916 (no session established for client)
2017-04-18 18:30:01,662 [myid:3] - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 3, error = 
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2017-04-18 18:30:01,663 [myid:3] - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2017-04-18 18:30:01,662 [myid:3] - ERROR [LearnerHandler-/10.136.145.38:47656:LearnerHandler@633] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2017-04-18 18:30:01,663 [myid:3] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2017-04-18 18:30:01,663 [myid:3] - WARN  [LearnerHandler-/10.136.145.38:47656:LearnerHandler@646] - ******* GOODBYE /10.136.145.38:47656 ********
2017-04-18 18:30:01,663 [myid:3] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
2017-04-18 18:39:40,076 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@197] - Accepted socket connection from /10.136.145.38:58748
2017-04-18 18:39:40,077 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-04-18 18:39:40,078 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@1007] - Closed socket connection for client /10.136.145.38:58748 (no session established for client)
2017-04-18 18:42:46,516 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:47988
2017-04-18 18:42:46,516 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@827] - Processing srvr command from /127.0.0.1:47988
2017-04-18 18:42:46,517 [myid:3] - INFO  [Thread-23:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:47988 (no session established for client)
2fjabf4q

2fjabf4q1#

通过在所有zookeeper节点的zoo.cfg中进行以下更改修复了该问题:
1.将zookeeper主机名更改为IP地址
1.按ID降序启动zookeeper示例。
1.初始大小从5增加到100
zoo.cfg示例如下所示:

dataDir=/ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1
clientPort=2181
initLimit=100
syncLimit=2
server.1=10.136.145.38:2888:3888
server.2=10.136.145.39:2889:3889
server.3=10.136.145.40:2890:3890
u1ehiz5o

u1ehiz5o2#

在我的例子中,这仅仅是一个没有足够的CPU分配给进程的问题。尝试分配额外的CPU,看看这是否解决了问题。

相关问题