Zookeeper and Snowflake Kafka connector: doubts and questions

Posted by z2acfund on 2022-12-09 in Apache

I am using a 3-server Kafka cluster, with the Snowflake connector (configured through the Connect REST API) pushing the data to a Snowflake database. The three servers are separate VMs running on AWS.
1. Do the Zookeeper services need to be up and running on all 3 Kafka servers as a cluster, or is one enough? If Zookeeper has to run on all 3 servers, does each one need a different port configuration? For example:
1.a: zookeeper.connect=xx.xx.xx.xxx:2181, xx.xx.xx.xxx:2182, xx.xx.xx.xxx:2183, or should it be 2181 for every node in each server.properties file?
1.b: Should the listeners be PLAINTEXT://localhost:9091 on server 1, PLAINTEXT://localhost:9092 on server 2 and PLAINTEXT://localhost:9093 on server 3? And should they use localhost or the IP address?
1.c: server.1=<zookeeper_1_IP>:2888:3888, server.2=<zookeeper_2_IP>:2888:3888, server.3=<zookeeper_3_IP>:2888:3888 (the 2888:3888 ports should be the same on each server, right?)
1.d: Does clientPort=2181 need to be the same for the services on all 3 VMs, or does it need to be different?
1.e: Should listeners=PLAINTEXT://your.host.name:9092 use a separate port on each server, e.g. VM-Server1:9092, VM-Server2:9093, VM-Server3:9094? And on the worker nodes (Server2 and Server3), should it contain the master server's IP or the worker node's own IP?

  2. What should the connector configuration item "tasks.max":"1" be when creating the connector through the REST API? I am using a 3-server Kafka cluster and will start the distributed connector on all 3 machines.
  3. I get duplicate records when I start the distributed connector service on the 2nd server. How can these duplicates be avoided? With only 1 distributed connector running there are no duplicates, but then the lag increases. Please advise.
  4. Should I create a /data/zookeeper/myid file containing 1 for zookeeper1, 2 for zookeeper2 and 3 for zookeeper3? Is this necessary when each one runs on a different VM?
  5. The distributed connector services run for some time after starting and then get disconnected.
  6. Are there any other parameters or best practices to follow for the 3-server cluster architecture?
Answer 1 (xj3cbfub)

Kafka and Zookeeper
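You only need one Kafka broker and one Zookeeper server, although running more than one of each gives you fault tolerance. With a single standalone Zookeeper server you don't need to create anything by hand, such as a myid file. A replicated ensemble does need one myid file per server, containing the N from that server's server.N line.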
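Below is a minimal sketch of a three-node Zookeeper ensemble. It assumes one Zookeeper node per VM. The IP addresses are hypothetical placeholders, and /data/zookeeper is taken from the question. The sketch shows two things:

- zoo.cfg, including clientPort=2181 and the server.N=host:2888:3888 lines, can be identical on all three VMs.
- The only per-node file is dataDir/myid, which the Zookeeper administrator's guide requires in replicated mode.

```python
"""Render per-VM Zookeeper ensemble files: a shared zoo.cfg plus one myid each.

Hypothetical: the IPs are placeholders; dataDir follows the question.
"""

# Hypothetical private IPs of the three EC2 VMs, one Zookeeper node per VM.
ZK_HOSTS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
DATA_DIR = "/data/zookeeper"


def zoo_cfg(hosts: list[str]) -> str:
    """Return zoo.cfg text; it is identical on every node."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={DATA_DIR}",
        # The same client port everywhere is fine: each VM has its own IP.
        "clientPort=2181",
    ]
    # server.N=host:peerPort:electionPort, where N must equal that host's myid.
    for n, host in enumerate(hosts, start=1):
        lines.append(f"server.{n}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_for(host: str, hosts: list[str]) -> str:
    """Return the contents of dataDir/myid on `host`: its 1-based N."""
    return f"{hosts.index(host) + 1}\n"


if __name__ == "__main__":
    print(zoo_cfg(ZK_HOSTS), end="")
    for h in ZK_HOSTS:
        print(f"{h}: {DATA_DIR}/myid contains {myid_for(h, ZK_HOSTS).strip()}")
```

Copy the same zoo.cfg to every VM, then write the matching single number into myid on each one.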
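The ports don't need to be the same, but if they are, it is obviously easier to draw the network diagram and to automate the configuration.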
For Kafka listeners, read this post. For Zookeeper, if you want to create a cluster, follow its documentation.
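Below is a minimal sketch of the per-broker server.properties for three VMs. It assumes one broker per VM, PLAINTEXT only, and hypothetical IPs.

- Each broker gets a unique broker.id and binds on its own VM.
- Each broker advertises its own IP, never localhost and never a "master" IP. There is no master/worker distinction in these settings.
- The same port 9092 works everywhere because the hosts differ.

The replication settings are common choices for three brokers, not values given in the answer.

```python
"""Render per-VM Kafka broker settings for a 3-broker cluster.

Hypothetical: IPs are placeholders; PLAINTEXT only (no TLS/SASL).
"""

# Hypothetical private IPs; each VM runs one broker and one Zookeeper node.
HOSTS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


def server_properties(broker_id: int, host: str, hosts: list[str]) -> str:
    # Every broker lists the whole ensemble, each node on its own IP:2181.
    zk_connect = ",".join(f"{h}:2181" for h in hosts)
    props = {
        "broker.id": str(broker_id),  # unique per broker
        # Bind on all interfaces; port 9092 can be the same on every VM.
        "listeners": "PLAINTEXT://0.0.0.0:9092",
        # Advertise this VM's own address to clients, not localhost.
        "advertised.listeners": f"PLAINTEXT://{host}:9092",
        "zookeeper.connect": zk_connect,
        # Assumed replication choices for a 3-broker cluster.
        "offsets.topic.replication.factor": "3",
        "default.replication.factor": "3",
        "min.insync.replicas": "2",
    }
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    for i, h in enumerate(HOSTS, start=1):
        print(f"# server.properties on {h}")
        print(server_properties(i, h, HOSTS))
```

Binding on 0.0.0.0 while advertising the VM's own IP lets clients on other VMs reach each broker at an address that resolves correctly from their side.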
Alternatively, use Amazon MSK, Confluent Cloud or similar instead of EC2, and all of this is done for you.
Kafka Connect
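tasks.max can be as high as you like, but yes, if you have a source connector, multiple threads can cause duplicates. For a sink connector such as Snowflake's, any tasks beyond the number of topic partitions simply sit idle.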
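Below is a minimal sketch of submitting the Snowflake sink connector once through the Kafka Connect REST API (PUT /connectors/{name}/config). It makes the following assumptions:

- All three distributed workers share the same group.id and storage topics. They therefore form one Connect cluster, which spreads the tasks across the VMs.
- The connector name, topic, worker URL and Snowflake credentials are hypothetical placeholders.
- Setting tasks.max to the partition count is a sizing choice for a sink, not something the answer specifies.

```python
"""Build a Snowflake sink connector config and submit it once to Kafka Connect.

Assumptions (hypothetical): connector name, topic, Connect URL, Snowflake
account/user/key values, and the partition count passed in.
"""
import json
import urllib.request

CONNECT_URL = "http://10.0.0.11:8083"  # hypothetical: any worker in the cluster
CONNECTOR_NAME = "snowflake-sink"       # hypothetical connector name


def connector_config(partitions: int) -> dict:
    return {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        # A sink's parallelism is capped by the partitions it consumes,
        # so tasks.max is matched to the partition count here.
        "tasks.max": str(partitions),
        "topics": "my_topic",  # hypothetical
        "snowflake.url.name": "myaccount.snowflakecomputing.com:443",  # hypothetical
        "snowflake.user.name": "KAFKA_USER",  # hypothetical
        "snowflake.private.key": "<private key>",  # placeholder
        "snowflake.database.name": "MY_DB",  # hypothetical
        "snowflake.schema.name": "PUBLIC",
    }


def put_config(config: dict) -> int:
    """PUT /connectors/{name}/config creates or updates the connector."""
    req = urllib.request.Request(
        f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    print(json.dumps(connector_config(partitions=3), indent=2))
    # put_config(connector_config(3))  # requires a running Connect worker
```

Because PUT on /connectors/{name}/config is idempotent, re-running it from any worker updates the single connector rather than registering a second copy.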
