In the previous post, Kafka connect in practice(1): standalone, I introduced the basics of Kafka Connect configuration and demonstrated a local standalone setup. In this post we cover distributed-mode data pull and sink. Before starting, make sure the Kafka broker and ZooKeeper are running!
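If they are not running yet, a minimal sketch for a local single-node setup (assuming the default config files shipped with Kafka):

  # start ZooKeeper and a single Kafka broker in the background
  sh $KAFKA_HOME/bin/zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
  sh $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties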

1. configuration

  vim $KAFKA_HOME/config/connect-distributed.properties

Set the contents of this file as follows:

  # These are defaults. This file just demonstrates how to override some settings.
  # A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
  # The client will make use of all servers irrespective of which servers are specified here for bootstrapping;
  # this list only impacts the initial hosts used to discover the full set of servers.
  # This list should be in the form host1:port1,host2:port2,....
  # Since these servers are just used for the initial connection to discover the full cluster membership
  # (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though,
  # in case a server is down).
  # notes: this configuration is required.
  bootstrap.servers=localhost:9092

  # Unique name for the cluster, used in forming the Connect cluster group.
  # Note that this must not conflict with consumer group IDs.
  # notes: this configuration is required.
  group.id=connect-cluster

  # The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
  # need to configure these based on the format they want their data in when loaded from or stored into Kafka.
  # Converter class for key Connect data. This controls the format of the data that will be written to Kafka for source
  # connectors or read from Kafka for sink connectors. Popular formats include Avro and JSON.
  # notes: this configuration is required.
  key.converter=org.apache.kafka.connect.json.JsonConverter

  # Converter class for value Connect data. This controls the format of the data that will be written to
  # Kafka for source connectors or read from Kafka for sink connectors. Popular formats include Avro and JSON.
  # notes: this configuration is required.
  value.converter=org.apache.kafka.connect.json.JsonConverter

  # Converter-specific settings can be passed in by prefixing the converter's setting with the converter we want to apply it to.
  # If we want to see the schema of each message, we can turn *.converter.schemas.enable on;
  # if we don't want to see the schema, we turn *.converter.schemas.enable off.
  # Generally speaking, in dev and testing environments these can be turned on for easier tracing,
  # and turned off in production to save network bandwidth and disk capacity.
  key.converter.schemas.enable=false
  value.converter.schemas.enable=false

  # The name of the topic where source connector offsets are stored. This must be the same for all workers with
  # the same group.id. Kafka Connect will upon startup attempt to automatically create this topic with multiple partitions and
  # a compacted cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to
  # create this topic manually, always create it as a compacted, highly replicated (3x or more) topic with a large number of
  # partitions (e.g., 25 or 50, just like Kafka's built-in __consumer_offsets topic) to support large Kafka Connect clusters.
  offset.storage.topic=connect-offsets
  # The replication factor used when Connect creates the topic used to store connector offsets. This should always be at least
  # 3 for a production system, but cannot be larger than the number of Kafka brokers in the cluster (1 here for a single local broker).
  offset.storage.replication.factor=1
  # The number of partitions used when Connect creates the topic used to store connector offsets. A large value (e.g., 25 or 50,
  # just like Kafka's built-in __consumer_offsets topic) is necessary to support large Kafka Connect clusters.
  offset.storage.partitions=25

  # The name of the topic where connector and task configuration data are stored. This must be the same for all workers with
  # the same group.id. Kafka Connect will upon startup attempt to automatically create this topic with a single partition and a
  # compacted cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to create
  # this topic manually, always create it as a compacted topic with a single partition and a high replication factor (3x or more).
  config.storage.topic=connect-configs
  # The replication factor used when Kafka Connect creates the topic used to store connector and task configuration data.
  # This should always be at least 3 for a production system, but cannot be larger than the number of Kafka brokers in the cluster.
  config.storage.replication.factor=1
  # Note: the config topic always uses a single partition, so there is no partitions setting for it.

  # The name of the topic where connector and task status updates are stored. This must be the same for all workers with
  # the same group.id. Kafka Connect will upon startup attempt to automatically create this topic with multiple partitions and a compacted
  # cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to create this topic manually,
  # always create it as a compacted, highly replicated (3x or more) topic with multiple partitions.
  status.storage.topic=connect-status
  # The replication factor used when Connect creates the topic used to store connector and task status updates. This should always be at least
  # 3 for a production system, but cannot be larger than the number of Kafka brokers in the cluster.
  status.storage.replication.factor=1
  # The number of partitions used when Connect creates the topic used to store connector and task status updates.
  status.storage.partitions=5

  # Flush much faster than normal, which is useful for testing/debugging
  offset.flush.interval.ms=10000

  # These are provided to inform the user about the presence of the REST host and port configs.
  # Hostname for the REST API to listen on. If this is set, it will only bind to this interface.
  # notes: this configuration is optional
  rest.host.name=localhost
  # Port for the REST API to listen on.
  # notes: this configuration is optional
  rest.port=8083

  # The hostname & port that will be given out to other workers to connect to, i.e. URLs that are routable from other servers.
  #rest.advertised.host.name=
  #rest.advertised.port=
  rest.advertised.host.name=127.0.0.1
  rest.advertised.port=8083

  # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
  # (connectors, converters, transformations). The list should consist of top level directories that include
  # any combination of:
  # a) directories immediately containing jars with plugins and their dependencies
  # b) uber-jars with plugins and their dependencies
  # c) directories immediately containing the package directory structure of classes of plugins and their dependencies
  # Note: symlinks will be followed to discover dependencies or plugins.
  plugin.path=/home/lenmom/workspace/software/kafka_2.-2.1./connect

  # Converter class for internal key Connect data that implements the Converter interface. Used for converting data like offsets and configs.
  # notes: this configuration is optional
  internal.key.converter=org.apache.kafka.connect.json.JsonConverter
  # Converter class for internal value Connect data that implements the Converter interface. Used for converting data like offsets and configs.
  # notes: this configuration is optional
  internal.value.converter=org.apache.kafka.connect.json.JsonConverter
  # notes: this configuration is optional; the default is used if it is not set
  #task.shutdown.graceful.timeout.ms=
  # notes: this configuration is optional; the default is used if it is not set
  #offset.flush.timeout.ms=

  2. Download a debezium-connector-mysql plugin tarball and unzip it into the folder defined by plugin.path at the end of connect-distributed.properties.
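For example, a minimal sketch that pulls the plugin archive from Maven Central (0.9.4.Final matches the change records shown later in this post; the URL follows Debezium's standard Maven Central layout):

  cd /home/lenmom/workspace/software/kafka_2.-2.1./connect   # the plugin.path folder configured above
  wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/0.9.4.Final/debezium-connector-mysql-0.9.4.Final-plugin.tar.gz
  tar -xzf debezium-connector-mysql-0.9.4.Final-plugin.tar.gz
  rm -f debezium-connector-mysql-0.9.4.Final-plugin.tar.gz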

2. start the mysql server with binlog enabled

For details, please refer to my previous blog post mysql 5.7 enable binlog.
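In short, MySQL needs a unique server id and row-format binlog. A minimal sketch (the config file path and the server-id value below are assumptions; adjust them for your installation):

  # append binlog settings to the MySQL config (path assumed for a typical Ubuntu MySQL 5.7 install) and restart
  printf '%s\n' '[mysqld]' 'server-id=223344' 'log_bin=mysql-bin' 'binlog_format=ROW' 'binlog_row_image=FULL' 'expire_logs_days=10' | sudo tee -a /etc/mysql/mysql.conf.d/mysqld.cnf
  sudo service mysql restart
  # verify binlog is enabled
  mysql -uroot -proot -e "SHOW VARIABLES LIKE 'log_bin';"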

3. start kafka connect in distributed mode

  sh $KAFKA_HOME/bin/connect-distributed.sh $KAFKA_HOME/config/connect-distributed.properties
  # or start in background mode
  # sh $KAFKA_HOME/bin/connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties

The worker starts up; its startup log is shown in the screenshot (omitted here).

Be aware of the line INFO Added aliases 'MySqlConnector' and 'MySql' to plugin 'io.debezium.connector.mysql.MySqlConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:390): it means our target plugin has been loaded by the distributed Kafka Connect worker.
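You can also confirm the plugin is visible through the worker's REST API:

  curl -s -H "Accept:application/json" localhost:8083/connector-plugins
  # the response should contain an entry with class io.debezium.connector.mysql.MySqlConnector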

4. create the demo database in mysql

  # In production you would almost certainly limit the replication user to the follower (slave) machine,
  # to prevent other clients accessing the log from other machines. For example, 'replicator'@'follower.acme.com'.
  #
  # However, this grant is equivalent to specifying *any* host, which makes this easier since the docker host
  # is not easily known to the Docker container. But don't do this in production.
  #
  GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator' IDENTIFIED BY 'replpass';
  GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium' IDENTIFIED BY 'dbz';

  # Create the database that we'll use to populate data and watch the effect in the binlog
  DROP DATABASE IF EXISTS inventory;
  CREATE DATABASE IF NOT EXISTS inventory;
  GRANT ALL PRIVILEGES ON inventory.* TO 'root'@'%';

  # Switch to this database
  USE inventory;

  # Create and populate our products using a single insert with many rows
  CREATE TABLE products (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description VARCHAR(512),
    weight FLOAT
  );
  ALTER TABLE products AUTO_INCREMENT = 101;

  INSERT INTO products
  VALUES (default,"scooter","Small 2-wheel scooter",3.14),
         (default,"car battery","12V car battery",8.1),
         (default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3",0.8),
         (default,"hammer","12oz carpenter's hammer",0.75),
         (default,"hammer","14oz carpenter's hammer",0.875),
         (default,"hammer","16oz carpenter's hammer",1.0),
         (default,"rocks","box of assorted rocks",5.3),
         (default,"jacket","water resistent black wind breaker",0.1),
         (default,"spare tire","24 inch spare tire",22.2);

  # Create and populate the products on hand using multiple inserts
  CREATE TABLE products_on_hand (
    product_id INTEGER NOT NULL PRIMARY KEY,
    quantity INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES products(id)
  );

  INSERT INTO products_on_hand VALUES (101,3);
  INSERT INTO products_on_hand VALUES (102,8);
  INSERT INTO products_on_hand VALUES (103,18);
  INSERT INTO products_on_hand VALUES (104,4);
  INSERT INTO products_on_hand VALUES (105,5);
  INSERT INTO products_on_hand VALUES (106,0);
  INSERT INTO products_on_hand VALUES (107,44);
  INSERT INTO products_on_hand VALUES (108,2);
  INSERT INTO products_on_hand VALUES (109,5);

  # Create some customers ...
  CREATE TABLE customers (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(255) NOT NULL,
    last_name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE KEY
  ) AUTO_INCREMENT=1001;

  INSERT INTO customers
  VALUES (default,"Sally","Thomas","sally.thomas@acme.com"),
         (default,"George","Bailey","gbailey@foobar.com"),
         (default,"Edward","Walker","ed@walker.com"),
         (default,"Anne","Kretchmar","annek@noanswer.org");

  # Create some very simple orders
  CREATE TABLE orders (
    order_number INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_date DATE NOT NULL,
    purchaser INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    FOREIGN KEY order_customer (purchaser) REFERENCES customers(id),
    FOREIGN KEY ordered_product (product_id) REFERENCES products(id)
  ) AUTO_INCREMENT = 10001;

  INSERT INTO orders
  VALUES (default, '2016-01-16', 1001, 1, 102),
         (default, '2016-01-17', 1002, 2, 105),
         (default, '2016-02-19', 1002, 2, 106),
         (default, '2016-02-21', 1003, 1, 107);
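The script above is essentially the Debezium tutorial's inventory database. To load it, a minimal sketch (assuming it is saved as inventory.sql and the root password used elsewhere in this post):

  mysql -h 127.0.0.1 -P 3306 -uroot -proot < inventory.sql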

5. Register a mysql connector using the kafka connect rest api

5.1 check the kafka connect worker version

  curl -H "Accept:application/json" localhost:8083/

output

  {"version":"2.1.0","commit":"809be928f1ae004e","kafka_cluster_id":"NGQRxNZMSY6Q53ktQABHsQ"}

5.2 get current connector list

  lenmom@M1701:~/$ curl -H "Accept:application/json" localhost:8083/connectors/
  []

The output indicates there are no connectors registered in the distributed connect cluster yet.

5.3 register a mysql connector instance in the distributed connect cluster

  curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{ "name": "inventory-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "127.0.0.1", "database.port": "3306", "database.user": "root", "database.password": "root", "database.server.id": "184054", "database.server.name": "127.0.0.1", "database.whitelist": "inventory", "database.history.kafka.bootstrap.servers": "127.0.0.1:9092", "database.history.kafka.topic": "dbhistory.inventory" } }'

For readability, the formatted request payload is as follows:

  {
    "name": "inventory-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "tasks.max": "1",
      "database.hostname": "127.0.0.1",
      "database.port": "3306",
      "database.user": "root",
      "database.password": "root",
      "database.server.id": "184054",
      "database.server.name": "127.0.0.1",
      "database.whitelist": "inventory",
      "database.history.kafka.bootstrap.servers": "127.0.0.1:9092",
      "database.history.kafka.topic": "dbhistory.inventory"
    }
  }

The response to the POST request is as follows:

  HTTP/1.1 201 Created
  Date: Tue, Apr GMT
  Location: http://localhost:8083/connectors/inventory-connector
  Content-Type: application/json
  Content-Length:
  Server: Jetty(9.2.15.v20160210)

  {
    "name": "inventory-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "tasks.max": "1",
      "database.hostname": "127.0.0.1",
      "database.port": "3306",
      "database.user": "root",
      "database.password": "root",
      "database.server.id": "184054",
      "database.server.name": "127.0.0.1",
      "database.whitelist": "inventory",
      "database.history.kafka.bootstrap.servers": "127.0.0.1:9092",
      "database.history.kafka.topic": "dbhistory.inventory",
      "name": "inventory-connector"
    },
    "tasks": []
  }

5.4 get the connector list using curl again; there should now be one connector, since we have just registered it.

  curl -H "Accept:application/json" localhost:8083/connectors/

output

  ["inventory-connector"]

5.5 get connector detail

  curl -i -X GET -H "Accept:application/json" localhost:8083/connectors/inventory-connector

output

  HTTP/1.1 200 OK
  Date: Wed, Apr GMT
  Content-Type: application/json
  Content-Length:
  Server: Jetty(9.4.12.v20180830)

  {
    "name": "inventory-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.user": "root",
      "database.server.id": "184054",
      "tasks.max": "1",
      "database.history.kafka.bootstrap.servers": "127.0.0.1:9092",
      "database.history.kafka.topic": "dbhistory.inventory",
      "database.server.name": "127.0.0.1",
      "database.port": "3306",
      "database.hostname": "127.0.0.1",
      "database.password": "root",
      "name": "inventory-connector",
      "database.whitelist": "inventory"
    },
    "tasks": [
      {
        "connector": "inventory-connector",
        "task": 0
      }
    ],
    "type": "source"
  }

The "task": 0 entry indicates there is one task, with id 0.
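The running state of the connector and its task can also be checked through the status endpoint:

  curl -s -H "Accept:application/json" localhost:8083/connectors/inventory-connector/status
  # expect "state":"RUNNING" for both the connector and task 0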

5.6 list the kafka topics

  sh $KAFKA_HOME/bin/kafka-topics.sh --zookeeper localhost:2181 --list

output

  127.0.0.1
  127.0.0.1.inventory.customers
  127.0.0.1.inventory.orders
  127.0.0.1.inventory.products
  127.0.0.1.inventory.products_on_hand
  __consumer_offsets
  connect-configs
  connect-offsets
  connect-status
  connect-test
  dbhistory.inventory

This indicates the MySQL connector has started watching for data changes in MySQL and has begun pushing the change events to the Kafka broker. Debezium names each table's topic as <database.server.name>.<database>.<table>, which is why the topics above are prefixed with 127.0.0.1, our database.server.name.

5.7 observe the data in the kafka topic

  sh $KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic 127.0.0.1.inventory.customers --from-beginning   # all the data changes for table customers in database inventory will be listed

output:

  {"before":null,"after":{"id":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas@acme.com"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":null,"after":{"id":1002,"first_name":"George","last_name":"Bailey","email":"gbailey@foobar.com"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":null,"after":{"id":1003,"first_name":"Edward","last_name":"Walker","email":"ed@walker.com"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":null,"after":{"id":1004,"first_name":"Anne","last_name":"Kretchmar","email":"annek@noanswer.org"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}

Let's take one record, say the first one, and format it for readability:

  {
    "before": null,
    "after": {
      "id": 1001,
      "first_name": "Sally",
      "last_name": "Thomas",
      "email": "sally.thomas@acme.com"
    },
    "source": {
      "version": "0.9.4.Final",
      "connector": "mysql",
      "name": "127.0.0.1",
      "server_id": ,
      "ts_sec": ,
      "gtid": null,
      "file": "mysql-bin.000012",
      "pos": ,
      "row": ,
      "snapshot": true,
      "thread": null,
      "db": "inventory",
      "table": "customers",
      "query": null
    },
    "op": "c",
    "ts_ms":
  }

The op field indicates the type of data change:

  • c: a record was inserted into the database. For c, the before element is null.
  • d: a record was deleted from the database. For d, the after element is null.
  • u: a record was updated in the database. The before element holds the row as it was before the update, and the after element holds the row after the update.
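To watch just these fields while experimenting, a small sketch (jq is an extra tool not used in the original setup; since schemas are disabled in our worker config, every record is plain JSON):

  sh $KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
     --topic 127.0.0.1.inventory.customers --from-beginning \
     | jq -c '{op: .op, before: .before, after: .after}'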

Now let's do some update and delete operations in the MySQL database and watch the change data captured in Kafka.

  mysql> select * from customers;
  +------+------------+-----------+-----------------------+
  | id   | first_name | last_name | email                 |
  +------+------------+-----------+-----------------------+
  | 1001 | Sally      | Thomas    | sally.thomas@acme.com |
  | 1002 | George     | Bailey    | gbailey@foobar.com    |
  | 1003 | Edward     | Walker    | ed@walker.com         |
  | 1004 | Anne       | Kretchmar | annek@noanswer.org    |
  +------+------------+-----------+-----------------------+
  4 rows in set (0.00 sec)

  mysql> update customers set first_name='1234' where id=1004;
  Query OK, 1 row affected (0.01 sec)
  Rows matched: 1  Changed: 1  Warnings: 0

  mysql> delete from customers where id=1004;
  Query OK, 1 row affected (0.01 sec)

  mysql> select * from customers;
  +------+------------+-----------+-----------------------+
  | id   | first_name | last_name | email                 |
  +------+------------+-----------+-----------------------+
  | 1001 | Sally      | Thomas    | sally.thomas@acme.com |
  | 1002 | George     | Bailey    | gbailey@foobar.com    |
  | 1003 | Edward     | Walker    | ed@walker.com         |
  +------+------------+-----------+-----------------------+
  3 rows in set (0.00 sec)

As shown above, we first updated the first_name of the record with id=1004 to 1234, and then deleted that record.

The corresponding Kafka records:

  {"before":null,"after":{"id":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas@acme.com"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":null,"after":{"id":1002,"first_name":"George","last_name":"Bailey","email":"gbailey@foobar.com"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":null,"after":{"id":1003,"first_name":"Edward","last_name":"Walker","email":"ed@walker.com"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":null,"after":{"id":1004,"first_name":"Anne","last_name":"Kretchmar","email":"annek@noanswer.org"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":}
  {"before":{"id":1004,"first_name":"Anne","last_name":"Kretchmar","email":"annek@noanswer.org"},"after":{"id":1004,"first_name":"1234","last_name":"Kretchmar","email":"annek@noanswer.org"},"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":false,"thread":,"db":"inventory","table":"customers","query":null},"op":"u","ts_ms":}
  {"before":{"id":1004,"first_name":"1234","last_name":"Kretchmar","email":"annek@noanswer.org"},"after":null,"source":{"version":"0.9.4.Final","connector":"mysql","name":"127.0.0.1","server_id":,"ts_sec":,"gtid":null,"file":"mysql-bin.000012","pos":,"row":,"snapshot":false,"thread":,"db":"inventory","table":"customers","query":null},"op":"d","ts_ms":}
  null

There are three more entries in the topic: an update event, a delete event, and a final null tombstone, which Debezium emits after a delete so that Kafka log compaction can eventually discard all messages with the deleted row's key.

update:

  {
    "before": {
      "id": 1004,
      "first_name": "Anne",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "after": {
      "id": 1004,
      "first_name": "1234",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "source": {
      "version": "0.9.4.Final",
      "connector": "mysql",
      "name": "127.0.0.1",
      "server_id": ,
      "ts_sec": ,
      "gtid": null,
      "file": "mysql-bin.000012",
      "pos": ,
      "row": ,
      "snapshot": false,
      "thread": ,
      "db": "inventory",
      "table": "customers",
      "query": null
    },
    "op": "u",
    "ts_ms":
  }

delete:

  {
    "before": {
      "id": 1004,
      "first_name": "1234",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "after": null,
    "source": {
      "version": "0.9.4.Final",
      "connector": "mysql",
      "name": "127.0.0.1",
      "server_id": ,
      "ts_sec": ,
      "gtid": null,
      "file": "mysql-bin.000012",
      "pos": ,
      "row": ,
      "snapshot": false,
      "thread": ,
      "db": "inventory",
      "table": "customers",
      "query": null
    },
    "op": "d",
    "ts_ms":
  }

If we stop the MySQL connector or restart the Kafka broker, the captured data is still there, because Kafka Connect persists message offsets and the change data in the Kafka topics we configured in connect-distributed.properties.
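Pausing, resuming, and removing the connector are likewise done through the REST API; a quick sketch:

  # pause the connector (its task stops polling, but configuration and offsets are kept)
  curl -X PUT localhost:8083/connectors/inventory-connector/pause
  # resume it later
  curl -X PUT localhost:8083/connectors/inventory-connector/resume
  # or remove it entirely; the offsets already stored in connect-offsets are not deleted
  curl -X DELETE localhost:8083/connectors/inventory-connector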

6. install the hdfs sink connector plugin in the distributed connect worker

  cd $KAFKA_HOME/connect   # go to the kafka connect plugin folder
  wget https://d1i4a15mxbxib1.cloudfront.net/api/plugins/confluentinc/kafka-connect-hdfs/versions/5.2.1/confluentinc-kafka-connect-hdfs-5.2.1.zip   # download the hdfs plugin
  unzip confluentinc-kafka-connect-hdfs-5.2.1.zip   # unzip the hdfs plugin into the connect plugin folder
  rm -f confluentinc-kafka-connect-hdfs-5.2.1.zip

7. restart the kafka connect distributed worker to load the hdfs plugin

If it is running in a terminal, we can simply press Ctrl+C to stop it; if it is running in background mode, we can get the process id using lsof -i:8083 and then kill -9 {process id just queried}, as listed below.

  lenmom@M1701:~/workspace$ lsof -i:8083
  COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
  java        lenmom 314u  IPv6        0t0      TCP localhost:8083 (LISTEN)
  lenmom@M1701:~/workspace$ kill -9 <pid from the lsof output>

Then restart the connect distributed worker using the command:

  sh $KAFKA_HOME/bin/connect-distributed.sh $KAFKA_HOME/config/connect-distributed.properties

Then we can see in the terminal output that the hdfs connector plugin has been loaded:

  [timestamp] INFO Registered loader: PluginClassLoader{pluginLocation=file:/home/lenmom/workspace/software/kafka_2.-2.1./connect/confluentinc-kafka-connect-hdfs-5.2.1/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
  [timestamp] INFO Added plugin 'io.confluent.connect.hdfs.tools.SchemaSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
  [timestamp] INFO Added plugin 'io.confluent.connect.hdfs.HdfsSinkConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
  [timestamp] INFO Added plugin 'io.confluent.connect.storage.tools.SchemaSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
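With the HDFS sink plugin loaded, a sink connector instance can be registered through the same REST API. The following is only a minimal sketch: the topic name comes from this post, but the HDFS namenode URL, flush.size, and the JSON output format are assumptions for a local Hadoop setup, not values taken from this article.

  curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
    "name": "hdfs-sink-connector",
    "config": {
      "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
      "tasks.max": "1",
      "topics": "127.0.0.1.inventory.customers",
      "hdfs.url": "hdfs://localhost:9000",
      "flush.size": "3",
      "format.class": "io.confluent.connect.hdfs.json.JsonFormat"
    }
  }'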
