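Background:

  My company has recently been changing its technical architecture. The Solr setup we had been using no longer fits the business well (I won't go into the reasons), so the data is moving to Elasticsearch for fast search. That created a data migration requirement: copy some of the databases and tables in MySQL into ES so they can be searched quickly. In short, the requirement is to migrate MySQL data into ES.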

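Requirements:

  The order table is the main table, and the item table and logistics table are secondary tables. All three are migrated into the same ES index.

  Approach: ES parent-child documents, plus canal-server and canal-adapter.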

Environment:

  Memory is very tight and the budget is limited, but that doesn't get in the way of writing this up.

  [root@aliyun ~]# cat /etc/redhat-release
  CentOS Linux release (Core)
  [root@aliyun ~]# uname -r
  -.el7.x86_64
  [root@aliyun ~]# free -h
          total   used   free   shared   buff/cache   available
  Mem:    .8G     .3G    65M    528K     440M         345M
  Swap:   0B      0B     0B

MySQL version: 5.6.45

mysql> select version();

+-----------+
| version() |
+-----------+
| 5.6.45 |
+-----------+
1 row in set (0.03 sec)

Elasticsearch version: 6.7.0

[root@aliyun ~]# curl localhost:9200
{
"name" : "node-1",
"cluster_name" : "my-application",
"cluster_uuid" : "M5i8CoTJTOepn1GwdXgfxg",
"version" : {
"number" : "6.7.0",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "8453f77",
"build_date" : "2019-03-21T15:32:29.844721Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}

JDK version: 1.8.0

[root@aliyun ~]# java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

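canal-server / canal-adapter version: 1.1.3

The data flow is as follows:

  canal-server poses as a slave of the MySQL server, pulls the MySQL binlog, and parses it. Each instance configured in canal-server fetches binlog data. canal-adapter connects to those instances, consumes the binlog events, and writes them to the matching ES index through its ES adapter. That is the overall flow; the configuration details follow.

Configuration:

1. Enable the MySQL binlog and set it to ROW format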

  [root@aliyun ~]# vim /etc/my.cnf

  ......

  [mysqld]
  log-bin=mysql-bin # enable the binlog
  binlog-format=ROW # use ROW format
  server_id =       # must be unique, and must not clash with canal's slaveId

  ......

  Restart MySQL, then log in and check that the settings took effect.

  mysql> show variables like 'log_bin%';
  +---------------------------------+----------------------------------+
  | Variable_name                   | Value                            |
  +---------------------------------+----------------------------------+
  | log_bin                         | ON                               |
  | log_bin_basename                | /data/mysql/data/mysql-bin       |
  | log_bin_index                   | /data/mysql/data/mysql-bin.index |
  | log_bin_trust_function_creators | OFF                              |
  | log_bin_use_v1_row_events       | OFF                              |
  +---------------------------------+----------------------------------+
  5 rows in set (0.00 sec)

  mysql> show variables like 'binlog_format%';
  +---------------+-------+
  | Variable_name | Value |
  +---------------+-------+
  | binlog_format | ROW   |
  +---------------+-------+
  1 row in set (0.00 sec)
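
  The same prerequisites can also be checked against the option file before MySQL is restarted. The following is a minimal sketch, not part of canal or MySQL: the function name check_binlog_settings is made up for illustration, and it assumes a simple my.cnf with a [mysqld] section and no !include directives.

```python
import configparser


def check_binlog_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,            # my.cnf allows bare flags such as "skip-name-resolve"
        inline_comment_prefixes=("#",),  # "log-bin=mysql-bin # comment"
        interpolation=None,              # do not treat "%" specially
        strict=False,                    # tolerate repeated options
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]

    # MySQL accepts both "log-bin" and "log_bin", so normalize the spelling.
    options = {
        key.replace("-", "_"): (value or "").strip()
        for key, value in parser.items("mysqld")
    }

    problems = []
    if "log_bin" not in options:
        problems.append("log-bin is not set")
    if options.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog-format is not ROW")
    if not options.get("server_id"):
        problems.append("server_id is not set")
    return problems
```

  An empty list means the three settings canal relies on are present. The authoritative check is still the show variables output from the running server.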

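  Create the user that canal-server uses to connect to MySQL and give it permission to read the binlog. The statement below grants ALL for simplicity; canal itself only needs SELECT, REPLICATION SLAVE and REPLICATION CLIENT, which is the safer choice outside a test machine.

  mysql> grant all on *.* to canal@'%' identified by 'canal';
  Query OK, 0 rows affected (0.01 sec)

  mysql> flush privileges;
  Query OK, 0 rows affected (0.00 sec)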

2. Deploy canal-server

All canal components can be downloaded from: https://github.com/alibaba/canal/releases

  Download the package:

  [root@aliyun ~]# wget https://github.com/alibaba/canal/releases/download/canal-1.1.3/canal.deployer-1.1.3.tar.gz

  Unpack it:

  [root@aliyun ~]# mkdir /usr/local/canal-server
  [root@aliyun ~]# tar -xzf canal.deployer-1.1.3.tar.gz -C /usr/local/canal-server/
  [root@aliyun ~]# ll /usr/local/canal-server/
  total
  drwxr-xr-x root root Jul : bin
  drwxr-xr-x root root Jul : conf
  drwxr-xr-x root root Jul : lib
  drwxrwxrwx root root Apr : logs

  Now edit the configuration files.

  The parameter that matters most, and the one you will change most often, is:

  # note: the instances created by canal-server; leave the other parameters alone unless you have special needs
  canal.destinations = example

  The complete root configuration file, with brief notes on the parameters, is as follows:

  [root@aliyun conf]# cd /usr/local/canal-server/conf/
  [root@aliyun conf]# cat canal.properties
  #################################################
  ######### common argument #############
  #################################################
  #canal.manager.jdbc.url=jdbc:mysql://127.0.0.1:3306/canal_manager?useUnicode=true&characterEncoding=UTF-8
  #canal.manager.jdbc.username=root
  #canal.manager.jdbc.password=
  # note: the value has no practical meaning
  canal.id = 1
  # note: IP of the host running canal-server; optional, canal binds to a local IP automatically
  canal.ip =
  # note: port canal-server listens on (TCP mode only; port 11111 is not opened in other modes)
  canal.port = 11111
  # note: port canal-server listens on for metrics pulling
  canal.metrics.pull.port = 11112
  # note: cluster mode needs ZooKeeper for coordination; standalone mode can leave this empty
  canal.zkServers =
  # flush data to zk
  # note: the next two parameters control flushing data to ZooKeeper
  canal.zookeeper.flush.period =
  canal.withoutNetty = false
  # tcp, kafka, RocketMQ
  # note: server mode; tcp means clients connect directly with no middleware, kafka and RocketMQ go through a message queue
  canal.serverMode = tcp
  # flush meta cursor/parse position to file
  # note: directory where this data is stored
  canal.file.data.dir = ${canal.conf.dir}
  canal.file.flush.period = 1000
  # note: system parameters follow (memory, network, and so on)
  ## memory store RingBuffer size, should be Math.pow(2,n)
  canal.instance.memory.buffer.size =
  ## memory store RingBuffer used memory unit size , default 1kb
  canal.instance.memory.buffer.memunit =
  ## meory store gets mode used MEMSIZE or ITEMSIZE
  canal.instance.memory.batch.mode = MEMSIZE
  canal.instance.memory.rawEntry = true

  ## detecing config
  # note: heartbeat check settings, used when setting up HA
  canal.instance.detecting.enable = false
  #canal.instance.detecting.sql = insert into retl.xdual values(,now()) on duplicate key update x=now()
  canal.instance.detecting.sql =
  canal.instance.detecting.interval.
  canal.instance.detecting.retry.threshold =
  canal.instance.detecting.heartbeatHaEnable = false

  # support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
  canal.instance.transaction.size =
  # mysql fallback connected to new master should fallback times
  canal.instance.fallbackIntervalInSeconds =

  # network config
  canal.instance.network.receiveBufferSize =
  canal.instance.network.sendBufferSize =
  canal.instance.network.soTimeout =

  # binlog filter config
  # note: binlog filtering; selects which kinds of SQL are filtered out
  canal.instance.filter.druid.ddl = true
  canal.instance.filter.query.dcl = false
  canal.instance.filter.query.dml = false
  canal.instance.filter.query.ddl = false
  canal.instance.filter.table.error = false
  canal.instance.filter.rows = false
  canal.instance.filter.transaction.entry = false

  # binlog format/image check
  # note: binlog format check. Use ROW format; other formats raise no error but no data gets synced (look up the differences between binlog formats for the reason)
  canal.instance.binlog.format = ROW,STATEMENT,MIXED
  canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

  # binlog ddl isolation
  canal.instance.get.ddl.isolation = false

  # parallel parser config
  # note: parallel parsing; on a single-CPU machine change the true below to false
  canal.instance.parser.parallel = true
  ## concurrent thread number, default % available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
  #canal.instance.parser.parallelThreadSize =
  ## disruptor ringbuffer size, must be power of 2
  canal.instance.parser.parallelBufferSize =

  # table meta tsdb info
  # note: tsdb keeps a history of table schemas, so that binlog events can be parsed against the table structure in effect at that time
  canal.instance.tsdb.enable = true
  canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
  canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.;MODE=MYSQL;
  # note: user name and password for the tsdb database
  canal.instance.tsdb.dbUsername = canal
  canal.instance.tsdb.dbPassword = canal
  # dump snapshot interval, default hour
  canal.instance.tsdb.snapshot.interval =
  # purge snapshot expire , default hour( days)
  canal.instance.tsdb.snapshot.expire =

  # aliyun ak/sk , support rds/mq
  canal.aliyun.accessKey =
  canal.aliyun.secretKey =

  #################################################
  ######### destinations #############
  #################################################
  # note: a very important parameter; list the names of the instances to create here, e.g. test1,test2, separated by commas
  canal.destinations = example
  # conf root dir
  canal.conf.dir = ../conf
  # auto scan instance dir add/remove and start/stop instance
  # note: scan for and load instance configurations automatically
  canal.auto.scan = true
  canal.auto.scan.interval =

  canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
  #canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

  canal.instance.global.mode = spring
  canal.instance.global.lazy = false
  #canal.instance.global.manager.address =
  #canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
  canal.instance.global.spring.xml = classpath:spring/file-instance.xml
  #canal.instance.global.spring.xml = classpath:spring/default-instance.xml

  ##################################################
  ######### MQ #############
  ##################################################
  # note: MQ parameters. This post does not cover message queues, only a standalone canal-server in direct TCP mode
  canal.mq.servers =
  canal.mq.retries =
  canal.mq.batchSize =
  canal.mq.maxRequestSize =
  canal.mq.lingerMs =
  canal.mq.bufferMemory =
  canal.mq.canalBatchSize =
  canal.mq.canalGetTimeout =
  canal.mq.flatMessage = true
  canal.mq.compressionType = none
  canal.mq.acks = all
  # use transaction for kafka flatMessage batch produce
  canal.mq.transaction = false
  #canal.mq.properties. =

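Instance configuration:

  After naming the instances in the root configuration file, create a directory for each instance next to it. canal-server ships with a sample instance configuration, so we can simply copy it. For example, suppose the root configuration defines these instances:

  [root@aliyun conf]# vim canal.properties
  ...
  canal.destinations = user_order,delivery_info
  ...

  We then create both instance directories alongside the root configuration:

  [root@aliyun conf]# pwd
  /usr/local/canal-server/conf
  [root@aliyun conf]# cp -a example/ user_order
  [root@aliyun conf]# cp -a example/ delivery_info

  Each instance has its own instance.properties; the sample file is example/instance.properties.

  The parameters to pay attention to in this file are: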

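  [root@aliyun example]# vim instance.properties
  # note: IP address and port of the MySQL server whose binlog is read
  canal.instance.master.address=127.0.0.1:3306
  # note: binlog file to start reading from
  canal.instance.master.journal.name=
  # note: offset within that file; anyone who has set up master-slave replication will recognize these two parameters
  # tip: the binlog file and offset can both be left empty, in which case canal-server starts from the current position. I recommend leaving them unset.
  canal.instance.master.position=
  # note: user and password for connecting to MySQL
  canal.instance.dbUsername=canal
  canal.instance.dbPassword=canal
  # note: character set
  canal.instance.connectionCharset = UTF-8

  # note: an important one. It is the whitelist of schema.table names, written as comma-separated regular expressions.
  # note: for example, to capture only the incremental data of table user in schema test, write test\\.user
  canal.instance.filter.regex=.*\\..*
  # table black regex
  # note: the blacklist; same rules as the whitelist. For the matching rules see the wiki: https://github.com/alibaba/canal/wiki/AdminGuide
  canal.instance.filter.black.regex=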
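
  To get a feel for how the whitelist and blacklist interact, here is a rough Python approximation. It is not canal's code: canal evaluates these patterns with its own Java/Perl-style regex engine, and the helper names below are invented for illustration. The sketch assumes each comma-separated pattern must match the whole "schema.table" name, which is consistent with the "^.*\..*$" form that canal prints in its startup log.

```python
import re


def unescape_properties_value(raw):
    # In a .properties file a backslash is written twice, so "\\." in the
    # file reaches the application as the regex "\." (a literal dot).
    return raw.replace("\\\\", "\\")


def matches_filter(filter_value, qualified_name):
    """True if "schema.table" fully matches any comma-separated pattern."""
    patterns = unescape_properties_value(filter_value).split(",")
    return any(
        re.fullmatch(pattern.strip(), qualified_name)
        for pattern in patterns
        if pattern.strip()
    )


def table_allowed(white, black, qualified_name):
    # A table is captured when it is whitelisted and not blacklisted.
    # An empty blacklist excludes nothing.
    return matches_filter(white, qualified_name) and not matches_filter(
        black, qualified_name
    )
```

  For example, table_allowed(r"test\\.user", "", "test.user") is true, while a whitelist of r"test\\..*" combined with a blacklist of r"test\\.tmp_.*" rejects test.tmp_1. Values are passed as raw strings so that they look exactly like the text in instance.properties.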

  The complete file, with notes on the parameters:

  [root@aliyun conf]# cd example/
  [root@aliyun example]# cat instance.properties
  #################################################
  ## mysql serverId , v1.0.26+ will autoGen
  # canal.instance.mysql.slaveId=

  # enable gtid use true/false
  # note: enables GTID mode; off by default
  canal.instance.gtidon=false

  # position info
  # note: where to start syncing from: binlog file, offset, and so on
  canal.instance.master.address=
  canal.instance.master.journal.name=
  canal.instance.master.position=
  canal.instance.master.timestamp=
  canal.instance.master.gtid=

  # rds oss binlog
  # note: for syncing from RDS OSS
  canal.instance.rds.accesskey=
  canal.instance.rds.secretkey=
  canal.instance.rds.instanceId=

  # table meta tsdb info
  # note: enables tsdb, which records the history of table schemas
  canal.instance.tsdb.enable=true
  #canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
  #canal.instance.tsdb.dbUsername=canal
  #canal.instance.tsdb.dbPassword=canal

  # note: the standby parameters are for high availability; they can point at a MySQL slave
  #canal.instance.standby.address =
  #canal.instance.standby.journal.name =
  #canal.instance.standby.position =
  #canal.instance.standby.timestamp =
  #canal.instance.standby.gtid=

  # username/password
  canal.instance.dbUsername=canal
  canal.instance.dbPassword=canal
  canal.instance.connectionCharset = UTF-8
  # enable druid Decrypt database password
  canal.instance.enableDruid=false
  #canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

  # table regex
  # note: the filter rules
  canal.instance.filter.regex=.*\\..*
  # table black regex
  canal.instance.filter.black.regex=

  # mq config
  # note: MQ settings, not covered in this post
  canal.mq.topic=example
  # dynamic topic route by schema or table regex
  #canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
  canal.mq.partition=
  # hash partition config
  #canal.mq.partitionsNum=
  #canal.mq.partitionHash=test.table:id^name,.*\\..*
  #################################################

  Once all the instances you need are configured, start canal-server:

  [root@aliyun example]# cd /usr/local/canal-server/bin/
  [root@aliyun bin]# sh startup.sh                    # start canal-server
  [root@aliyun bin]# tailf ../logs/canal/canal.log    # watch the log
  OpenJDK 64-Bit Server VM warning: ignoring option PermSize=96m; support was removed in 8.0
  OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
  OpenJDK 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
  OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
  OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(, ) failed; error=)
  #
  # There is insufficient memory for the Java Runtime Environment to continue.
  # Native memory allocation (mmap) failed to map bytes for committing reserved memory.
  # An error report file with more information is saved as:
  # /usr/local/canal-server/bin/hs_err_pid2261.log

  The JVM could not allocate memory, so startup failed. Reduce the heap size.

  Edit startup.sh and lower the memory settings; mine ended up as shown below. This is for testing only. In real use give it more memory; how much is up to you.

  [root@aliyun bin]# vim startup.sh
  [root@aliyun bin]# grep 512m startup.sh
  JAVA_OPTS="-server -Xms512m -Xmx512m -Xmn512m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError"

  Start it again:

  [root@aliyun bin]# sh restart.sh ;tailf ../logs/canal/canal.log
  -- ::14.107 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
  -- ::14.170 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
  -- ::14.203 [main] INFO c.a.o.c.d.monitor.remote.RemoteConfigLoaderFactory - ## load local canal configurations
  -- ::14.210 [main] INFO com.alibaba.otter.canal.deployer.CanalStater - ## start the canal server.
  -- ::]
  -- ::15.307 [main] WARN o.s.beans.GenericTypeAwarePropertyDescriptor - Invalid JavaBean property 'connectionCharset' being accessed! Ambiguous write methods found next to actually used [public void com.alibaba.otter.canal.parse.inbound.mysql.AbstractMysqlEventParser.setConnectionCharset(java.nio.charset.Charset)]: [public void com.alibaba.otter.canal.parse.inbound.mysql.AbstractMysqlEventParser.setConnectionCharset(java.lang.String)]
  -- ::15.800 [main] ERROR com.alibaba.druid.pool.DruidDataSource - testWhileIdle is true, validationQuery not set
  -- ::16.230 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
  -- ::16.230 [main] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter :
  -- ::16.476 [main] INFO com.alibaba.otter.canal.deployer.CanalStater - ## the canal server is running now ......
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
  -- ::] ERROR com.alibaba.druid.pool.DruidDataSource - testWhileIdle is true, validationQuery not set
  -- ::] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
  -- ::] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter :
  -- ::] INFO c.a.o.canal.deployer.monitor.SpringInstanceConfigMonitor - auto notify start user_order successful.
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
  -- ::] ERROR com.alibaba.druid.pool.DruidDataSource - testWhileIdle is true, validationQuery not set
  -- ::] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
  -- ::] WARN c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter :
  -- ::] INFO c.a.o.canal.deployer.monitor.SpringInstanceConfigMonitor - auto notify start delivery_info successful.
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> ,position=,serverId=,gtid=<] cost : 533ms , the next step is binlog dump
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> ,position=,serverId=,gtid=<] cost : 1365ms , the next step is binlog dump
  -- :: , EventParser] WARN c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> ,position=,serverId=,gtid=<] cost : 1087ms , the next step is binlog dump

  It started successfully, which completes the canal-server side. Next we configure canal-adapter to sync the data to ES.

3. Configure canal-adapter

  Download from: https://github.com/alibaba/canal/releases

  [root@aliyun ~]# wget https://github.com/alibaba/canal/releases/download/canal-1.1.3/canal.adapter-1.1.3.tar.gz

  Unpack it:

  [root@aliyun ~]# mkdir /usr/local/canal-adapter
  [root@aliyun ~]# tar -xzf canal.adapter-1.1.3.tar.gz -C /usr/local/canal-adapter/
  [root@aliyun ~]# ll /usr/local/canal-adapter/
  total
  drwxr-xr-x root root Jul : bin
  drwxrwxrwx root root Apr : conf
  drwxr-xr-x root root Jul : lib
  drwxrwxrwx root root Apr : logs
  drwxrwxrwx root root Apr : plugin

  The configuration files:

  [root@aliyun ~]# cd /usr/local/canal-adapter/conf/
  [root@aliyun conf]# ll
  total
  -rwxrwxrwx root root Apr : application.yml    # main configuration file of the adapter
  -rwxrwxrwx root root Apr : bootstrap.yml      # startup bootstrap file
  drwxr-xr-x root root Jul : es                 # directory of mapping files for ES
  drwxr-xr-x root root Jul : hbase              # directory of mapping files for HBase
  -rwxrwxrwx root root Apr : logback.xml        # logging configuration
  drwxrwxrwx root root Jul : META-INF
  drwxrwxrwx root root Apr : rdb

  The parameters that matter most in the main configuration file are:

  [root@aliyun conf]# grep -v "^#" application.yml
  server:
    port: 8081
  spring:
    jackson:
      date-format: yyyy-MM-dd HH:mm:ss
      default-property-inclusion: non_null

  canal.conf:
    mode: tcp # kafka rocketMQ                 # mode
    canalServerHost: 127.0.0.1:11111           # address and port of canal-server
    batchSize:
    syncBatchSize:
    retries:
    timeout:
    accessKey:
    secretKey:
    srcDataSources:                            # source data sources: where the data is read from
      defaultDS:                               # a unique name of your choice; the ES mapping files refer to it
        url: jdbc:mysql://127.0.0.1:3306/mytest?useUnicode=true    # database address and one schema
        username: root                         # database user and password
        password: 121212
    canalAdapters:                             # adapter configuration
    - instance: example # canal instance Name or mq topic name    # an instance configured in canal-server
      groups:
      - groupId: g1                            # group identifier; the default is fine
        outerAdapters:                         # outputs
        - name: es                             # where to write: es
          hosts: 127.0.0.1:9300                # ES address; note the port is the ES transport port 9300
          properties:
            cluster.name: est                  # ES cluster name; it must match the cluster_name that ES reports

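  Next, configure the ES part:

  [root@aliyun es]# pwd
  /usr/local/canal-adapter/conf/es
  [root@aliyun es]# ll
  total
  -rwxrwxrwx root root Apr : biz_order.yml      # these three files ship with the adapter; you may delete them, but they are worth keeping as format references
  -rwxrwxrwx root root Apr : customer.yml
  -rwxrwxrwx root root Apr : mytest_user.yml

  We start with a simple single-table mapping to ES, and later configure a more complex parent-child document mapping (one parent, two children).

  The single-table mapping file is named test.yml. The table to synchronize has the following structure: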

  mysql> desc order_item;
  +----------------------+--------------+------+-----+-------------------+----------------+
  | Field                | Type         | Null | Key | Default           | Extra          |
  +----------------------+--------------+------+-----+-------------------+----------------+
  | id                   | )            | NO   | PRI | NULL              | auto_increment |
  | biz_code             | varchar()    | NO   |     | NULL              |                |
  | user_id              | bigint()     | NO   | MUL | NULL              |                |
  | user_name            | varchar()    | YES  |     |                   |                |
  | order_id             | bigint()     | NO   | MUL | NULL              |                |
  | payment_amount       | bigint()     | YES  |     |                   |                |
  | item_id              | bigint()     | NO   | MUL | NULL              |                |
  | discount_amount      | bigint()     | NO   |     |                   |                |
  | point                | bigint()     | YES  |     |                   |                |
  | refund_amount        | bigint()     | YES  |     |                   |                |
  | point_amount         | bigint()     | NO   |     |                   |                |
  | refund_reason_id     | tinyint()    | YES  |     | NULL              |                |
  | refund_status        | tinyint()    | NO   | MUL |                   |                |
  | refund_type          | tinyint()    | YES  | MUL | NULL              |                |
  | refund_batch_no      | varchar()    | YES  |     | NULL              |                |
  | refund_time          | datetime     | YES  | MUL | NULL              |                |
  | refund_response_time | datetime     | YES  |     | NULL              |                |
  | item_sku_id          | bigint()     | NO   |     | NULL              |                |
  | item_sku_desc        | varchar()    | NO   |     | NULL              |                |
  | item_name            | varchar()    | NO   |     | NULL              |                |
  | delivery_mark        | )            | YES  |     |                   |                |
  | item_type            | tinyint()    | YES  |     | NULL              |                |
  | original_sku_id      | bigint()     | YES  |     | NULL              |                |
  | item_image_url       | varchar()    | NO   |     | NULL              |                |
  | unit_price           | bigint()     | NO   |     | NULL              |                |
  | category_id          | bigint()     | YES  |     | NULL              |                |
  | item_brand_id        | bigint()     | YES  |     | NULL              |                |
  | number               | )            | NO   |     | NULL              |                |
  | delivery_type        | tinyint()    | NO   |     | NULL              |                |
  | delivery_info_id     | bigint()     | YES  |     | NULL              |                |
  | activity_id          | bigint()     | YES  |     | NULL              |                |
  | seller_id            | bigint()     | NO   | MUL | NULL              |                |
  | higo_mark            | tinyint()    | YES  |     |                   |                |
  | higo_extra_info      | varchar()    | YES  |     | NULL              |                |
  | virtual_mark         | bigint()     | YES  |     |                   |                |
  | supplier_id          | bigint()     | YES  |     |                   |                |
  | delivery_print_mark  | tinyint()    | YES  |     |                   |                |
  | print_info_id        | bigint()     | YES  |     |                   |                |
  | cost_price           | bigint()     | YES  |     | NULL              |                |
  | bar_code             | varchar()    | YES  |     | NULL              |                |
  | delivery_fee         | bigint()     | YES  |     | NULL              |                |
  | tax_fee              | bigint()     | YES  |     | NULL              |                |
  | real_point_amount    | bigint()     | YES  |     | NULL              |                |
  | real_discount_amount | bigint()     | YES  |     | NULL              |                |
  | real_payment_amount  | bigint()     | YES  |     | NULL              |                |
  | supplier_biz_code    | varchar()    | NO   |     |                   |                |
  | supplier_seller_id   | bigint()     | NO   |     |                   |                |
  | supplier_item_id     | bigint()     | NO   |     |                   |                |
  | supplier_sku_id      | bigint()     | NO   |     |                   |                |
  | parent_order_id      | bigint()     | NO   |     |                   |                |
  | league_amount        | bigint()     | NO   |     |                   |                |
  | supplier_amount      | bigint()     | NO   |     |                   |                |
  | proxy_mark           | tinyint()    | NO   |     |                   |                |
  | proxy_profit         | bigint()     | NO   |     |                   |                |
  | delete_mark          | tinyint()    | NO   |     |                   |                |
  | delete_timestamp     | bigint()     | YES  |     |                   |                |
  | gmt_created          | datetime     | NO   |     | CURRENT_TIMESTAMP |                |
  | gmt_modified         | datetime     | NO   |     | CURRENT_TIMESTAMP |                |
  | goods_status         | tinyint()    | YES  |     |                   |                |
  | video_id             | bigint()     | YES  |     | NULL              |                |
  | star_bonus_fee       | bigint()     | YES  |     | NULL              |                |
  | service_fee          | bigint()     | YES  |     | NULL              |                |
  | limit_type           | tinyint()    | NO   |     |                   |                |
  +----------------------+--------------+------+-----+-------------------+----------------+
  63 rows in set (0.00 sec)

  The corresponding mapping file:

  [root@aliyun ~]# cat test.yml
  dataSourceKey: defaultDS       # the data source; must match a key under srcDataSources in the adapter's application.yml
  destination: example           # name of an instance configured in canal-server. You may have several instances, so be clear about which data each one collects.
  groupId: g1                    # group ID; the default is fine
  esMapping:                     # the ES mapping
    _index: user_order           # name of the ES index to sync into (your choice); you have to create it in ES yourself
    _type: user_order            # type name within the ES index (your choice)
    _id: _id                     # unique document ID in ES, normally the table's primary key. Note that I wrote "_id", with a leading underscore.
    # The sql maps each table column to a field name in the ES index. Every field name in ES must be unique.
    # You can map the whole table or only some columns: leave the columns you don't want out of the select list.
    # t.user_id is the table column; in ES it is called item_user_id, the custom name after "as".
    # An alias is optional: t.order_id without one is simply called order_id in ES.
    sql: "select concat('item_',t.id) as _id,
          t.biz_code as item_biz_code,
          t.user_id as item_user_id,
          t.user_name as item_user_name,
          t.order_id,
          t.id as item_primary_id,
          t.payment_amount as item_payment_amount,
          t.item_id as item_id,
          t.discount_amount as item_discount_amount,
          t.refund_amount as item_refund_amount,
          t.refund_reason_id as item_refund_reason_id,
          t.order_id as item_order_id,
          t.refund_status as item_refund_status,
          t.refund_type as item_refund_type,
          t.refund_batch_no as item_refund_batch_no,
          t.refund_time as item_refund_time,
          t.refund_response_time as item_refund_response_time,
          t.item_sku_id as item_sku_id,
          t.item_sku_desc as item_sku_desc,
          t.item_name as item_item_name,
          t.delivery_mark as item_delivery_mark,
          t.item_type as item_type,
          t.item_image_url as item_image_url,
          t.unit_price as item_unit_price,
          t.category_id as item_category_id,
          t.number as item_number,
          t.delivery_type as item_delivery_type,
          t.delivery_info_id as item_delivery_info_id,
          t.activity_id as item_activity_id,
          t.seller_id as item_seller_id,
          t.cost_price as item_cost_price,
          t.bar_code as item_bar_code,
          t.delivery_fee as item_delivery_fee,
          t.parent_order_id as item_parent_order_id,
          t.proxy_mark as item_proxy_mark,
          t.proxy_profit as item_proxy_profit,
          t.delete_mark as item_delete_mark,
          t.delete_timestamp as item_delete_timestamp,
          t.gmt_created as item_gmt_created,
          t.gmt_modified as item_gmt_modified,
          t.star_bonus_fee as item_star_bonus_fee,
          t.service_fee as item_service_fee,
          t.limit_type as item_limit_type from order_item t"
    # Condition appended during an ETL (full import) run. The column must exist in the table:
    # c_time comes from the bundled sample and order_item has no such column, so pick a real one
    # such as gmt_modified before using it.
    etlCondition: "where t.c_time>='{0}'"
    commitBatch: 3000            # number of rows committed per batch

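  After writing the sql mapping file, create the matching index and mapping in ES. The ES mapping must agree with the sql mapping: every field produced by the sql must exist in the index mapping, otherwise the sync fails with a field error.

  The index mapping for the file above is shown below. You can create it with the head plugin.
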

  {
    "mappings": {
      "user_order": {
        "properties": {
          "item_user_id": { "type": "long" },
          "item_user_name": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
          "item_biz_code": { "type": "text" },
          "order_user_name": { "type": "text" },
          "item_payment_amount": { "type": "long" },
          "item_id": { "type": "long" },
          "item_discount_amount": { "type": "long" },
          "item_refund_amount": { "type": "long" },
          "item_refund_reason_id": { "type": "long" },
          "item_refund_status": { "type": "long" },
          "item_refund_type": { "type": "long" },
          "item_refund_batch_no": { "type": "text" },
          "item_refund_time": { "type": "date" },
          "item_refund_response_time": { "type": "date" },
          "item_sku_id": { "type": "long" },
          "item_sku_desc": { "type": "text" },
          "item_item_name": { "type": "text" },
          "item_delivery_mark": { "type": "long" },
          "item_type": { "type": "long" },
          "item_image_url": { "type": "text" },
          "item_unit_price": { "type": "long" },
          "item_category_id": { "type": "long" },
          "item_number": { "type": "long" },
          "item_delivery_type": { "type": "long" },
          "item_delivery_info_id": { "type": "long" },
          "item_activity_id": { "type": "long" },
          "item_seller_id": { "type": "long" },
          "item_cost_price": { "type": "long" },
          "item_bar_code": { "type": "text" },
          "item_order_id": { "type": "long" },
          "item_delivery_fee": { "type": "long" },
          "item_parent_order_id": { "type": "long" },
          "item_proxy_mark": { "type": "long" },
          "item_proxy_profit": { "type": "long" },
          "item_delete_mark": { "type": "long" },
          "item_delete_timestamp": { "type": "long" },
          "item_gmt_created": { "type": "date" },
          "item_gmt_modified": { "type": "date" },
          "item_star_bonus_fee": { "type": "long" },
          "item_service_fee": { "type": "long" },
          "item_limit_type": { "type": "long" }
        }
      }
    }
  }
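
  If you would rather not use the head plugin, the index can also be created through the ES REST API. The sketch below builds the PUT request with the Python standard library. The helper names are made up for illustration; it assumes ES 6.x, where the mapping is nested under the type name, and an ES node listening on 127.0.0.1:9200.

```python
import json
import urllib.request


def build_create_index_request(es_url, index, doc_type, properties):
    """Build (but do not send) a PUT request that creates an index with a mapping."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send the request and return the decoded JSON response from ES."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (needs a running ES node, so it is left commented out):
# properties = {"item_id": {"type": "long"}, "item_item_name": {"type": "text"}}
# print(send(build_create_index_request(
#     "http://127.0.0.1:9200", "user_order", "user_order", properties)))
```

  Pass the full "properties" object from the mapping above to create the real index. The index name and type name must be the same ones used for _index and _type in test.yml.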

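  My sql mapping and the ES index mapping above may not match field for field. There are too many fields and I didn't feel like checking them one by one late at night, but the method is as described. Once the index exists, start canal-adapter and begin syncing data.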
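
  Checking the two lists by hand is tedious, so a small script can do it. This is only a sketch with invented helper names: it handles a flat select list like the one in test.yml (aliases written with "as", commas inside function calls), not arbitrary SQL.

```python
import re


def split_top_level(select_list):
    """Split a select list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def sql_output_fields(sql):
    """Return the field names a simple 'select ... from ...' statement produces."""
    match = re.search(r"\bselect\b(.*?)\bfrom\b", sql, flags=re.I | re.S)
    if match is None:
        raise ValueError("not a select statement")
    names = []
    for expr in split_top_level(match.group(1)):
        alias = re.search(r"\bas\s+(\w+)\s*$", expr, flags=re.I)
        # Without an alias the column keeps its own name (t.order_id -> order_id).
        names.append(alias.group(1) if alias else expr.split(".")[-1].strip())
    return names


def compare_with_mapping(sql, mapping_properties, id_field="_id"):
    """Return (fields missing from the ES mapping, mapping fields the sql never fills)."""
    sql_fields = set(sql_output_fields(sql)) - {id_field}
    mapped = set(mapping_properties)
    return sorted(sql_fields - mapped), sorted(mapped - sql_fields)
```

  Run against the listings in this post, it should report order_id and item_primary_id as missing from the mapping, and order_user_name as a mapped field that the sql never fills.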

Start canal-adapter:

  [root@aliyun bin]# pwd
  /usr/local/canal-adapter/bin
  [root@aliyun bin]# ./startup.sh

Check the log:

  - ::41.608 [main] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterService - ## the canal client adapters are running now ......
  -- ::41.617 [main] INFO org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8081"]
  -- ::41.625 [main] INFO org.apache.tomcat.util.net.NioSelectorPool - Using a shared selector for servlet write/read
  -- ::] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to connect destination: example <=============
  -- :: (http) with context path ''
  -- ::41.797 [main] INFO c.a.otter.canal.adapter.launcher.CanalAdapterApplication - Started CanalAdapterApplication in 8.123 seconds (JVM running for 9.181)
  -- ::] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to subscribe destination: example <=============
  -- ::] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Subscribe destination: example succeed <=============

With the pipeline in place, first run a full data sync:

  curl http://127.0.0.1:8081/etl/es/test.yml -X POST
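
The same full sync can be triggered from a script. The sketch below only mirrors the curl call above using the Python standard library. The optional params form field is an assumption based on my recollection of the adapter's documentation (its value is substituted for {0} in etlCondition), and the function names are invented for illustration.

```python
import urllib.parse
import urllib.request


def build_etl_request(adapter_url, adapter_type, task_file, params=None):
    """Build (but do not send) the POST request that starts a full ETL run."""
    url = f"{adapter_url.rstrip('/')}/etl/{adapter_type}/{task_file}"
    # Assumption: "params" is sent as a form field and fills {0} in etlCondition.
    data = urllib.parse.urlencode({"params": params}).encode("ascii") if params else b""
    return urllib.request.Request(url, data=data, method="POST")


def run_etl(request):
    """Send the request and return the adapter's response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example usage (needs a running canal-adapter, so it is left commented out):
# print(run_etl(build_etl_request("http://127.0.0.1:8081", "es", "test.yml")))
```

Without params the adapter imports the whole table. With a value such as "2019-01-01 00:00:00" the import is limited by etlCondition, so that condition must reference a column that really exists in the table.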

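Now insert or update a row in the database and watch the log to see whether the change reaches ES. If nothing went wrong during startup, it works. It rarely goes that smoothly, though; running into problems and solving them is how you learn, so don't give up!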
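
One way to confirm that a particular row arrived is to fetch its document from ES by ID. In the sql mapping the document ID is built as concat('item_', t.id), so the row with id 42 should appear as item_42. This is a minimal sketch with hypothetical helper names, assuming ES 6.x on 127.0.0.1:9200 and the index and type names used in this post.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def es_doc_url(es_url, index, doc_type, row_id, id_prefix="item_"):
    """URL of the ES document that corresponds to one order_item row."""
    doc_id = urllib.parse.quote(f"{id_prefix}{row_id}", safe="")
    return f"{es_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"


def fetch_doc(url):
    """Return the document's _source, or None if ES answers 404 (not synced)."""
    try:
        with urllib.request.urlopen(url) as response:
            return json.loads(response.read().decode("utf-8")).get("_source")
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


# Example usage (needs a running ES node, so it is left commented out):
# print(fetch_doc(es_doc_url("http://127.0.0.1:9200", "user_order", "user_order", 42)))
```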

To keep this post to a reasonable length, parent-child documents will be covered in the next post.
