
Solution for automatic update of Chinese word segmentation full-text index in NEO4J

Automatic index updates could not be implemented through the NEO4J index API, so the problem is approached differently: the corresponding full-text index is updated synchronously whenever a node is created or updated.

1. Sample data

Sample Data Format Reference

2. Differences between English and Chinese Full-Text Indexes

1. Create a default NEO4J index

CALL apoc.index.addAllNodes('Loc', {Loc:["description","cause","year"]})
// The following searches return no results:
CALL apoc.index.search('Loc', 'Loc.description:Chinese~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Chinese*') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:test~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Test Chinese~') YIELD node RETURN node

2. Delete Index

CALL apoc.index.remove('Loc')

3. Create an index that supports Chinese word segmentation

CALL zdr.index.addChineseFulltextIndex('Loc', ["description","cause","year"], 'Loc') YIELD message RETURN message
// The following searches succeed:
CALL apoc.index.search('Loc', 'description:Chinese~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'description:Chinese*') YIELD node RETURN node
CALL apoc.index.search('Loc', 'description:test~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'description:Test Chinese~') YIELD node RETURN node
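Why the default index misses these searches is not stated above; one likely reason is that an English-oriented analyzer has no word dictionary for Chinese, which is written without spaces between words. The toy comparison below contrasts whitespace tokenization with forward maximum matching (FMM), a classic dictionary-based Chinese segmentation technique. It is only an illustration, not IKAnalyzer's algorithm, and the dictionary is a hypothetical stand-in.

```python
from __future__ import annotations


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, as a naive English-oriented analyzer would."""
    return text.lower().split()


def fmm_tokens(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical dictionary; a real segmenter ships a large lexicon.
dictionary = {"中文", "分词", "测试"}
text = "测试中文分词"  # "test Chinese word segmentation"
print(whitespace_tokens(text))       # ['测试中文分词']
print(fmm_tokens(text, dictionary))  # ['测试', '中文', '分词']
```

With whitespace tokenization the whole phrase is one token, so a search for `中文` ("Chinese") finds nothing; after segmentation it is an indexed term of its own.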

3. APOC's built-in (English-oriented) full-text index procedure (the index can be updated automatically)

1. Add Full-Text Index

CALL apoc.index.addAllNodes('Loc', {Loc:["description","cause","year"]},{autoUpdate:true})
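With `autoUpdate:true`, the index reacts to committed changes instead of being rebuilt by hand. The sketch below models that idea in memory: a hypothetical `Graph` class calls registered listeners after each change, and a hypothetical `AutoIndex` re-indexes the changed node. It does not reproduce how APOC implements auto-update, and it tokenizes on whitespace only, where a Chinese index would use a segmenter.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class Graph:
    """Hypothetical in-memory graph that notifies listeners after each change."""

    def __init__(self) -> None:
        self.nodes: dict[int, dict[str, str]] = {}
        self._listeners: list[Callable[[int, dict[str, str]], None]] = []
        self._next_id = 0

    def register_listener(self, listener: Callable[[int, dict[str, str]], None]) -> None:
        self._listeners.append(listener)

    def create_node(self, **props: str) -> int:
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = dict(props)
        self._commit(node_id)
        return node_id

    def set_property(self, node_id: int, key: str, value: str) -> None:
        self.nodes[node_id][key] = value
        self._commit(node_id)

    def _commit(self, node_id: int) -> None:
        # Once a change is "committed", every listener sees the node's new state.
        for listener in self._listeners:
            listener(node_id, self.nodes[node_id])


class AutoIndex:
    """Toy inverted index kept in sync by the commit hook."""

    def __init__(self, fields: list[str]) -> None:
        self.fields = fields
        self.postings: dict[str, set[int]] = defaultdict(set)

    def on_commit(self, node_id: int, props: dict[str, str]) -> None:
        # Drop the node's old terms, then index its current field values.
        for ids in self.postings.values():
            ids.discard(node_id)
        for field in self.fields:
            for term in props.get(field, "").lower().split():
                self.postings[f"{field}:{term}"].add(node_id)

    def search(self, field: str, term: str) -> set[int]:
        return set(self.postings.get(f"{field}:{term.lower()}", set()))


graph = Graph()
index = AutoIndex(["description", "cause"])
graph.register_listener(index.on_commit)
v = graph.create_node(name="V", description="Testing word segmentation")
print(index.search("description", "testing"))  # {0}
graph.set_property(v, "description", "Updated text")
print(index.search("description", "testing"))  # set()
```

No explicit re-index call appears in the usage: the listener keeps the index current, which is the behavior the custom Chinese index in the next section lacks.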

2. Create a node and set its properties

CREATE (n:Loc {name:'V'}) SET n.description='Testing Chinese word segmentation, the final chapter of the duplicate show was very exciting. It is said that knowledge mapping and artificial intelligence technology were applied to that movie!', n.cause="Test the English word breaker, Mobile World Congress, the world's largest gathering for the mobile industry." RETURN n

3. Retrieval

The index is updated automatically, but it handles Chinese searches poorly, as the following tests show:

// Retrieval failed:
CALL apoc.index.search('Loc', 'Loc.cause:Test English word breakers~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Test Chinese word segmentation~') YIELD node RETURN node
// Retrieved successfully:
CALL apoc.index.search('Loc', 'Loc.cause:Test English word breakers*') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Test Chinese word segmentation*') YIELD node RETURN node
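In Lucene query syntax, `term~` is a fuzzy query (by default it matches indexed terms within two edits of `term`) and `term*` is a prefix wildcard. If Chinese text is indexed without being split into words, a short query term is far more than two edits away from the long indexed token, yet it is still a prefix of it; that may explain the Chinese failures with `~` and successes with `*` here. The sketch uses plain Levenshtein distance (Lucene also counts a transposition as one edit by default) and hypothetical term lists.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, terms: list[str], max_edits: int = 2) -> list[str]:
    """Roughly what `query~` does: indexed terms within max_edits edits."""
    return sorted(t for t in terms if levenshtein(query, t) <= max_edits)


def prefix_match(query: str, terms: list[str]) -> list[str]:
    """Roughly what `query*` does: indexed terms starting with the query."""
    return sorted(t for t in terms if t.startswith(query))


english_terms = ["breaker", "breakers", "break", "segmentation"]
print(fuzzy_match("breakers", english_terms))   # ['breaker', 'breakers']
print(prefix_match("break", english_terms))     # ['break', 'breaker', 'breakers']

# Unsegmented Chinese: the whole phrase is a single indexed term.
chinese_terms = ["测试中文分词"]
print(fuzzy_match("测试", chinese_terms))   # []
print(prefix_match("测试", chinese_terms))  # ['测试中文分词']
```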

4. Custom Chinese word segmentation full-text index plug-in (automatic index update does not work)

The addChineseFulltextAutoIndex procedure successfully creates a full-text index that supports Chinese, but the index is not updated automatically when nodes are created or their properties change.

1. Add Full-Text Index

CALL zdr.index.addChineseFulltextAutoIndex('IKAnalyzer',["description","cause","year"],'Loc',{autoUpdate:'true'}) YIELD message RETURN message

2. Create a node and set its properties

CREATE (n:Loc {name:'V'}) SET n.description='Testing Chinese word segmentation, the final chapter of the duplicate show was very exciting. It is said that knowledge mapping and artificial intelligence technology were applied to that movie!', n.cause="Test the English word breaker, Mobile World Congress, the world's largest gathering for the mobile industry." RETURN n

3. Retrieval

Data that was already present when the index was built can be found:

CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:Acridyl Aminomethane Sulfonymethoxyaniline', 100) YIELD node RETURN node

The newly created node can only be found after the index is rebuilt:

CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:test~', 100) YIELD node RETURN node

5. Label Cross-search

addChineseFulltextAutoIndex and addChineseFulltextIndex support searching across multiple labels: build the index for each label under the same index name.

Label: Loc

CALL zdr.index.addChineseFulltextAutoIndex('Loc',["description","cause","name"],'Loc',{autoUpdate:'true'}) YIELD message RETURN message

Label: LocProvince

CALL zdr.index.addChineseFulltextAutoIndex('Loc',["description","cause","name"],'LocProvince',{autoUpdate:'true'}) YIELD message RETURN message

Search nodes of both labels:

CALL apoc.index.search('Loc', 'name:p~') YIELD node RETURN node
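The cross-label search works because both calls write into the same index name, so a single lookup covers nodes of both labels. A toy model of that, with a hypothetical `IndexRegistry` keyed by index name and exact-term matching only (no fuzzy `~`):

```python
from __future__ import annotations

from collections import defaultdict


class IndexRegistry:
    """Hypothetical registry: index name -> "field:term" -> {(label, node_id)}."""

    def __init__(self) -> None:
        self._indexes: dict[str, dict[str, set[tuple[str, int]]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add_nodes(self, index_name: str, label: str,
                  nodes: dict[int, dict[str, str]], fields: list[str]) -> None:
        # Every label indexed under the same name shares one postings map.
        postings = self._indexes[index_name]
        for node_id, props in nodes.items():
            for field in fields:
                for term in props.get(field, "").lower().split():
                    postings[f"{field}:{term}"].add((label, node_id))

    def search(self, index_name: str, field: str, term: str) -> set[tuple[str, int]]:
        postings = self._indexes.get(index_name, {})
        return set(postings.get(f"{field}:{term.lower()}", set()))


registry = IndexRegistry()
registry.add_nodes("Loc", "Loc", {1: {"name": "p"}}, ["name"])
registry.add_nodes("Loc", "LocProvince", {2: {"name": "p"}}, ["name"])
print(sorted(registry.search("Loc", "name", "p")))  # [('Loc', 1), ('LocProvince', 2)]
```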

6. Custom Chinese Word Segmentation Plugin (Updating the Index Node by Node)

Because the automatic update scheme in section 4 does not work, the following procedure was developed to support single-node index updates: the corresponding full-text index is updated synchronously whenever a node is created or updated.

1. Add Full-Text Index

CALL apoc.index.remove('Loc')
CALL zdr.index.addChineseFulltextIndex('Loc',["description","cause","year"],'Loc') YIELD message RETURN message

2. Create a node and set its properties

CREATE (n:Loc {name:'V'}) SET n.description='Testing Chinese word segmentation, the final chapter of the duplicate show was very exciting. It is said that knowledge mapping and artificial intelligence technology were applied to that movie!', n.cause="Test the English word breaker, Mobile World Congress, the world's largest gathering for the mobile industry." RETURN n

3. Add the node and properties created or updated in step 2 to the index

MATCH (n) WHERE n.name='V' WITH n CALL zdr.index.addNodeChineseFulltextIndex(n, ['description']) RETURN *
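Without auto-update, every write needs a follow-up call like the one above, which should drop the node's stale index entries and index its current values. The sketch below models that per-node behavior in memory; `ManualIndex` and `index_node` are hypothetical names, not the zdr plugin's code, and touching only the properties passed in is an assumption mirroring the `['description']` argument above.

```python
from __future__ import annotations

from collections import defaultdict


class ManualIndex:
    """Toy index updated one node at a time (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        # Remember which "field:term" keys each node contributed, for removal.
        self.node_keys: dict[int, set[str]] = defaultdict(set)

    def index_node(self, node_id: int, props: dict[str, str], fields: list[str]) -> None:
        # Remove stale entries, but only for the fields being re-indexed.
        stale = [k for k in self.node_keys[node_id] if k.split(":", 1)[0] in fields]
        for key in stale:
            self.postings[key].discard(node_id)
            self.node_keys[node_id].discard(key)
        # Index the current values of those fields.
        for field in fields:
            for term in props.get(field, "").lower().split():
                key = f"{field}:{term}"
                self.postings[key].add(node_id)
                self.node_keys[node_id].add(key)

    def search(self, field: str, term: str) -> set[int]:
        return set(self.postings.get(f"{field}:{term.lower()}", set()))


nodes = {0: {"name": "V", "description": "testing segmentation"}}
index = ManualIndex()
print(index.search("description", "testing"))  # set(): written but not indexed yet
index.index_node(0, nodes[0], ["description"])
print(index.search("description", "testing"))  # {0}
nodes[0]["description"] = "changed"
index.index_node(0, nodes[0], ["description"])
print(index.search("description", "testing"))  # set(): stale term removed
```

Tracking each node's keys makes removal proportional to that node's terms rather than scanning the whole index.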

4. Retrieval

CALL zdr.index.chineseFulltextIndexSearch('Loc', 'description:Test Chinese~') YIELD node RETURN node
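One detail of Lucene's classic query syntax, assuming these procedures hand the query string to that parser: a field prefix applies only to the term right after it, so `description:Test Chinese~` searches `Test` in `description` and `Chinese~` in the default field. Grouping with parentheses, as in `description:(Test~ Chinese~)`, scopes every term to the field. The hypothetical helper `build_query` below builds such queries and backslash-escapes the characters Lucene treats as special.

```python
from __future__ import annotations

# Characters with special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape(term: str) -> str:
    """Backslash-escape Lucene special characters in a single term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def build_query(field: str, text: str, operator: str = "~",
                label: str | None = None) -> str:
    """Scope every whitespace-separated term to one field.

    operator is appended to each term: "~" for fuzzy, "*" for prefix.
    label gives the "Label.field" key style used by the default APOC index.
    """
    key = f"{label}.{field}" if label else field
    terms = " ".join(f"{escape(t)}{operator}" for t in text.split())
    return f"{key}:({terms})"


print(build_query("description", "Test Chinese"))
# description:(Test~ Chinese~)
print(build_query("description", "Test Chinese", operator="*", label="Loc"))
# Loc.description:(Test* Chinese*)
```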

7. Resolving Transaction Commit Timeouts

If a transaction timeout is configured in neo4j.conf, disable it (comment it out, as below) while building the index.

#********************************************************************
### Neo4j transaction timeout
###******************************************************************
#dbms.transaction.timeout=180s

Run the index build from a background script:

#!/usr/bin/env bash
# index.sh: run build.cql with neo4j-shell in the background, appending output to indexGraph.log
nohup /neo4j-community-3.4.9/bin/neo4j-shell -file build.cql >>indexGraph.log 2>&1 &
// build.cql
CALL zdr.index.addChineseFulltextIndex('IKAnalyzer', ['description','fullname','name','lnkurl'], 'LinkedinID') YIELD message RETURN message;

The zdr.index.* procedures used above are all custom NEO4J procedures.

Original article: https://programmer.ink/think/5cd0160be03d2.html
