Part 2 of this series introduced Jena's general purpose rule engine and how to use it. This part goes a step further and looks at how to define a custom builtin.

Introduction to builtins

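First, a recap of what a builtin is. The official documentation calls them Builtin primitives; you can think of them as built-in functions or instructions. A builtin can return true or false to test whether a rule matches. The official distribution ships the following primitives: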

Builtin Operations

isLiteral(?x) notLiteral(?x)

isFunctor(?x) notFunctor(?x)

isBNode(?x) notBNode(?x)

Test whether the single argument is or is not a literal, a functor-valued
literal or a blank-node, respectively.
bound(?x...) unbound(?x..)
Test if all of the arguments are bound (not bound) variables
equal(?x,?y) notEqual(?x,?y)
Test if x=y (or x != y). The equality test is semantic equality so that,
for example, the xsd:int 1 and the xsd:decimal 1 would test equal.

lessThan(?x, ?y), greaterThan(?x, ?y)

le(?x, ?y), ge(?x, ?y)

Test if x is <, >, <= or >= y. Only passes if both x and y
are numbers or time instants (can be integer or floating point or XSDDateTime).

sum(?a, ?b, ?c)

addOne(?a, ?c)

difference(?a, ?b, ?c)

min(?a, ?b, ?c)

max(?a, ?b, ?c)

product(?a, ?b, ?c)

quotient(?a, ?b, ?c)

Sets c to be (a+b), (a+1), (a-b), min(a,b), max(a,b), (a*b), (a/b).
Note that these do not run backwards, if in sum a and c are bound and
b is unbound then the test will fail rather than bind b to (c-a).
This could be fixed.

strConcat(?a1, .. ?an, ?t)

uriConcat(?a1, .. ?an, ?t)

Concatenates the lexical form of all the arguments except the last, then
binds the last argument to a plain literal (strConcat) or a URI node
(uriConcat) with that lexical form. In both cases if an argument node
is a URI node the URI will be used as the lexical form.

regex(?t, ?p)

regex(?t, ?p, ?m1, .. ?mn)

Matches the lexical form of a literal (?t) against a regular expression
pattern given by another literal (?p). If the match succeeds, and if
there are any additional arguments then it will bind the first n capture
groups to the arguments ?m1 to ?mn. The regular expression pattern syntax
is that provided by java.util.regex. Note that the capture groups are
numbered from 1 and the first capture group will be bound to ?m1, we
ignore the implicit capture group 0 which corresponds to the entire matched
string. So for example

regex('foo bar', '(.*) (.*)', ?m1, ?m2)

will bind m1 to "foo" and m2 to "bar".

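The capture-group numbering is easy to check in plain Java, since the pattern syntax is that of java.util.regex. The following is a minimal sketch and not Jena's own code: the class name RegexCaptureSketch and the helper captures are made up for illustration, and the sketch assumes that the whole text has to match the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCaptureSketch {

    // Returns capture groups 1..n of a whole-string match, or null when the text does not match.
    static String[] captures(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            // Group 0 (the entire match) is skipped, so group 1 lands at index 0.
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = captures("foo bar", "(.*) (.*)");
        System.out.println(g[0] + " | " + g[1]);
    }
}
```

Here g[0] plays the role of ?m1 and g[1] the role of ?m2.
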
now(?x)
Binds ?x to an xsd:dateTime value corresponding to the current time.
makeTemp(?x)
Binds ?x to a newly created blank node.
makeInstance(?x, ?p, ?v)

makeInstance(?x, ?p, ?t, ?v)
Binds ?v to be a blank node which is asserted as the value of the ?p property
on resource ?x and optionally has type ?t. Multiple calls with the same
arguments will return the same blank node each time - thus allowing this
call to be used in backward rules.
makeSkolem(?x, ?v1, ... ?vn)
Binds ?x to be a blank node. The blank node is generated based on the values
of the remain ?vi arguments, so the same combination of arguments will
generate the same bNode.
noValue(?x, ?p)

noValue(?x ?p ?v)
True if there is no known triple (x, p, *) or (x, p, v) in the model or
the explicit forward deductions so far.
remove(n, ...)

drop(n, ...)
Remove the statement (triple) which caused the n'th body term of this (forward-only)
rule to match. Remove will propagate the change to other consequent rules
including the firing rule (which must thus be guarded by some other clauses).
In particular, if the removed statement (triple) appears in the body
of a rule that has already fired, the consequences of such rule are retracted
from the deducted model. Drop will silently remove the triple(s) from
the graph but not fire any rules as a consequence. These are clearly
non-monotonic operations and, in particular, the behaviour of a rule
set in which different rules both drop and create the same triple(s)
is undefined.
isDType(?l, ?t) notDType(?l, ?t)
Tests if literal ?l is (or is not) an instance of the datatype defined
by resource ?t.
print(?x, ...)
Print (to standard out) a representation of each argument. This is useful
for debugging rather than serious IO work.
listContains(?l, ?x)

listNotContains(?l, ?x)
Passes if ?l is a list which contains (does not contain) the element ?x,
both arguments must be ground, can not be used as a generator.
listEntry(?list, ?index, ?val)
Binds ?val to the ?index'th entry in the RDF list ?list. If there is no
such entry the variable will be unbound and the call will fail. Only
usable in rule bodies.
listLength(?l, ?len)
Binds ?len to the length of the list ?l.
listEqual(?la, ?lb)

listNotEqual(?la, ?lb)
listEqual tests if the two arguments are both lists and contain the same
elements. The equality test is semantic equality on literals (sameValueAs)
but will not take into account owl:sameAs aliases. listNotEqual is the
negation of this (passes if listEqual fails).
listMapAsObject(?s, ?p ?l)

listMapAsSubject(?l, ?p, ?o)
These can only be used as actions in the head of a rule. They deduce a
set of triples derived from the list argument ?l : listMapAsObject asserts
triples (?s ?p ?x) for each ?x in the list ?l, listMapAsSubject asserts
triples (?x ?p ?o).
table(?p) tableAll()
Declare that all goals involving property ?p (or all goals) should be tabled
by the backward engine.
hide(p)
Declares that statements involving the predicate p should be hidden. Queries
to the model will not report such statements. This is useful to enable
non-monotonic forward rules to define flag predicates which are only
used for inference control and do not "pollute" the inference results.

Defining a custom builtin

Defining one is simple: implement the Builtin interface, then register the implementation with BuiltinRegistry.theRegistry.register.

The Builtin interface is defined as follows:

public interface Builtin {

    /**
     * Return a convenient name for this builtin, normally this will be the name of the
     * functor that will be used to invoke it and will often be the final component of the
     * URI.
     */
    public String getName();

    /**
     * Return the full URI which identifies this built in.
     */
    public String getURI();

    /**
     * Return the expected number of arguments for this functor or 0 if the number is flexible.
     */
    public int getArgLength();

    /**
     * This method is invoked when the builtin is called in a rule body.
     * @param args the array of argument values for the builtin, this is an array
     * of Nodes, some of which may be Node_RuleVariables.
     * @param length the length of the argument list, may be less than the length of the args array
     * for some rule engines
     * @param context an execution context giving access to other relevant data
     * @return return true if the buildin predicate is deemed to have succeeded in
     * the current environment
     */
    public boolean bodyCall(Node[] args, int length, RuleContext context);

    /**
     * This method is invoked when the builtin is called in a rule head.
     * Such a use is only valid in a forward rule.
     * @param args the array of argument values for the builtin, this is an array
     * of Nodes.
     * @param length the length of the argument list, may be less than the length of the args array
     * for some rule engines
     * @param context an execution context giving access to other relevant data
     */
    public void headAction(Node[] args, int length, RuleContext context);

    /**
     * Returns false if this builtin has side effects when run in a body clause,
     * other than the binding of environment variables.
     */
    public boolean isSafe();

    /**
     * Returns false if this builtin is non-monotonic. This includes non-monotonic checks like noValue
     * and non-monotonic actions like remove/drop. A non-monotonic call in a head is assumed to
     * be an action and makes the overall rule and ruleset non-monotonic.
     * Most JenaRules are monotonic deductive closure rules in which this should be false.
     */
    public boolean isMonotonic();
}

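Usually there is no need to implement this interface directly. You can extend the default implementation, BaseBuiltin, instead. In most cases it is enough to override getName to supply the name used in rules and to implement bodyCall to supply the actual logic.

@Override
public String getName() {
    return "semsim";
}

As an example, let's define a custom builtin that computes the semantic similarity between two values: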

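// Package names below assume Apache Jena 3.x or later and fastjson.
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

// HttpClientUtil is the author's own HTTP helper and is not shown in this post.
public class SemanticSimilarityBuiltin extends BaseBuiltin {

    /**
     * Return a convenient name for this builtin, normally this will be the name of the
     * functor that will be used to invoke it and will often be the final component of the
     * URI.
     */
    @Override
    public String getName() {
        return "semsim";
    }

    /** Three arguments: the two values to compare and the threshold. */
    @Override
    public int getArgLength() {
        return 3;
    }

    /**
     * This method is invoked when the builtin is called in a rule body.
     *
     * @param args the array of argument values for the builtin, this is an array
     * of Nodes, some of which may be Node_RuleVariables.
     * @param context an execution context giving access to other relevant data
     * @return return true if the builtin predicate is deemed to have succeeded in
     * the current environment
     */
    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        // Verify that the rule passes the declared number of arguments.
        checkArgs(length, context);
        Node n1 = getArg(0, args, context);
        Node n2 = getArg(1, args, context);
        Node score = getArg(2, args, context);

        // The third argument must be a literal that holds the threshold.
        if (!score.isLiteral() || score.getLiteral().getValue() == null) {
            return false;
        }
        double hold = Double.parseDouble(score.getLiteralValue().toString());

        if (n1.isLiteral() && n2.isLiteral()) {
            String v1 = n1.getLiteralValue().toString();
            String v2 = n2.getLiteralValue().toString();

            // Call the similarity service. The values are URL-encoded so that non-ASCII
            // text survives in the query string (this Charset overload needs Java 10+).
            String requestUrl = "http://API-URL:5101/similarity/cosine?s1="
                    + URLEncoder.encode(v1, StandardCharsets.UTF_8)
                    + "&s2=" + URLEncoder.encode(v2, StandardCharsets.UTF_8);
            String result = HttpClientUtil.doGet(requestUrl);
            JSONObject json = JSON.parseObject(result);

            // Succeed only when the similarity reaches the threshold.
            return json.getDouble("similarity") >= hold;
        }
        return false;
    }
}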
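  • getArgLength and checkArgs(length, context) work together to restrict the number of arguments: checkArgs verifies that the call passes the declared number.
  • getArg(idx, args, context) fetches the argument at the given index.
  • The similarity computation calls an external service that computes the cosine score between the semantic vectors of the two values. If the score reaches the threshold, we treat the rule as matched.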
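
For reference, the cosine score itself is easy to compute once the two vectors are available. The sketch below is only an illustration under assumptions: the class CosineSketch and its methods are hypothetical, the vectors are toy values, and the service behind the URL above may well compute its score differently.

```java
class CosineSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen for this sketch: a zero vector is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Same comparison as the builtin: match when the score reaches the threshold.
    static boolean matches(double[] a, double[] b, double threshold) {
        return cosine(a, b) >= threshold;
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 1.0, 0.0}; // toy vectors, not real embeddings
        double[] v2 = {1.0, 0.0, 0.0};
        System.out.println(cosine(v1, v2));
        System.out.println(matches(v1, v2, 0.6));
    }
}
```

With these toy vectors the score is 1/sqrt(2), roughly 0.707, so a threshold of 0.6 is met while a threshold of 0.8 would not be.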

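Testing

Let's test the semsim builtin defined above, reusing the example from Part 2.

We add two new properties, 主要业务 (main business) and 竞争对手 (competitor), and define that two companies are competitors if their main businesses are semantically similar.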

Property 主要业务 = myMod.createProperty(finance + "主要业务");
Property 竞争对手 = myMod.createProperty(finance + "竞争对手");

// Add the triples
myMod.add(万达集团, 主要业务, "房地产,文娱");
myMod.add(融创中国, 主要业务, "房地产");

Then define the rule:

[ruleCompetitor: (?c1 :主要业务 ?b1) (?c2 :主要业务 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6)  -> (?c1 :竞争对手 ?c2)]

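The rule reads: company c1 has main business b1, company c2 has main business b2, and c1 and c2 are not the same company. If the similarity between b1 and b2 reaches the 0.6 threshold, then c1 and c2 are competitors.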
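
To make the rule's logic concrete, here is a rough plain-Java equivalent of what the rule body checks. Everything in it is an assumption for illustration: the class CompetitorRuleSketch is hypothetical, the companies are written with ASCII names (Wanda for 万达集团, Sunac for 融创中国), each company has a single main-business string, and the similarity method is a simple tag-overlap stand-in for the remote service, not the real scorer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CompetitorRuleSketch {

    // Hypothetical stand-in scorer: overlap coefficient of the comma-separated tags.
    static double similarity(String b1, String b2) {
        Set<String> t1 = new HashSet<>(Arrays.asList(b1.split(",")));
        Set<String> t2 = new HashSet<>(Arrays.asList(b2.split(",")));
        Set<String> common = new HashSet<>(t1);
        common.retainAll(t2);
        return (double) common.size() / Math.min(t1.size(), t2.size());
    }

    // Pairs every company with every other one, as the two body patterns of the rule do.
    static List<String> competitors(Map<String, String> mainBusiness, double threshold) {
        List<String> derived = new ArrayList<>();
        for (Map.Entry<String, String> c1 : mainBusiness.entrySet()) {
            for (Map.Entry<String, String> c2 : mainBusiness.entrySet()) {
                boolean different = !c1.getKey().equals(c2.getKey()); // notEqual(?c1, ?c2)
                // semsim(?b1, ?b2, threshold)
                if (different && similarity(c1.getValue(), c2.getValue()) >= threshold) {
                    derived.add(c1.getKey() + " competitor " + c2.getKey());
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        Map<String, String> mainBusiness = new LinkedHashMap<>();
        mainBusiness.put("Wanda", "real estate,entertainment");
        mainBusiness.put("Sunac", "real estate");
        for (String fact : competitors(mainBusiness, 0.6)) {
            System.out.println(fact);
        }
    }
}
```

Because the rule body is symmetric in c1 and c2, the relation is derived in both directions, which is why the pair shows up twice in the inference output.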

The complete test code:

// Package names assume Apache Jena 3.x or later.
import java.util.Iterator;

import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.reasoner.InfGraph;
import org.apache.jena.reasoner.rulesys.BuiltinRegistry;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasonerFactory;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.util.PrintUtil;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

// The statements below belong inside a method such as main.

// Register the custom builtin
BuiltinRegistry.theRegistry.register(new SemanticSimilarityBuiltin());

Model myMod = ModelFactory.createDefaultModel();
String finance = "http://www.example.org/kse/finance#";
Resource 孙宏斌 = myMod.createResource(finance + "孙宏斌");
Resource 融创中国 = myMod.createResource(finance + "融创中国");
Resource 乐视网 = myMod.createResource(finance + "乐视网");
Property 执掌 = myMod.createProperty(finance + "执掌");
Resource 贾跃亭 = myMod.createResource(finance + "贾跃亭");
Resource 地产公司 = myMod.createResource(finance + "地产公司");
Resource 公司 = myMod.createResource(finance + "公司");
Resource 法人实体 = myMod.createResource(finance + "法人实体");
Resource 人 = myMod.createResource(finance + "人");
Property 主要收入 = myMod.createProperty(finance + "主要收入");
Resource 地产事业 = myMod.createResource(finance + "地产事业");
Resource 王健林 = myMod.createResource(finance + "王健林");
Resource 万达集团 = myMod.createResource(finance + "万达集团");
Property 主要资产 = myMod.createProperty(finance + "主要资产");
Property 股东 = myMod.createProperty(finance + "股东");
Property 关联交易 = myMod.createProperty(finance + "关联交易");
Property 收购 = myMod.createProperty(finance + "收购");
Property 主要业务 = myMod.createProperty(finance + "主要业务");
Property 竞争对手 = myMod.createProperty(finance + "竞争对手");

// Add the triples
myMod.add(孙宏斌, 执掌, 融创中国);
myMod.add(贾跃亭, 执掌, 乐视网);
myMod.add(王健林, 执掌, 万达集团);
myMod.add(乐视网, RDF.type, 公司);
myMod.add(万达集团, RDF.type, 公司);
myMod.add(融创中国, RDF.type, 地产公司);
myMod.add(地产公司, RDFS.subClassOf, 公司);
myMod.add(公司, RDFS.subClassOf, 法人实体);
myMod.add(孙宏斌, RDF.type, 人);
myMod.add(贾跃亭, RDF.type, 人);
myMod.add(王健林, RDF.type, 人);
myMod.add(万达集团, 主要资产, 地产事业);
myMod.add(万达集团, 主要业务, "房地产,文娱");
myMod.add(融创中国, 主要收入, 地产事业);
myMod.add(融创中国, 主要业务, "房地产");
myMod.add(孙宏斌, 股东, 乐视网);
myMod.add(孙宏斌, 收购, 万达集团);

PrintUtil.registerPrefix("", finance);

// Print the current model
StmtIterator i = myMod.listStatements(null, null, (RDFNode) null);
while (i.hasNext()) {
    System.out.println(" - " + PrintUtil.print(i.nextStatement()));
}

GenericRuleReasoner reasoner = (GenericRuleReasoner) GenericRuleReasonerFactory.theInstance().create(null);
reasoner.setRules(Rule.parseRules(
        "[ruleHoldShare: (?p :执掌 ?c) -> (?p :股东 ?c)] \n"
        + "[ruleConnTrans: (?p :收购 ?c) -> (?p :股东 ?c)] \n"
        + "[ruleConnTrans: (?p :股东 ?c) (?p :股东 ?c2) -> (?c :关联交易 ?c2)] \n"
        + "[ruleCompetitor: (?c1 :主要业务 ?b1) (?c2 :主要业务 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6) -> (?c1 :竞争对手 ?c2)] \n"
        + "-> tableAll()."));
reasoner.setMode(GenericRuleReasoner.HYBRID);

InfGraph infgraph = reasoner.bind(myMod.getGraph());
infgraph.setDerivationLogging(true);

System.out.println("After inference...\n");
Iterator<Triple> tripleIterator = infgraph.find(null, null, null);
while (tripleIterator.hasNext()) {
    System.out.println(" - " + PrintUtil.print(tripleIterator.next()));
}

Output:

 - (:万达集团 :关联交易 :乐视网)
- (:万达集团 :关联交易 :融创中国)
- (:万达集团 :竞争对手 :融创中国)
- (:万达集团 :关联交易 :万达集团)
- (:孙宏斌 :股东 :万达集团)
- (:孙宏斌 :股东 :融创中国)
- (:融创中国 :关联交易 :万达集团)
- (:融创中国 :竞争对手 :万达集团)
- (:融创中国 :关联交易 :乐视网)
- (:融创中国 :关联交易 :融创中国)
- (:乐视网 :关联交易 :万达集团)
- (:乐视网 :关联交易 :融创中国)
- (:乐视网 :关联交易 :乐视网)
- (:贾跃亭 :股东 :乐视网)
- (:王健林 :股东 :万达集团)
- (:公司 rdfs:subClassOf :法人实体)
- (:万达集团 :主要业务 '房地产,文娱')
- (:万达集团 :主要资产 :地产事业)
- (:万达集团 rdf:type :公司)
- (:地产公司 rdfs:subClassOf :公司)
- (:融创中国 :主要业务 '房地产')
- (:融创中国 :主要收入 :地产事业)
- (:融创中国 rdf:type :地产公司)
- (:孙宏斌 :收购 :万达集团)
- (:孙宏斌 :股东 :乐视网)
- (:孙宏斌 rdf:type :人)
- (:孙宏斌 :执掌 :融创中国)
- (:乐视网 rdf:type :公司)
- (:贾跃亭 rdf:type :人)
- (:贾跃亭 :执掌 :乐视网)
- (:王健林 rdf:type :人)
- (:王健林 :执掌 :万达集团)

You can add more builtins as needed, for example one that runs JavaScript or one that issues HTTP requests.


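Author: Jadepeng

Source: jqpeng的技术记事本--http://www.cnblogs.com/xiaoqi

Your support is the greatest encouragement to the author. Thank you for reading.

Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be shown prominently on the page. Otherwise the author reserves the right to pursue legal liability.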
