Apache Jena TDB CRUD operations

June 11, 2015 by maltesander

http://tutorial-academy.com/apache-jena-tdb-crud-operations/

In this tutorial we explain Apache Jena TDB CRUD operations with simple examples. The CRUD operations are implemented with the Jena programming API instead of SPARQL. We give a deeper understanding of the internal operations of the TDB triple store and show some tips and tricks to avoid common programming errors.

1. What are Apache Jena TDB and CRUD operations?

Apache Jena is an open source Java framework for Semantic Web and Linked Data applications. It offers RDF and SPARQL support, an Ontology API and reasoning support, as well as a triple store (TDB) and a SPARQL server (Fuseki).

CRUD is an abbreviation for create, read, update and delete, which are the most basic database operations. The same operations are available for triple stores and are shown in this tutorial for TDB.

2. Install Apache Jena and TDB

You can download the required libraries manually and add them to your Java Build Path. I recommend downloading the full Apache Jena framework so that you can use the Jena API later on. It is available from the Apache Jena download page.

If you use Maven, add the following to your dependencies:

<dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>apache-jena-libs</artifactId>
    <type>pom</type>
    <version>2.13.0</version>
</dependency>

We use release 2.13.0, which was the latest stable release at the time of writing. (Note from the repost: the latest version at that time was 3.2.0.) Do not forget to update your Maven project afterwards.

3. Writing Java class for TDB access

We create a class called TDBConnection. The constructor initializes the TDB triple store with the path of the folder in which the store is kept. We need a Dataset, which is a collection of named graphs plus an unnamed default graph.

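// Jena 2.x packages (Jena 3.x renamed com.hp.hpl.jena to org.apache.jena).
// The methods below additionally need ReadWrite, FileManager and the rdf.model classes.
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TDBConnection
{
    private Dataset ds;

    public TDBConnection( String path )
    {
        ds = TDBFactory.createDataset( path );
    }
}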

If you have an ontology you want to store and manipulate, you can use the following function to load it into the store. The begin and end calls mark a transaction, which we strongly recommend using throughout your application. Transactions speed up read operations and protect the data against corruption caused by process termination or system crashes. You basically store multiple named models (named graphs) in the dataset, plus one default graph (no name).

public void loadModel( String modelName, String path )
{
    Model model = null;

    ds.begin( ReadWrite.WRITE );
    try
    {
        model = ds.getNamedModel( modelName );
        FileManager.get().readModel( model, path );
        ds.commit();
    }
    finally
    {
        ds.end();
    }
}
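The effect of commit versus abort can be tried out in isolation. The following is a minimal sketch, assuming the Jena 2.13.0 package names (com.hp.hpl.jena; Jena 3.x uses org.apache.jena instead) and an in-memory TDB dataset so that nothing is written to disk. The class name TransactionSketch, the example.org namespace and the helper methods are made up for this illustration and are not part of the tutorial's TDBConnection class.

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TransactionSketch {
    // Hypothetical namespace, used only in this sketch.
    static final String NS = "http://example.org/sketch#";

    // Adds one "owns" triple inside a write transaction and either commits or aborts it.
    public static void addOwns(Dataset ds, String graph, String who, String what, boolean commit) {
        ds.begin(ReadWrite.WRITE);
        try {
            Model m = ds.getNamedModel(graph);
            m.add(m.createResource(NS + who),
                  m.createProperty(NS + "owns"),
                  m.createResource(NS + what));
            if (commit) {
                ds.commit();   // make the change visible to later transactions
            } else {
                ds.abort();    // discard everything done in this transaction
            }
        } finally {
            ds.end();
        }
    }

    // Counts the triples of a named graph inside a read transaction.
    public static long size(Dataset ds, String graph) {
        ds.begin(ReadWrite.READ);
        try {
            return ds.getNamedModel(graph).size();
        } finally {
            ds.end();
        }
    }

    public static void main(String[] args) {
        Dataset ds = TDBFactory.createDataset(); // in-memory TDB dataset
        String graph = NS + "cars";

        addOwns(ds, graph, "John", "Porsche", false);
        System.out.println("after abort:  " + size(ds, graph));

        addOwns(ds, graph, "John", "Porsche", true);
        System.out.println("after commit: " + size(ds, graph));

        ds.close();
    }
}
```

The aborted write should leave the graph empty, while the committed one should leave exactly one triple. The same begin, commit or abort, end pattern applies to a dataset created with a folder path.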

If we do not want to load an ontology or model we can build it from scratch using an add method.

public void addStatement( String modelName, String subject, String property, String object )
{
    Model model = null;

    ds.begin( ReadWrite.WRITE );
    try
    {
        model = ds.getNamedModel( modelName );

        Statement stmt = model.createStatement
        (
            model.createResource( subject ),
            model.createProperty( property ),
            model.createResource( object )
        );

        model.add( stmt );
        ds.commit();
    }
    finally
    {
        if( model != null ) model.close();
        ds.end();
    }
}

Next we read stored triples. A null argument acts as a wildcard. We store the results in a List of Statements.

public List<Statement> getStatements( String modelName, String subject, String property, String object )
{
    List<Statement> results = new ArrayList<Statement>();

    Model model = null;

    ds.begin( ReadWrite.READ );
    try
    {
        model = ds.getNamedModel( modelName );

        // null arguments become null selector parts, which match anything
        Selector selector = new SimpleSelector(
            ( subject != null ) ? model.createResource( subject ) : (Resource) null,
            ( property != null ) ? model.createProperty( property ) : (Property) null,
            ( object != null ) ? model.createResource( object ) : (RDFNode) null
        );

        StmtIterator it = model.listStatements( selector );
        while( it.hasNext() )
        {
            Statement stmt = it.next();
            results.add( stmt );
        }

        ds.commit();
    }
    finally
    {
        if( model != null ) model.close();
        ds.end();
    }

    return results;
}
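The wildcard behaviour of null arguments can be explored without a TDB store. The sketch below assumes the Jena 2.13.0 package names and uses a plain in-memory model together with the three-argument listStatements overload, as an alternative to the SimpleSelector used above. The class WildcardSketch, its namespace and its sample data are invented for illustration only.

```java
import java.util.List;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.Statement;

public class WildcardSketch {
    // Hypothetical namespace, used only in this sketch.
    static final String NS = "http://example.org/sketch#";

    // Builds a small in-memory model with three "owns" triples.
    public static Model sampleModel() {
        Model m = ModelFactory.createDefaultModel();
        Property owns = m.createProperty(NS + "owns");
        m.add(m.createResource(NS + "John"), owns, m.createResource(NS + "Porsche"));
        m.add(m.createResource(NS + "John"), owns, m.createResource(NS + "BMW"));
        m.add(m.createResource(NS + "Mike"), owns, m.createResource(NS + "BMW"));
        return m;
    }

    // Each argument is a local name, or null to match anything in that position.
    public static List<Statement> match(Model m, String s, String p, String o) {
        Resource subject = (s != null) ? m.createResource(NS + s) : null;
        Property property = (p != null) ? m.createProperty(NS + p) : null;
        RDFNode object = (o != null) ? m.createResource(NS + o) : null;
        return m.listStatements(subject, property, object).toList();
    }

    public static void main(String[] args) {
        Model m = sampleModel();
        System.out.println("anything owning BMW: " + match(m, null, null, "BMW").size());
        System.out.println("anything about John: " + match(m, "John", null, null).size());
        System.out.println("all statements:      " + match(m, null, null, null).size());
    }
}
```

As in getStatements, the object is always turned into a resource, so triples whose object is a literal are not matched by a non-null object argument.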

For removing triples we use the following function.

public void removeStatement( String modelName, String subject, String property, String object )
{
    Model model = null;

    ds.begin( ReadWrite.WRITE );
    try
    {
        model = ds.getNamedModel( modelName );

        Statement stmt = model.createStatement
        (
            model.createResource( subject ),
            model.createProperty( property ),
            model.createResource( object )
        );

        model.remove( stmt );
        ds.commit();
    }
    finally
    {
        if( model != null ) model.close();
        ds.end();
    }
}

An update can be realized by removing the old triple and adding the new one.
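A remove-then-add update could look like the following sketch. It is one possible approach, not code from the original tutorial: the class UpdateSketch and its methods are hypothetical, the Jena 2.13.0 package names are assumed, and the demo runs against an in-memory TDB dataset. Both steps happen inside a single write transaction so that a reader never sees the state in between.

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.tdb.TDBFactory;

public class UpdateSketch {
    // Replaces (subject, property, oldObject) by (subject, property, newObject) in one transaction.
    public static void updateStatement(Dataset ds, String modelName, String subject,
                                       String property, String oldObject, String newObject) {
        ds.begin(ReadWrite.WRITE);
        try {
            Model model = ds.getNamedModel(modelName);
            Resource s = model.createResource(subject);
            Property p = model.createProperty(property);
            model.remove(model.createStatement(s, p, model.createResource(oldObject)));
            model.add(model.createStatement(s, p, model.createResource(newObject)));
            ds.commit();
        } finally {
            ds.end();
        }
    }

    // Checks inside a read transaction whether a triple is present.
    public static boolean contains(Dataset ds, String modelName, String s, String p, String o) {
        ds.begin(ReadWrite.READ);
        try {
            Model model = ds.getNamedModel(modelName);
            return model.contains(model.createResource(s),
                                  model.createProperty(p),
                                  model.createResource(o));
        } finally {
            ds.end();
        }
    }

    public static void main(String[] args) {
        String uri = "http://tutorial-academy.com/2015/tdb#";
        String graph = uri + "cars"; // hypothetical graph name for this sketch
        Dataset ds = TDBFactory.createDataset(); // in-memory TDB dataset

        // Removing a triple that does not exist is harmless, so this also works as an insert.
        updateStatement(ds, graph, uri + "John", uri + "owns", uri + "None", uri + "Porsche");
        updateStatement(ds, graph, uri + "John", uri + "owns", uri + "Porsche", uri + "BMW");

        System.out.println("owns Porsche: " + contains(ds, graph, uri + "John", uri + "owns", uri + "Porsche"));
        System.out.println("owns BMW:     " + contains(ds, graph, uri + "John", uri + "owns", uri + "BMW"));
        ds.close();
    }
}
```

Calling removeStatement followed by addStatement from the TDBConnection class gives the same end result, but uses two separate transactions.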

Finally, we close the triple store once we have finished our transactions.

public void close()
{
    ds.close();
}

Now we can move on to write a small test application.

4. Write a test application for the TDB Connection

If you are familiar with JUnit tests in Java, you can use the following code. We add some triples to two named graphs (named models), check the size of the result and remove some triples.

import java.util.List;

import junit.framework.TestCase;

import com.hp.hpl.jena.rdf.model.Statement;

public class TDBConnectionTest extends TestCase
{
    protected TDBConnection tdb = null;

    protected String URI = "http://tutorial-academy.com/2015/tdb#";

    protected String namedModel1 = "Model_German_Cars";
    protected String namedModel2 = "Model_US_Cars";

    protected String john = URI + "John";
    protected String mike = URI + "Mike";
    protected String bill = URI + "Bill";
    protected String owns = URI + "owns";

    protected void setUp()
    {
        tdb = new TDBConnection("tdb");
    }

    public void testAll()
    {
        // named Model 1
        tdb.addStatement( namedModel1, john, owns, URI + "Porsche" );
        tdb.addStatement( namedModel1, john, owns, URI + "BMW" );
        tdb.addStatement( namedModel1, mike, owns, URI + "BMW" );
        tdb.addStatement( namedModel1, bill, owns, URI + "Audi" );
        tdb.addStatement( namedModel1, bill, owns, URI + "BMW" );

        // named Model 2
        tdb.addStatement( namedModel2, john, owns, URI + "Chrysler" );
        tdb.addStatement( namedModel2, john, owns, URI + "Ford" );
        tdb.addStatement( namedModel2, bill, owns, URI + "Chevrolet" );

        // null = wildcard search. Matches everything with BMW as object!
        List<Statement> result = tdb.getStatements( namedModel1, null, null, URI + "BMW");
        System.out.println( namedModel1 + " size: " + result.size() + "\n\t" + result );
        assertTrue( result.size() > 0);

        // null = wildcard search. Matches everything with john as subject!
        result = tdb.getStatements( namedModel2, john, null, null);
        System.out.println( namedModel2 + " size: " + result.size() + "\n\t" + result );
        assertTrue( result.size() == 2 );

        // remove all statements from namedModel1
        tdb.removeStatement( namedModel1, john, owns, URI + "Porsche" );
        tdb.removeStatement( namedModel1, john, owns, URI + "BMW" );
        tdb.removeStatement( namedModel1, mike, owns, URI + "BMW" );
        tdb.removeStatement( namedModel1, bill, owns, URI + "Audi" );
        tdb.removeStatement( namedModel1, bill, owns, URI + "BMW" );

        result = tdb.getStatements( namedModel1, john, null, null);
        assertTrue( result.size() == 0);

        tdb.close();
    }
}

If you do not want to use JUnit you can simply add the code to a main function.

import java.util.List;

import com.hp.hpl.jena.rdf.model.Statement;

public class TDBMain
{
    public static void main(String[] args)
    {
        TDBConnection tdb = null;

        String URI = "http://tutorial-academy.com/2015/tdb#";

        String namedModel1 = "Model_German_Cars";
        String namedModel2 = "Model_US_Cars";

        String john = URI + "John";
        String mike = URI + "Mike";
        String bill = URI + "Bill";
        String owns = URI + "owns";

        tdb = new TDBConnection("tdb");

        // named Model 1
        tdb.addStatement( namedModel1, john, owns, URI + "Porsche" );
        tdb.addStatement( namedModel1, john, owns, URI + "BMW" );
        tdb.addStatement( namedModel1, mike, owns, URI + "BMW" );
        tdb.addStatement( namedModel1, bill, owns, URI + "Audi" );
        tdb.addStatement( namedModel1, bill, owns, URI + "BMW" );

        // named Model 2
        tdb.addStatement( namedModel2, john, owns, URI + "Chrysler" );
        tdb.addStatement( namedModel2, john, owns, URI + "Ford" );
        tdb.addStatement( namedModel2, bill, owns, URI + "Chevrolet" );

        // null = wildcard search. Matches everything with BMW as object!
        List<Statement> result = tdb.getStatements( namedModel1, null, null, URI + "BMW");
        System.out.println( namedModel1 + " size: " + result.size() + "\n\t" + result );

        // null = wildcard search. Matches everything with john as subject!
        result = tdb.getStatements( namedModel2, john, null, null);
        System.out.println( namedModel2 + " size: " + result.size() + "\n\t" + result );

        // remove all statements from namedModel1
        tdb.removeStatement( namedModel1, john, owns, URI + "Porsche" );
        tdb.removeStatement( namedModel1, john, owns, URI + "BMW" );
        tdb.removeStatement( namedModel1, mike, owns, URI + "BMW" );
        tdb.removeStatement( namedModel1, bill, owns, URI + "Audi" );
        tdb.removeStatement( namedModel1, bill, owns, URI + "BMW" );

        result = tdb.getStatements( namedModel1, john, null, null);
        System.out.println( namedModel1 + " size: " + result.size() + "\n\t" + result );

        tdb.close();
    }
}

5. Tips for developing with Jena and TDB

In your TDB storage folder you will find a file called nodes.dat after initializing the TDB store. There you can check whether your triples were inserted. Of course this gets complicated for a bigger graph, but the file is kept mostly in plain text, so make use of your editor's search function.

   <Model_5FGerman_5FCars>   +<http://tutorial-academy.com/2015/tdb#John>   +<http://tutorial-academy.com/2015/tdb#owns>   .<http://tutorial-academy.com/2015/tdb#Porsche>   *<http://tutorial-academy.com/2015/tdb#BMW>   +<http://tutorial-academy.com/2015/tdb#Mike>   +<http://tutorial-academy.com/2015/tdb#Bill>   +<http://tutorial-academy.com/2015/tdb#Audi>   <Model_5FUS_5FCars>   /<http://tutorial-academy.com/2015/tdb#Chrysler>   +<http://tutorial-academy.com/2015/tdb#Ford>   0<http://tutorial-academy.com/2015/tdb#Chevrolet>

If you delete triples and wonder why they are still kept in nodes.dat although they no longer show up when reading via the API: this is related to the TDB architecture.
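Searching nodes.dat can also be done programmatically. The helper below is a minimal sketch in plain Java with no Jena dependency. It assumes that node entries are stored as UTF-8 text between a few binary bytes, which is what the excerpt above suggests, and it reads the whole file into memory, so it is only meant for small stores. The class name NodesDatSearch and the default path tdb/nodes.dat (matching the "tdb" folder used in the test application) are choices made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NodesDatSearch {
    // Returns true if the given text occurs anywhere in the raw bytes of the file.
    public static boolean containsText(Path file, String text) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so the binary bytes
        // between the entries cannot break the decoding.
        String content = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        // Encode the search text the same way so that non-ASCII characters still match.
        String needle = new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return content.contains(needle);
    }

    public static void main(String[] args) throws IOException {
        Path nodes = Paths.get(args.length > 0 ? args[0] : "tdb/nodes.dat");
        String uri = args.length > 1 ? args[1] : "http://tutorial-academy.com/2015/tdb#Porsche";
        System.out.println(uri + " found: " + containsText(nodes, uri));
    }
}
```

A hit only shows that the node was written to the node table at some point. Because nodes of deleted triples stay in nodes.dat, it does not prove that a triple using the node still exists.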

6. TDB architecture

TDB uses a node table which maps RDF nodes to 64 bit integer Ids and the other way around. The 64 bit integer Ids are used to create indexes. The indexes allow database scans which are required to process SPARQL queries.

Now if new data is added, the TDB store adds entries to the node table and the indexes. Removing data only affects the indexes. Therefore the node table will grow continuously even if data is removed.

You might think that is a terrible way to store data, but there are good reasons to do so:

  1. The integer Ids contain file offsets. In order to accelerate inserts, the node table is a sequential file. The Id to node lookup is a fast file access at that offset. If data were deleted from the node table, all file offsets would have to be recalculated and rewritten.
  2. When data is deleted, we do not know how often a node is still used without scanning the complete database. Consequently we do not know which node table entries could be deleted. A workaround would add complexity and slow down delete operations.
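The behaviour described in this section can be modelled with a toy example: adding a triple touches the node table and the index, while removing a triple only touches the index. This is a simplified in-memory illustration in plain Java, not TDB's actual on-disk format. The class NodeTableSketch, its single index and the use of the list position as node Id are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NodeTableSketch {
    // Append-only node table: the position in the list plays the role of the file offset (the Id).
    private final List<String> nodeTable = new ArrayList<String>();
    // Reverse lookup from node to Id.
    private final Map<String, Long> nodeToId = new HashMap<String, Long>();
    // One index over (subject, predicate, object) Ids.
    private final Set<List<Long>> spoIndex = new HashSet<List<Long>>();

    // Returns the Id of a node, appending it to the node table if it is new.
    private Long idFor(String node) {
        Long id = nodeToId.get(node);
        if (id == null) {
            id = Long.valueOf(nodeTable.size());
            nodeTable.add(node);
            nodeToId.put(node, id);
        }
        return id;
    }

    // Adding may grow the node table and always touches the index.
    public void add(String s, String p, String o) {
        spoIndex.add(Arrays.asList(idFor(s), idFor(p), idFor(o)));
    }

    // Removing only touches the index; the node table is left alone.
    public void remove(String s, String p, String o) {
        Long si = nodeToId.get(s);
        Long pi = nodeToId.get(p);
        Long oi = nodeToId.get(o);
        if (si != null && pi != null && oi != null) {
            spoIndex.remove(Arrays.asList(si, pi, oi));
        }
    }

    public int tripleCount() { return spoIndex.size(); }

    public int nodeCount() { return nodeTable.size(); }

    public static void main(String[] args) {
        NodeTableSketch store = new NodeTableSketch();
        store.add("John", "owns", "Porsche");
        store.add("John", "owns", "BMW");
        System.out.println(store.tripleCount() + " triples, " + store.nodeCount() + " nodes"); // 2 triples, 4 nodes

        store.remove("John", "owns", "Porsche");
        store.remove("John", "owns", "BMW");
        System.out.println(store.tripleCount() + " triples, " + store.nodeCount() + " nodes"); // 0 triples, 4 nodes
    }
}
```

Shrinking the node table in this model would mean renumbering every Id stored in the index, which mirrors the first reason given in the list above.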

Anyway, in our experience the majority of operations on a triple store are inserts and reads. If you ever run out of disk space, you can read the whole affected graph, store it from scratch and delete the original. Depending on its size, this may of course slow down the triple store.
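Rebuilding a graph from scratch could be sketched as copying the live triples of a named model into a fresh dataset, whose node table then only contains nodes that are still in use. The code below assumes the Jena 2.13.0 package names. The class CopyGraphSketch, the method copyNamedModel and the graph name are hypothetical, and the demo uses two in-memory TDB datasets in place of an old and a new store folder.

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class CopyGraphSketch {
    // Copies all current triples of one named model into the target dataset and returns the copy's size.
    public static long copyNamedModel(Dataset source, Dataset target, String modelName) {
        source.begin(ReadWrite.READ);
        try {
            target.begin(ReadWrite.WRITE);
            try {
                Model copy = target.getNamedModel(modelName);
                copy.add(source.getNamedModel(modelName));
                long size = copy.size();
                target.commit();
                return size;
            } finally {
                target.end();
            }
        } finally {
            source.end();
        }
    }

    public static void main(String[] args) {
        String uri = "http://tutorial-academy.com/2015/tdb#";
        String graph = uri + "cars"; // hypothetical graph name for this sketch

        // In-memory stand-ins; a real run would pass folder paths to createDataset.
        Dataset oldStore = TDBFactory.createDataset();
        Dataset newStore = TDBFactory.createDataset();

        oldStore.begin(ReadWrite.WRITE);
        try {
            Model m = oldStore.getNamedModel(graph);
            m.add(m.createResource(uri + "John"), m.createProperty(uri + "owns"), m.createResource(uri + "BMW"));
            m.add(m.createResource(uri + "Mike"), m.createProperty(uri + "owns"), m.createResource(uri + "BMW"));
            oldStore.commit();
        } finally {
            oldStore.end();
        }

        System.out.println("copied triples: " + copyNamedModel(oldStore, newStore, graph));
        oldStore.close();
        newStore.close();
    }
}
```

With stores on disk, the old folder would only be deleted after the copy has been committed and checked.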
