Notes of Linked Data concept and application - TODO
Data plays a core role in most business systems. Storing and retrieving data seems straightforward to application developers, and even to managers, but connecting or linking data to uncover more interesting patterns (more technically, data mining) is still hard. Many data processing and analysis frameworks target problems of sheer data volume; this note instead records the data structuring problems that arise in data mining, with particular attention to the representation, storage, and navigation of relationships in data models.
For myself, and for others who are also interested in semantic web techniques.
support a descriptive running example
linked data concepts
linked data application based on semantic web techniques
find a clue to use Neo4j in semantic web techniques or linked data
Related Notes
Apache Jena Fuseki notes
2015/06/26 initial plan
2015/06/27 1-5: introduction, FOAF, SPARQL, etc: need review
2015/06/28 6-7: RDFa and RDF storage
2015/06/29 Related Notes - Fuseki
1 introducing Linked Data
5-star scoring system of Linked Data: P.4
The DBpedia project extracts structured data from Wikipedia articles and puts it on the Web.
Linked Data has one amazing property: it may be easily combined with other
Linked Data to form new knowledge.
Another useful feature of Linked Data is that it’s self-documenting.
Linked Data is no silver bullet. It won’t protect you from issues of data quality or
from service failures.
Linked Data principles
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
- Include links to other URIs, so people can discover more things.
see more in Tim Berners-Lee's thoughts on Linked Data principles
The Linking Open Data (LOD) project
The LOD project is a community activity started in 2007 by the W3C's Semantic Web Education and Outreach (SWEO) Interest Group. The project's stated goal is to "make data freely available to everyone."
commonly used RDF prefixes: P.40
RDF formats
- Turtle: human-readable format
- RDF/XML: original RDF format in XML
- RDFa: RDF embedded in HTML attributes
- JSON-LD: a newer format aimed at web developers
RDF in the web
Media types:
RDF Format | Preferred Content-Type | Alternative Content-Type |
RDF Turtle file | text/turtle | |
RDF/XML file | application/rdf+xml | |
RDFa | text/html | |
JSON-LD file | application/ld+json | application/json |
OWL file | application/owl+xml | application/rdf+xml |
N-Triples | application/n-triples | text/plain |
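When consuming RDF over HTTP, the Content-Type header tells the client which parser to use. A minimal sketch of that lookup follows; the dictionary and the `format_for` helper are hypothetical names, and the values on the right are the format names rdflib uses for these syntaxes.

```python
# Preferred media types from the table, mapped to rdflib format names.
MEDIA_TYPE_TO_FORMAT = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/ld+json": "json-ld",
    "application/n-triples": "nt",
}


def format_for(content_type: str, default: str = "turtle") -> str:
    """Return the parser format name for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_TO_FORMAT.get(media_type, default)


print(format_for("text/turtle; charset=utf-8"))
print(format_for("application/rdf+xml"))
```

Falling back to a default is a design choice for this sketch; a stricter client might raise an error on an unknown media type instead.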
file types and web server
publishing RDF content using Apache HTTP servers:
Linked Data platforms or Semantic Web products, for example Callimachus; see semanticweb tool and sw wiki tool for more products.
3 consuming Linked Data
3.1 thinking the Web way
In using structured data, you’re enabling machine readability and indexing of this data.
In interlinking published data on the Web, you’re enabling reuse of your information.
3.2 find Linked Data on the web
a Question: is President Barack Obama a Star Wars fan?
3.3 retrieving Linked Data from web pages
tools for finding distributed Linked Data
- Sindice: the semantic web index
- identify equivalent URIs to the Linked Data URI entered and provide an entry point to perform a Sindice search on a general search term
- Data Hub: a community-run catalog of useful sets of Linked Data on the Web
3.4 combine Linked Data from multiple sources
from known datasets
Product DB aims to be the World’s most comprehensive and open source of product data.
Its data comes from sources including ProductWiki, MusicBrainz, DBpedia, Freebase, and OpenLibrary, and is also gathered by crawling sites that publish GoodRelations RDFa or Open Graph protocol data in their pages; for example, BestBuy, IMDb, and Spotify.
from web pages using browser plug-ins
Mozilla add-ons: RDFa Developer
You can use the outcome of to help identify a canonical URL for a given item. A canonical URL is the best URL among available choices.
3.5 display Linked Data in HTML
Using Python to crawl the Linked Data Web
example: use the Python scripting language, RDFLib, and html5lib to access the RDFa data available from Best Buy for a sample product, the Darth Vader Alarm Clock Radio.
install python modules:
$ sudo pip install rdflib
$ sudo pip install html5lib
core code snippet:
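import rdflib
import html5lib  # not called directly; older rdflib RDFa parsing relies on it being installed
# Parse a Turtle document (the source URL is elided in these notes).
graph = rdflib.Graph()
result = graph.parse('', format='turtle')
# Parse the RDFa embedded in the Best Buy product page (URL elided).
# format='rdfa' needs an rdflib release that still bundles the RDFa parser.
bestBuyGraph = rdflib.Graph()
bestBuyResult = bestBuyGraph.parse('', format='rdfa')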
5.1 SPARQL syntax
Each SPARQL SELECT query is organized as follows:
- PREFIX (Namespace prefixes.)
- SELECT (Define what you wish to retrieve.)
- FROM (Specify the dataset from which to draw the results.)
- WHERE { (Describe the criteria on which to base the selection. This description is in the form of a query triple pattern.) }
- ORDER BY, LIMIT, and the like (Modifiers that affect the desired result.)
types of SPARQL queries
5.2 SPARQL endpoint
DBpedia: an online playground
sample queries:
PREFIX rdfs: <>
PREFIX dbprop: <>
SELECT ?location
WHERE {
  ?person rdfs:label "George Washington"@en .
  ?location dbprop:namedFor ?person .
}
Query using Apache Jena ARQ:
JENA_HOME> ./bin/arq --query ./bin/query/location.rq
JENA_HOME> ./bin/arq --query ./bin/query/location.rq --results JSON
Seriously, consult [1] for a second pass at SPARQL.
Online map generator Google Static Maps
6 Enhance results for search engines
purpose: provide semantic meaning to web content and enable the extraction of Linked Data. This enables your website to be both machine- and human-readable.
6.1 enhancing HTML by embedding RDFa
RDF in Attributes (RDFa) is a language that allows you to express RDF data within an HTML document.
RDFa 1.1
HTML+RDFa 1.1, Support for RDFa in HTML4 and HTML5
Tool: RDFa 1.1 Distiller and Parser
HTML5 and RDFa support
<html version="HTML+RDFa 1.1" lang="en">
<body id="me"
RDFa attributes in HTML5 tags:
- vocab
- resource
- about
- datatype
- typeof
- prefix
- rel
- inlist
- property
- content
- rev
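To get a feel for how a few of these attributes carry data, here is a rough sketch that pulls `resource`, `property`, and `content` values out of an HTML fragment using only the Python standard library. It is not a conforming RDFa processor: it ignores `vocab` and `prefix` expansion, `about`, `typeof`, `rel`/`rev`, nesting rules, and datatypes. The HTML fragment and the `PropertyCollector` class are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical RDFa-annotated fragment.
HTML = """
<div vocab="http://schema.org/" resource="#clock" typeof="Product">
  <span property="name">Darth Vader Alarm Clock Radio</span>
  <meta property="color" content="black">
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (subject, property, value) tuples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent `resource` value seen
        self.pending = None   # property waiting for its element text
        self.found = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "resource" in a and "property" not in a:
            self.subject = a["resource"]
        if "property" in a:
            if "content" in a:
                # `content` overrides the element text as the value.
                self.found.append((self.subject, a["property"], a["content"]))
            else:
                self.pending = a["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.found.append((self.subject, self.pending, data.strip()))
            self.pending = None


collector = PropertyCollector()
collector.feed(HTML)
for item in collector.found:
    print(item)
```

For real pages, a full RDFa 1.1 processor such as the Distiller mentioned above is the safer choice.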
6.2 embedding RDFa with a supporting official RDF vocabulary
[1] GoodRelations
GoodRelations is the most widely used RDF vocabulary for e-commerce. It enables you
to publish details of your products and services in a way that search engines, mobile
applications, and browser extensions can utilize the information and improve your click-through rates.
GoodRelations website
GoodRelations' concept model wiki
[2] schema.org is a collaborative initiative by three major search engines: Yahoo!, Bing, and Google.
Its purpose is to create and support a common set of schemas for structured data markup on web pages
and to provide a common means for webmasters to mark up their pages so that the search results
are improved and human users have a more satisfying experience.
6.3 extracting RDFa from HTML and applying SPARQL queries
RDF extracted from the RDFa-enhanced HTML files can be queried using SPARQL.
TODO: programming procedures not using RDFa 1.1 Distiller and Parser.
7 RDF datasets
RDF dataset in W3C technology stack
7.1 classification of RDF DB system
RDF abstract view:
RDF is a generic, graph-based data model that represents data in the form of triples. Each triple is a record of three values (subject, predicate, object), taking the form (URI, URI, URI) or (URI, URI, value).
relational DB implemented RDF store:
Category | Description |
*Vertical (triple) table stores* | Each RDF triple is stored directly in a three-column table (subject, predicate, object). |
*Property (n-ary) table stores* | Multiple RDF properties are modeled as n-ary table columns for a single subject. |
*Horizontal (binary) table stores* | RDF triples are modeled as one horizontal table or as a set of vertically partitioned binary tables, where each individual table represents an RDF property. |
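The vertical (triple) table layout is the easiest one to try by hand. A minimal sketch with Python's built-in sqlite3 follows; the table name, the prefixed names such as `ex:alice`, and the data are assumptions for illustration, not how any particular triplestore lays out its tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One three-column table holds every triple.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
)

# "Names of everyone alice knows" takes a self-join:
# one copy of the table per triple pattern in the question.
rows = conn.execute(
    """
    SELECT n.object
    FROM triples AS k
    JOIN triples AS n ON n.subject = k.object
    WHERE k.subject = 'ex:alice'
      AND k.predicate = 'foaf:knows'
      AND n.predicate = 'foaf:name'
    """
).fetchall()
print(rows)
```

Every extra triple pattern adds another self-join on the same large table, which is the usual motivation for the property-table and binary-table layouts.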
Commonly used triplestores:
- 4Store
- Allegro-Graph
- BigData
- Fuseki
- Mulgara
- Oracle
- Redland RDF Library
- Sesame
- StarDog
- Virtuoso
7.2 battle between RDF storage and RDBMS
omitted :-)
7.3 convert anything to RDF
W3C's index of Converter to RDF tools
integrate data from CSV/XML returned by a web service and from local files
- python
- Fuseki
[1] Hebeler J., Fisher M., et al. Web 3.0 and Semantic Web Programming (Web 3.0与Semantic Web编程)[M]. Beijing: Tsinghua University Press, 2010.
[2] Wood D., Zaidman M., Ruth L., et al. Linked Data: Structured Data on the Web[M]. Manning Publications Co., 2014.