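This is part 3 of the Data Serialization Guide series (JSON vs. YAML). The previous two parts introduced JSON and YAML; this part presents the following paper, which compares the two formats.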

Comparison between JSON and YAML for data serialization

Chapter 1 Introduction

This paper discusses and compares different serialization formats in computer science. In addition to giving background on the serialization technique and how it is used today, it compares two important lightweight data interchange formats, JSON and YAML. Although the two are quite similar in terms of usability, there are some distinctions between them, mostly in design choices and syntax, and these choices do affect their scope of use. The report discusses the differences between the languages and the consequences those differences have for performance and usability in different use cases.

1.1 Problem statement

Discussions comparing the usability of YAML and JSON as serialization formats have increased in recent years. Although there are many thoughts and opinions online, there is a lack of systematic, general investigation of the subject.


The primary aim of this project is to identify and compare the major differences between YAML and JSON from multiple perspectives. The comparison is based not only on a performance test but also on collected facts and research. From this, conclusions can be drawn about their usability and their different scopes of use. The comparison will help define the gap between them (if one exists), and will hopefully provide some guidelines to consider in future development involving data serialization.

Chapter 2 Background

This chapter explains and defines the background needed for the different parts of this report. It covers the serialization and parsing processes, but focuses on the two serialization languages being compared: YAML and JSON.

2.1 Serialization

2.1.1 Definition

Serialization is a process for converting a data structure or object into a format that can be transmitted through a wire, or stored somewhere for later use [8].
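To make the definition concrete, here is a minimal sketch in Python (an illustrative choice; the paper prescribes no language), using the standard json module as the serialization format: a data structure is turned into a string that could be sent or stored, then turned back into an equivalent object.

```python
import json

# An in-memory data structure: a dict holding a list of dicts.
contact = {
    "name": "Ada",
    "phones": [{"type": "home", "number": "555-0100"}],
}

# Serialize: convert the structure into a string that can be sent or stored.
wire_text = json.dumps(contact)

# Deserialize: rebuild a semantically equivalent clone from the string.
clone = json.loads(wire_text)

print(type(wire_text).__name__)  # str
print(clone == contact)          # True: same content...
print(clone is contact)          # False: ...but a new object
```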

There are many different methods and formats for serialization. Which ones to choose depends on the requirements placed on the object or data, and on what the serialization is used for (sending or storing). The choice may also affect the size of the serialized data, as well as serialization and deserialization performance in terms of processing time and memory usage.

2.1.2 General method

Common to all serialization methods is that the data is read as a series: once the process has started, the whole object is usually serialized or deserialized in one go. This makes it possible to use simple I/O interfaces to hold and pass on the state of an object. Difficulties arise, however, in applications that need higher performance through a non-linear storage organization, or when the object contains large amounts of data. These cases require more effort to deal with and are not covered in this paper. The data structures most commonly used when encoding data this way are scalars, maps and sequences (lists or arrays).
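The sequential, whole-object style described above is what lets a serializer write to any simple stream. As a sketch, assuming Python's json module, json.dump and json.load work with any file-like object; io.StringIO stands in here for a file or network connection, and the structure mixes the three common building blocks (scalars, maps and sequences):

```python
import io
import json

# Scalars (str, int, bool), a map (the dict itself) and a sequence (list).
state = {"user": "ed", "retries": 3, "active": True, "tags": ["a", "b"]}

# Write the whole object sequentially to a stream.
stream = io.StringIO()
json.dump(state, stream)

# Rewind and read the whole object back in one pass.
stream.seek(0)
restored = json.load(stream)

assert restored == state
print(stream.getvalue())
# {"user": "ed", "retries": 3, "active": true, "tags": ["a", "b"]}
```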

Serialization is supported by many popular object-oriented programming languages, such as PHP, Ruby, Java, Smalltalk and Python, as well as by the .NET Framework. All of these provide serialization either as an interface to implement or as syntactic sugar. For example, .NET provides a Serializable attribute [10], and Java has an interface named Serializable for classes to implement [9]. In Ruby, serialization is called marshaling, and the language provides a module called Marshal for it [16]. This module can often be used without any changes to the definitions of the objects being serialized. A custom serialization strategy can be defined when you want to restrict the serialization process (by default, all instance variables are serialized) or handle data in specific ways.
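Python's counterpart to Ruby's Marshal or Java's Serializable is the pickle module. The following sketch (the Session class and its fields are hypothetical) shows the kind of custom strategy described above: by default pickle saves every instance attribute, and defining __getstate__ and __setstate__ restricts what is written and controls how the object is rebuilt.

```python
import pickle


class Session:
    """Hypothetical class: by default pickle saves every instance attribute."""

    def __init__(self, user, token):
        self.user = user
        self.token = token        # sensitive: should not be written out
        self.cache = {"hits": 0}  # transient: cheap to rebuild

    def __getstate__(self):
        # Restrict serialization to the attributes worth keeping.
        return {"user": self.user}

    def __setstate__(self, state):
        # Restore the kept attribute and re-create the rest with defaults.
        self.user = state["user"]
        self.token = None
        self.cache = {"hits": 0}


original = Session("ed", "s3cr3t")
data = pickle.dumps(original)
restored = pickle.loads(data)

print(restored.user, restored.token)  # ed None
print(b"s3cr3t" in data)              # False: the token was never serialized
```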

Most standard serialization implementations convert the data into a binary string, which means the data cannot easily be inspected by a human in its serialized form. Ruby's Marshal module returns a string, but it is not fully readable either: it contains special byte sequences and is not formatted to be read by humans.
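The difference is easy to see by serializing the same record with Python's binary pickle format and with JSON text. This is a sketch; pickle is used as a stand-in for the binary built-in formats mentioned above, not as an example of Ruby's Marshal.

```python
import json
import pickle

record = {"name": "Ada", "phone": "555-0100"}

binary_form = pickle.dumps(record)  # bytes made of pickle opcodes
text_form = json.dumps(record)      # plain text

print(type(binary_form).__name__)   # bytes
print(binary_form[0] == 0x80)       # True: starts with the PROTO opcode, not text
print(text_form)                    # {"name": "Ada", "phone": "555-0100"}
```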

Example

A concrete example where serialization is needed is storing the information in an address book, in this case one written in Java. Each instance holds a person together with details such as their address and phone number. If all instances are to be stored on a server exactly as they were created, there are a few possible solutions:

1. Use Java serialization, which is part of the language. This is easy to do, but problems arise if the data has to be accessible to applications written in C++, Python or another language, since the data is serialized in a format specific to Java.

2. Use an improvised way of encoding the data into single strings, for example encoding four integers as 12:3:-23:67. This requires some custom parsing code to be written, and works best for very simple data.

3. Serialize the data into XML. This is attractive because XML is human-readable and has bindings (API libraries) for many languages, but it is space-intensive and can impose performance penalties on applications.
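Approach 2 above can be sketched in a few lines of Python (the function names are hypothetical). The encoder is trivial, but the decoder is exactly the kind of custom parsing code the text mentions, and it only works because every field is an integer that can never contain the ':' separator:

```python
def encode_ints(values):
    """Join integers into one colon-separated string, e.g. 12:3:-23:67."""
    return ":".join(str(v) for v in values)


def decode_ints(text):
    """Custom parsing code: split on ':' and convert each field back to int."""
    if not text:
        return []
    return [int(field) for field in text.split(":")]


encoded = encode_ints([12, 3, -23, 67])
print(encoded)               # 12:3:-23:67
print(decode_ints(encoded))  # [12, 3, -23, 67]
```

As soon as the data contains strings (which might include ':') or nested records, this scheme needs escaping rules and a real parser.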

Because each of these approaches has drawbacks, other solutions are often desirable.
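One such alternative, and the kind of format this paper is about, is a text format with parsers in many languages. A minimal sketch in Python, assuming a simplified address-book model (the Person and PhoneNumber classes are hypothetical), serializes the entries to JSON, which a C++, Python or any other program with a JSON library could read:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PhoneNumber:
    number: str
    kind: str = "mobile"


@dataclass
class Person:
    name: str
    address: str
    phones: list = field(default_factory=list)


book = [Person("Ed", "1 Main St", [PhoneNumber("555-0100", "home")])]

# asdict() recursively turns each dataclass into plain dicts and lists,
# which json can encode as language-neutral text.
text = json.dumps([asdict(p) for p in book], indent=2)
print(text)

# Reading it back (here in Python) rebuilds equal objects.
loaded = [
    Person(d["name"], d["address"], [PhoneNumber(**ph) for ph in d["phones"]])
    for d in json.loads(text)
]
assert loaded == book
```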

2.1.3 Scope of use

As mentioned above, serialization is often used when transmitting data. Other examples include storing user preferences in an object or maintaining security information across pages and applications. In general, serialization is very helpful whenever objects have to be transferred between applications or domains, or through firewalls.

2.2 Parsing

2.2.1 General

In computer science, parsing generally means analyzing written text to determine its grammatical structure according to a known formal grammar. In linguistics, to parse means to analyze and describe the grammar of a sentence. The parser splits an expression into tokens, which are then inserted into some kind of data structure. This data is then evaluated to interpret the meaning of each expression according to the rules of the given grammar, after which the appropriate action is executed.
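The tokenize, build a structure, evaluate pipeline described above can be sketched with a tiny recursive-descent parser for integer arithmetic. This toy grammar is chosen for illustration only; JSON or YAML parsers follow the same pattern with a different grammar. The grammar assumed here is expr := term (('+'|'-') term)*, term := factor (('*'|'/') factor)*, factor := NUMBER | '(' expr ')', and unary minus is not supported.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split an expression into number, operator and parenthesis tokens."""
    tokens = re.findall(r"\d+|[-+*/()]|\S", text)
    for tok in tokens:
        if not (tok.isdecimal() or tok in OPS or tok in "()"):
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens


def parse(tokens):
    """Build a tree of (op, left, right) tuples with int leaves."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (take(), node, factor())
        return node

    def factor():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdecimal():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node):
    """Interpret the tree according to the grammar's meaning of each operator."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tokens = tokenize("2 * (3 + 4)")
print(tokens)               # ['2', '*', '(', '3', '+', '4', ')']
tree = parse(tokens)
print(tree)                 # ('*', 2, ('+', 3, 4))
print(evaluate(tree))       # 14
```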

2.2.2 Serialization and parsing

Serialization is mainly a way to store data easily: the data is converted into some format and can later be restored into a semantically equivalent clone. Unless the serialization method writes the data in a fixed, never-changing order and expects it to be read back in the same order, the data has to be parsed when it is deserialized. During deserialization, parsing identifies the data identifiers (attribute names or the like) and their corresponding values, often while also having to determine the type of each value.
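The contrast between a fixed-order format and one that must be parsed can be sketched with two Python standard modules (the field names and values are made up). struct packs values in an agreed order with no identifiers, while JSON text carries keys and lets the parser infer each value's type from its syntax:

```python
import json
import struct

# Fixed, agreed-upon order: two little-endian 32-bit ints (age, zip code).
# Nothing in the 8 bytes names the fields; the reader must know the layout.
packed = struct.pack("<ii", 36, 12345)
age, zip_code = struct.unpack("<ii", packed)

# Self-describing text: the parser finds identifiers (keys) and values,
# and determines each value's type from its syntax.
text = '{"zip": 12345, "age": 36, "member": true}'
parsed = json.loads(text)

print(age, zip_code)                                       # 36 12345
print({k: type(v).__name__ for k, v in parsed.items()})
# {'zip': 'int', 'age': 'int', 'member': 'bool'}
```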

The following sections aim to introduce JSON and YAML and make no statements about the differences between them; those are discussed later, in the results and conclusions chapters.

2.3 JSON

2.3.1 General

JSON is a subset of the open ECMAScript standard [3], of which the JavaScript programming language is an implementation. It was created as a way to parse human-readable, plain-text representations of data into valid ECMAScript objects [7]. It is nevertheless completely language-independent and uses notation similar to common programming languages such as C, C++ and Java.

The format has become very popular for serializing and exchanging structured data over networks [13], and it is often associated with the modern web because it is frequently used for communication between a web server and a client-side web application.

2.3.2 Origin

JSON was originally introduced as a written specification by Douglas Crockford in 2001 [4], who used the format within his company, State Software. Crockford was not the first to come up with the notation, since others had discovered it independently at about the same time, but he was the first to give it a complete specification based on parts of the JavaScript standard. He then launched the JSON.org website in 2002, which still exists and lists JSON libraries for many programming languages [3]. JSON quickly grew in popularity, partly thanks to its simplicity, which made it much more lightweight (and thus faster to load over the Internet) than XML, a format frequently used on the web. The other reason for its growing use was the increased use of JavaScript on the web.

JSON documents can be parsed in JavaScript by calling the built-in eval function with the JSON string as its argument. The JavaScript interpreter then executes the argument as JavaScript code, constructing an object with the properties defined by the JSON string. This works because JavaScript is a superset of JSON. Using eval is theoretically the most efficient way to parse JSON, since it simply invokes the JavaScript interpreter without any security or constraint checks. The method is, however, quite inelegant, because the interpreter does not prevent arbitrary JavaScript code from being executed. In most cases a dedicated JSON parser should be used instead, to avoid security issues and accept only valid JSON as input. Most modern browsers have had fast native JSON parsers since 2009, and these are preferred over eval [13].
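The same trade-off exists outside the browser. As a sketch in Python rather than JavaScript (in a browser the dedicated parser is JSON.parse), json.loads accepts only valid JSON and returns plain data, whereas Python's built-in eval would execute any code the string contains:

```python
import json

payload = '{"user": "ed", "admin": false, "note": null}'

# A dedicated parser accepts only valid JSON and returns plain data.
data = json.loads(payload)
print(data)  # {'user': 'ed', 'admin': False, 'note': None}

# Text that is code rather than data is rejected instead of executed.
malicious = '__import__("os").getcwd()'
rejected = False
try:
    json.loads(malicious)
except json.JSONDecodeError:
    rejected = True
print(rejected)  # True
```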

2.3.3 Functionality

JSON is a human-readable format, designed above all for simplicity and universality. It implements basic data types available in most modern programming languages [6], and the fact that it is easy to read and parse adds to its usefulness in programming. JSON is also language-independent: the specification is not tied to any particular programming language, even though it was originally based on JavaScript's object notation.

The JSON standard does not support object references, which, for example, limits its ability to store cyclic structures. This functionality can be provided by an extension such as dojox.json.ref from the Dojo Toolkit [19], which allows JSON objects to be marked with specific ids that can later be referenced.
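A short Python sketch shows both the limitation and the general idea behind reference extensions. The id/$ref encoding below is a hypothetical scheme written for illustration, loosely in the spirit of dojox.json.ref but not that library's actual format; it also assumes the data has no keys of its own named "id" or "$ref":

```python
import json

# Two objects that point at each other form a cycle.
a = {"name": "a"}
b = {"name": "b", "next": a}
a["next"] = b

plain_failed = False
try:
    json.dumps(a)
except ValueError:  # the encoder detects the circular reference
    plain_failed = True


def to_refs(obj, ids=None):
    """Hypothetical scheme: give each dict an "id"; later visits become {"$ref": id}."""
    if ids is None:
        ids = {}
    if isinstance(obj, dict):
        if id(obj) in ids:
            return {"$ref": ids[id(obj)]}
        ids[id(obj)] = str(len(ids) + 1)
        out = {"id": ids[id(obj)]}
        for key, value in obj.items():
            out[key] = to_refs(value, ids)
        return out
    if isinstance(obj, list):
        return [to_refs(item, ids) for item in obj]
    return obj


def from_refs(node, by_id=None):
    """Inverse of to_refs; assumes every $ref points to an id already seen."""
    if by_id is None:
        by_id = {}
    if isinstance(node, dict):
        if "$ref" in node:
            return by_id[node["$ref"]]
        obj = {}
        by_id[node["id"]] = obj  # register before children so back-references resolve
        for key, value in node.items():
            if key != "id":
                obj[key] = from_refs(value, by_id)
        return obj
    if isinstance(node, list):
        return [from_refs(item, by_id) for item in node]
    return node


text = json.dumps(to_refs(a))
print(text)
# {"id": "1", "name": "a", "next": {"id": "2", "name": "b", "next": {"$ref": "1"}}}

restored = from_refs(json.loads(text))
print(restored["next"]["next"] is restored)  # True: the cycle is rebuilt
```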

Complex structures can be built by nesting objects (associative arrays) within objects. Since JSON objects can contain values of any valid data type, JSON documents can express deep data hierarchies.

The JSON format specification does not include support for validating values or structure, but an external specification called JSON Schema exists as a draft [20]. Much like an XML Schema, JSON Schema can be used to define the structure of a JSON document, for example which data types values should have and whether they are optional or required. The schema can then be used to validate JSON documents or to document application APIs.
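To show what schema validation means in practice, here is a deliberately tiny validator in Python that understands only the type, required and properties keywords. It is a hypothetical sketch, not a JSON Schema implementation; a real project would use a complete validator. The example schema describes a log entry like those in Table 2.1:

```python
import json

# Checks for a small subset of JSON Schema "type" values.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def validate(instance, schema, path="$"):
    """Return a list of error messages (empty if the instance is valid)."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](instance):
        return [f"{path}: expected {expected}"]
    errors = []
    if expected == "object":
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property {name!r}")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], sub_schema, f"{path}.{name}"))
    return errors


schema = {
    "type": "object",
    "required": ["User", "Time"],
    "properties": {"User": {"type": "string"}, "Time": {"type": "string"}},
}

print(validate(json.loads('{"User": "ed", "Time": "2001-11-23 15:01:42 -5"}'), schema))  # []
print(validate(json.loads('{"User": 7}'), schema))
# ["$: missing required property 'Time'", '$.User: expected string']
```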

Valid data types are [2]:
o Numbers (integers and floating-point numbers, optionally in scientific notation; Infinity and NaN are not permitted)
o Strings (with Unicode support)
o Booleans (true/false)
o Objects (associative arrays / objects with key-value pairs)
o Arrays (ordered lists)
o Null
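Parsing a document that uses every type in the list with Python's json module shows how they map onto native types (a sketch; other languages map them differently). It also shows the Infinity rule in action: Python's encoder writes the non-standard token Infinity by default, and allow_nan=False makes it reject the value as the specification requires.

```python
import json
import math

doc = '{"n": 1.5e3, "i": 42, "s": "caf\\u00e9", "b": true, "o": {}, "a": [1, 2], "z": null}'
values = json.loads(doc)
print({key: type(value).__name__ for key, value in values.items()})
# {'n': 'float', 'i': 'int', 's': 'str', 'b': 'bool', 'o': 'dict', 'a': 'list', 'z': 'NoneType'}

print(json.dumps(math.inf))  # Infinity (non-standard, emitted by default)

inf_rejected = False
try:
    json.dumps(math.inf, allow_nan=False)
except ValueError:
    inf_rejected = True  # spec-compliant mode refuses the value
print(inf_rejected)  # True
```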


2.3.4 Syntax

As described above, JSON consists of objects, arrays and scalars. This section describes the general syntax, giving an overview of the language and its usability with regard to semantics. An example of an arbitrary JSON document can be found in Table 2.1 (converted from the YAML example in [6]).

o Comments are not allowed in the current standard (the author removed them in a later revision of the specification [4]).
o Objects (unordered collections of name/value pairs) are enclosed in braces ({}).
o Keys must be strings enclosed in double quotes, and each key is followed by a colon and its value.
o Multiple key-value pairs are separated by commas.
o Arrays (ordered sequences of values) are enclosed in brackets ([]), with the values separated by commas.
o The root node of a JSON document must be an object or an array. This is the rule of the original specification (RFC 4627); later revisions (RFC 7159 and RFC 8259) allow any JSON value at the top level.
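The rules above can be checked against a real parser. This Python sketch feeds json.loads a few valid documents and a few that break the rules (a comment, an unquoted key, single quotes, a trailing comma); each violation raises json.JSONDecodeError. Note that Python's json module, like RFC 8259, also accepts a bare scalar as the root value.

```python
import json

samples = {
    "valid object": '{"User": "ed", "line": 23}',
    "valid array": '[1, 2, 3]',
    "comment": '{"User": "ed" /* not allowed */}',
    "unquoted key": '{User: "ed"}',
    "single quotes": "{'User': 'ed'}",
    "trailing comma": '[1, 2, 3,]',
}

results = {}
for label, text in samples.items():
    try:
        json.loads(text)
        results[label] = "ok"
    except json.JSONDecodeError:
        results[label] = "rejected"
    print(f"{label:15} {results[label]}")

# A scalar at the root is accepted here, as in RFC 8259.
print(json.loads("42"))  # 42
```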

2.3.5 Scope of use

JSON is considered a more user-friendly alternative to XML and is often used in its place. Where XML has been criticized for carrying a lot of unnecessary baggage, JSON documents can hold the same information while being much more lightweight and easier to read [5]. JSON is most commonly used for exchanging or storing structured data, and it is especially common in Ajax web applications, where it provides a standardized data exchange format for JavaScript implementations [11].
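A rough illustration of the size difference, as a Python sketch using the first log entry from Table 2.1 and the standard xml.etree.ElementTree module. The element-per-field XML layout is one assumed mapping among many (attributes would be more compact), so the numbers are not a benchmark:

```python
import json
import xml.etree.ElementTree as ET

entry = {
    "User": "ed",
    "Time": "2001-11-23 15:01:42 -5",
    "Warning": "This is an error message for the log file",
}

# XML: every field needs both an opening and a closing tag.
root = ET.Element("entry")
for key, value in entry.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# JSON: each key appears once, followed by its value.
json_text = json.dumps(entry)

print(len(xml_text), xml_text)
print(len(json_text), json_text)
```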

2.3.6 Process

JSON is parsed (deserialized) by simply reading the text character by character, constructing structures and objects in a single pass. JavaScript implementations allow an external function (called a reviver) to be passed as a parameter, enabling more specific transformations of the data. Serialization is likewise done in a single iteration over the data structure: most implementations call a to_json (or similarly named) method, defined either by the implementation or by the user, and append the result of that call to the JSON output.
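Python's json module has hooks that play similar roles, shown in this sketch. object_hook is called for every decoded object (comparable in spirit to a reviver, although a JavaScript reviver is called for every key/value pair), and default is called for objects the encoder cannot handle natively. The LogEntry class and its to_json method are hypothetical, named after the convention mentioned above:

```python
import json
from datetime import datetime


class LogEntry:
    def __init__(self, user, time):
        self.user = user
        self.time = time

    def to_json(self):
        # Hypothetical user-defined hook returning JSON-friendly data.
        return {"User": self.user, "Time": self.time.isoformat()}


def encode_hook(obj):
    # json.dumps calls this for objects it cannot serialize natively.
    if hasattr(obj, "to_json"):
        return obj.to_json()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def decode_hook(d):
    # json.loads calls this for every decoded object, innermost first.
    if "User" in d and "Time" in d:
        return LogEntry(d["User"], datetime.fromisoformat(d["Time"]))
    return d


entries = [LogEntry("ed", datetime(2001, 11, 23, 15, 1, 42))]
text = json.dumps(entries, default=encode_hook)
print(text)  # [{"User": "ed", "Time": "2001-11-23T15:01:42"}]

restored = json.loads(text, object_hook=decode_hook)
print(type(restored[0]).__name__, restored[0].time.year)  # LogEntry 2001
```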

Table 2.1: Log file for an arbitrary application in JSON format

[
  {
    "User": "ed",
    "Time": "2001-11-23 15:01:42 -5",
    "Warning": "This is an error message for the log file"
  },
  {
    "User": "ed",
    "Time": "2001-11-23 15:02:31 -5",
    "Warning": "A slightly different error message."
  },
  {
    "User": "ed",
    "Date": "2001-11-23 15:03:17 -5",
    "Fatal": "Unknown variable \"bar\"",
    "Stack": [
      {
        "code": "x = MoreObject(\"345\\n\")\n",
        "line": 23,
        "file": "TopClass.py"
      },
      {
        "code": "foo = bar",
        "line": 58,
        "file": "MoreClass.py"
      }
    ]
  }
]

A JSON document with an array containing multiple log entries.
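Once parsed, the document is just nested lists and dicts. A Python sketch that walks a shortened copy of Table 2.1 (the second warning entry is omitted) and prints the warnings and the fatal error with its stack frames:

```python
import json

# Table 2.1 without its second entry; a raw string keeps the JSON escapes intact.
log_text = r'''[
  {"User": "ed", "Time": "2001-11-23 15:01:42 -5",
   "Warning": "This is an error message for the log file"},
  {"User": "ed", "Date": "2001-11-23 15:03:17 -5",
   "Fatal": "Unknown variable \"bar\"",
   "Stack": [
     {"code": "x = MoreObject(\"345\\n\")\n", "line": 23, "file": "TopClass.py"},
     {"code": "foo = bar", "line": 58, "file": "MoreClass.py"}
   ]}
]'''

entries = json.loads(log_text)

for entry in entries:
    # Each entry is a dict; which keys it has depends on the kind of entry.
    if "Fatal" in entry:
        print("FATAL:", entry["Fatal"])
        for frame in entry.get("Stack", []):
            print(f'  {frame["file"]}:{frame["line"]}')
    elif "Warning" in entry:
        print("warning:", entry["Warning"])
```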

...To be continued...
