SimdJsonSharp:每秒解析千兆字节的JSON
SimdJsonSharp: Parsing gigabytes of JSON per second
C# version of lemire/simdjson (by Daniel Lemire and Geoff Langdale - https://arxiv.org/abs/1902.08318) fully ported from C to C#, I tried to keep the same format and API). The library accelerates JSON parsing and minification using SIMD instructions (AVX2). C# version uses System.Runtime.Intrinsics API.
UPD: Now it's also available as a set of pinvokes on top of the native lib as a .NETStandard 2.0 library, thus there are two implementations:
- Fully managed netcoreapp3.0 library (100% port from C to C#)
- netstandard2.0 library with native lib (autogenerated bindings for C)
Benchmarks
The following benchmark compares SimdJsonSharp with .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson libraries. Test json files can be found here.
1. Parse doubles
Open canada.json and parse all coordinates as System.Double:
| Method | fileName | fileSize | Mean | Ratio |
|---------------- |------------- |-------------|----------:|------:|
| SimdJson | canada.json | 2,251.05 Kb | 4,733 ms | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 56,692 ms | 11.98 |
| JsonNet | canada.json | 2,251.05 Kb | 70,078 ms | 14.81 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 54,878 ms | 11.60 |
2. Count all tokens
| Method | fileName | fileSize | Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
| SimdJson | apache_builds.json | 127.28 Kb | 99.28 us | 1.00 |
| Utf8JsonReader | apache_builds.json | 127.28 Kb | 226.42 us | 2.28 |
| JsonNet | apache_builds.json | 127.28 Kb | 461.30 us | 4.64 |
| SpanJsonUtf8 | apache_builds.json | 127.28 Kb | 168.08 us | 1.69 |
| | | | | |
| SimdJson | canada.json | 2,251.05 Kb | 4,494.44 us | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 Kb | 6,308.01 us | 1.40 |
| JsonNet | canada.json | 2,251.05 Kb | 67,718.12 us | 15.06 |
| SpanJsonUtf8 | canada.json | 2,251.05 Kb | 6,679.82 us | 1.49 |
| | | | | |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 1,572.78 us | 1.00 |
| Utf8JsonReader | citm_catalog.json | 1,727.20 Kb | 3,786.10 us | 2.41 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 5,903.38 us | 3.75 |
| SpanJsonUtf8 | citm_catalog.json | 1,727.20 Kb | 3,021.13 us | 1.92 |
| | | | | |
| SimdJson | github_events.json | 65.13 Kb | 46.01 us | 1.00 |
| Utf8JsonReader | github_events.json | 65.13 Kb | 113.80 us | 2.47 |
| JsonNet | github_events.json | 65.13 Kb | 214.01 us | 4.65 |
| SpanJsonUtf8 | github_events.json | 65.13 Kb | 89.09 us | 1.94 |
| | | | | |
| SimdJson | gsoc-2018.json | 3,327.83 Kb | 2,209.42 us | 1.00 |
| Utf8JsonReader | gsoc-2018.json | 3,327.83 Kb | 4,010.10 us | 1.82 |
| JsonNet | gsoc-2018.json | 3,327.83 Kb | 6,729.44 us | 3.05 |
| SpanJsonUtf8 | gsoc-2018.json | 3,327.83 Kb | 2,759.59 us | 1.25 |
| | | | | |
| SimdJson | instruments.json | 220.35 Kb | 257.78 us | 1.00 |
| Utf8JsonReader | instruments.json | 220.35 Kb | 594.22 us | 2.31 |
| JsonNet | instruments.json | 220.35 Kb | 980.42 us | 3.80 |
| SpanJsonUtf8 | instruments.json | 220.35 Kb | 409.47 us | 1.59 |
| | | | | |
| SimdJson | truenull.json | 12.00 Kb | 16,032.6 ns | 1.00 |
| Utf8JsonReader | truenull.json | 12.00 Kb | 58,365.2 ns | 3.64 |
| JsonNet | truenull.json | 12.00 Kb | 60,977.3 ns | 3.80 |
| SpanJsonUtf8 | truenull.json | 12.00 Kb | 24,069.2 ns | 1.50 |
3. Json minification:
| Method | fileName | fileSize | Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
| SimdJsonNoValidation | apache_builds.json | 127.28 Kb | 186.8 us | 1.00 |
| SimdJson | apache_builds.json | 127.28 Kb | 262.5 us | 1.41 |
| JsonNet | apache_builds.json | 127.28 Kb | 1,802.6 us | 9.65 |
| | | | | |
| SimdJsonNoValidation | canada.json | 2,251.05 Kb | 4,130.7 us | 1.00 |
| SimdJson | canada.json | 2,251.05 Kb | 7,940.7 us | 1.92 |
| JsonNet | canada.json | 2,251.05 Kb | 181,884.0 us | 44.06 |
| | | | | |
| SimdJsonNoValidation | citm_catalog.json | 1,727.20 Kb | 2,346.9 us | 1.00 |
| SimdJson | citm_catalog.json | 1,727.20 Kb | 4,064.0 us | 1.75 |
| JsonNet | citm_catalog.json | 1,727.20 Kb | 34,831.0 us | 14.84 |
Usage
The C# API is not stable yet and currently fully copies the original C-style API thus it involves some Unsafe magic including pointers.
Add nuget package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings for a .NETStandard 2.0 package (.NET 4.x, .NET Core 2.x, etc).
dotnet add package SimdJsonSharp.Bindings
or
dotnet add package SimdJsonSharp.Managed
The following sample parses a file and iterate numeric tokens
byte[] bytes = File.ReadAllBytes(somefile);
fixed (byte* ptr = bytes) // pin bytes while we are working on them
using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
using (var iterator = doc.CreateIterator())
{
while (iterator.MoveForward())
{
if (iterator.GetTokenType() == JsonTokenType.Number)
Console.WriteLine("integer: " + iterator.GetInteger());
}
}
UPD: for SimdJsonSharp.Bindings types are postfixed with 'N', e.g. ParsedJsonN
As you can see the API looks similiar to Utf8JsonReader that was introduced recently in .NET Core 3.0
Also it's possible to just validate JSON or minify it (remove whitespaces, etc):
string someJson = ...;
string minifiedJson = SimdJson.MinifyJson(someJson);
Requirements
- AVX2 enabled CPU
SimdJsonSharp:每秒解析千兆字节的JSON的更多相关文章
- 千兆以太网TCP协议的FPGA实现
转自https://blog.csdn.net/zhipao6108/article/details/82386355 千兆以太网TCP协议的FPGA实现 Lzx 2017/4/20 写在前面,这应该 ...
- 【转】简谈基于FPGA的千兆以太网
原文地址: http://blog.chinaaet.com/luhui/p/5100052903 大家好,又到了学习时间了,学习使人快乐.今天我们来简单的聊一聊以太网,以太网在FPGA学习中属于比较 ...
- AC6102 开发板千兆以太网UDP传输实验2
AC6102 开发板千兆以太网UDP传输实验 在芯航线AC6102开发板上,设计了一路GMII接口的千兆以太网电路,通过该以太网电路,用户可以将FPGA采集或运算得到的数据传递给其他设备如PC或服务器 ...
- AC6102 开发板千兆以太网UDP传输实验
AC6102 开发板千兆以太网UDP传输实验 在芯航线AC6102开发板上,设计了一路GMII接口的千兆以太网电路,通过该以太网电路,用户可以将FPGA采集或运算得到的数据传递给其他设备如PC或服务器 ...
- 最新IP数据库 存储优化 查询性能优化 每秒解析上千万
高性能IP数据库格式详解 每秒解析1000多万ip qqzeng-ip-ultimate.dat 3.0版 编码:UTF8 字节序:Little-Endian 返回规范字段(如:亚洲|中国| ...
- 千兆网口POE供电
一.IEEE802.3af与at标准的解析 链接:http://www.winchen.com.cn/ShowNews2.asp?ID=21&ClassID=1 2003 年6 月,IEEE ...
- FPGA千兆网UDP协议实现
接着上一篇百兆网接口的设计与使用,我们接着来进行FPGA百兆网UDP(User Datagram Protocol)协议的设计. 1)UDP简介 在此,参考博主夜雨翛然的博文“https://www. ...
- 【转】基于TMS320C6455的千兆以太网设计
基于TI公司最新DSP芯片TMS320C6455.设计并实现了以太网通信软硬件接口.采用TMS320C6455片内以太网接口模块EMAC/MDIO,结合片外AR8031 PHY芯片,在嵌入式操作系统D ...
- 369-双路千兆网络PCIe收发卡
双路千兆网络PCIe收发卡 一.产品概述 PCIe网络收发卡要求能支持千兆光口,千兆电口:半高板卡.板卡插于服务器,室温工作. 支持2路千兆光口,千兆电口. FPGA选用型号 XC7A50T-1FGG ...
随机推荐
- Oracle基础教程(一)
本文链接:https://blog.csdn.net/GoldenKitten/article/details/84947386 以下内容为转载以上博客,自己做了略微的补充,如需查看原文,请点击上面的 ...
- DVR登录绕过漏洞_CVE-2018-9995漏洞复现
DVR登录绕过漏洞_CVE-2018-9995漏洞复现 一.漏洞描述 此漏洞允许攻击者通过修改”Cookie:uid=admin”之后访问特定DVR的控制面板,返回此设备的明文管理员凭证. 二.影响软 ...
- Redisson实现分布式锁(2)—RedissonLock
Redisson实现分布式锁(2)-RedissonLock 有关Redisson实现分布式锁上一篇博客讲了分布式的锁原理:Redisson实现分布式锁---原理 这篇主要讲RedissonLock和 ...
- 5.智能快递柜(通信篇-Server程序)
1.智能快递柜(开篇) 2.智能快递柜(终端篇) 3.智能快递柜(通信篇-HTTP) 4.智能快递柜(通信篇-SOCKET) 5.智能快递柜(通信篇-Server程序) 6.智能快递柜(平台篇) 7. ...
- 马蜂窝 iOS App 启动治理:回归用户体验
增长.活跃.留存是移动 App 的常见核心指标,直接反映一款 App 甚至一个互联网公司运行的健康程度和发展动能.启动流程的体验决定了用户的第一印象,在一定程度上影响了用户活跃度和留存率.因此,确保启 ...
- DQL---连接查询(内连接、外连接)、子查询、分页查询
一.连接查询 1.连接查询建立在有相互关系的两个表间,进行两个及两个以上的表或视图的查询. 2.对n张表进行查询,至少需要n-1个连接表的条件. 二.笛卡尔积(容易造成数据库宕机) 1.指表中每行元素 ...
- JavaScript之找LHS查询和RHS查询
LHS和RHS,当变量出现在赋值操作的左侧时进行LHS 查询,出现在右侧时进行RHS 查询. LHS 查询是试图找到变量的容器本身,从而可以对其赋值. RHS 理解成retrieve his sour ...
- 登陆远程服务器的Tomcat 8的manger的页面403
访问远程服务器Tomcat 8的管理页面报错 在远程服务器上安装了一个tomcat8.5,配置好用户后重新启动tomcat ,发现:8080可以访问,但是进入:8080/manager/html页面报 ...
- 计算机基础 python入门
1.计算机基础 计算机组成: 输入输出设备内. 存储器 .cpu .电源 .显卡 中央处理器(cpu) 处理各种数据 相当于人的大脑 内存 存储数据 相当于临时记忆 硬盘 存储数据 相当于人的永久记忆 ...
- python爬虫中文乱码问题(request方式爬取)
https://blog.csdn.net/guoxinian/article/details/83047746 req = requests.get(url)返回的是类对象 其包括的属性有: r ...