SimdJsonSharp: Parsing gigabytes of JSON per second

C# version of lemire/simdjson (by Daniel Lemire and Geoff Langdale, https://arxiv.org/abs/1902.08318), fully ported from C++ to C#; I tried to keep the same format and API. The library accelerates JSON parsing and minification using SIMD instructions (AVX2). The C# version uses the System.Runtime.Intrinsics API.
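
To illustrate how SIMD helps, here is a simplified sketch of the kind of first-pass step simdjson performs. It is not SimdJsonSharp's actual code. The sketch classifies 32 input bytes at once and produces a bitmask of structural characters ({ } [ ] : ,) using the AVX2 compare and move-mask intrinsics from System.Runtime.Intrinsics. It assumes .NET Core 3.0 or later and uses plain equality compares for readability, whereas the paper describes a table-lookup (shuffle) classification. It also ignores quoted strings and escapes, and falls back to a scalar loop when AVX2 is unavailable. The class and method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class StructuralScan
{
    // The six JSON structural characters.
    private static readonly byte[] Structural =
        { (byte)'{', (byte)'}', (byte)'[', (byte)']', (byte)':', (byte)',' };

    // Returns a 32-bit mask in which bit i is set when block[i] is structural.
    // Simplified: it does NOT exclude characters inside quoted strings.
    public static uint StructuralMask(ReadOnlySpan<byte> block)
    {
        if (block.Length < 32)
            throw new ArgumentException("Expected at least 32 bytes.", nameof(block));

        if (Avx2.IsSupported)
        {
            // Load 32 bytes into one 256-bit register.
            Vector256<byte> input = MemoryMarshal.Read<Vector256<byte>>(block);
            Vector256<byte> hits = Vector256<byte>.Zero;
            foreach (byte c in Structural)
            {
                // Lanes equal to c become 0xFF, others 0x00; OR accumulates matches.
                hits = Avx2.Or(hits, Avx2.CompareEqual(input, Vector256.Create(c)));
            }
            // Gather the top bit of each of the 32 lanes into one integer.
            return (uint)Avx2.MoveMask(hits);
        }

        // Scalar fallback that computes the same mask one byte at a time.
        uint mask = 0;
        for (int i = 0; i < 32; i++)
        {
            if (Array.IndexOf(Structural, block[i]) >= 0)
                mask |= 1u << i;
        }
        return mask;
    }
}
```

A full parser also computes a mask of in-string regions and uses it to clear matches, so that, for example, the comma in "a,b" is not treated as structural.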

UPD: It is now also available as a .NET Standard 2.0 library of P/Invoke bindings over the native lib, so there are two implementations:

  1. Fully managed netcoreapp3.0 library (100% port from C++ to C#)
  2. netstandard2.0 library with the native lib (autogenerated C bindings)

Benchmarks

The following benchmarks compare SimdJsonSharp with the .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson libraries. The test JSON files can be found here.

1. Parse doubles

Open canada.json and parse all coordinates as System.Double:

| Method | fileName | fileSize | Mean | Ratio |
|---------------- |------------- |-------------|----------:|------:|
| SimdJson | canada.json | 2,251.05 KB | 4.733 ms | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 KB | 56.692 ms | 11.98 |
| JsonNet | canada.json | 2,251.05 KB | 70.078 ms | 14.81 |
| SpanJsonUtf8 | canada.json | 2,251.05 KB | 54.878 ms | 11.60 |

2. Count all tokens

| Method | fileName | fileSize | Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
| SimdJson | apache_builds.json | 127.28 KB | 99.28 us | 1.00 |
| Utf8JsonReader | apache_builds.json | 127.28 KB | 226.42 us | 2.28 |
| JsonNet | apache_builds.json | 127.28 KB | 461.30 us | 4.64 |
| SpanJsonUtf8 | apache_builds.json | 127.28 KB | 168.08 us | 1.69 |
| | | | | |
| SimdJson | canada.json | 2,251.05 KB | 4,494.44 us | 1.00 |
| Utf8JsonReader | canada.json | 2,251.05 KB | 6,308.01 us | 1.40 |
| JsonNet | canada.json | 2,251.05 KB | 67,718.12 us | 15.06 |
| SpanJsonUtf8 | canada.json | 2,251.05 KB | 6,679.82 us | 1.49 |
| | | | | |
| SimdJson | citm_catalog.json | 1,727.20 KB | 1,572.78 us | 1.00 |
| Utf8JsonReader | citm_catalog.json | 1,727.20 KB | 3,786.10 us | 2.41 |
| JsonNet | citm_catalog.json | 1,727.20 KB | 5,903.38 us | 3.75 |
| SpanJsonUtf8 | citm_catalog.json | 1,727.20 KB | 3,021.13 us | 1.92 |
| | | | | |
| SimdJson | github_events.json | 65.13 KB | 46.01 us | 1.00 |
| Utf8JsonReader | github_events.json | 65.13 KB | 113.80 us | 2.47 |
| JsonNet | github_events.json | 65.13 KB | 214.01 us | 4.65 |
| SpanJsonUtf8 | github_events.json | 65.13 KB | 89.09 us | 1.94 |
| | | | | |
| SimdJson | gsoc-2018.json | 3,327.83 KB | 2,209.42 us | 1.00 |
| Utf8JsonReader | gsoc-2018.json | 3,327.83 KB | 4,010.10 us | 1.82 |
| JsonNet | gsoc-2018.json | 3,327.83 KB | 6,729.44 us | 3.05 |
| SpanJsonUtf8 | gsoc-2018.json | 3,327.83 KB | 2,759.59 us | 1.25 |
| | | | | |
| SimdJson | instruments.json | 220.35 KB | 257.78 us | 1.00 |
| Utf8JsonReader | instruments.json | 220.35 KB | 594.22 us | 2.31 |
| JsonNet | instruments.json | 220.35 KB | 980.42 us | 3.80 |
| SpanJsonUtf8 | instruments.json | 220.35 KB | 409.47 us | 1.59 |
| | | | | |
| SimdJson | truenull.json | 12.00 KB | 16,032.6 ns | 1.00 |
| Utf8JsonReader | truenull.json | 12.00 KB | 58,365.2 ns | 3.64 |
| JsonNet | truenull.json | 12.00 KB | 60,977.3 ns | 3.80 |
| SpanJsonUtf8 | truenull.json | 12.00 KB | 24,069.2 ns | 1.50 |
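
In these tables, Ratio is each method's mean time divided by the SimdJson mean for the same file, up to rounding. SimdJson is therefore 1.00, and higher values mean slower. For example, Utf8JsonReader on citm_catalog.json:

```latex
\text{Ratio}_{\text{Utf8JsonReader}}
  = \frac{\overline{t}_{\text{Utf8JsonReader}}}{\overline{t}_{\text{SimdJson}}}
  = \frac{3786.10\ \mu\text{s}}{1572.78\ \mu\text{s}}
  \approx 2.41
```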

3. JSON minification

| Method | fileName | fileSize | Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
| SimdJsonNoValidation | apache_builds.json | 127.28 KB | 186.8 us | 1.00 |
| SimdJson | apache_builds.json | 127.28 KB | 262.5 us | 1.41 |
| JsonNet | apache_builds.json | 127.28 KB | 1,802.6 us | 9.65 |
| | | | | |
| SimdJsonNoValidation | canada.json | 2,251.05 KB | 4,130.7 us | 1.00 |
| SimdJson | canada.json | 2,251.05 KB | 7,940.7 us | 1.92 |
| JsonNet | canada.json | 2,251.05 KB | 181,884.0 us | 44.06 |
| | | | | |
| SimdJsonNoValidation | citm_catalog.json | 1,727.20 KB | 2,346.9 us | 1.00 |
| SimdJson | citm_catalog.json | 1,727.20 KB | 4,064.0 us | 1.75 |
| JsonNet | citm_catalog.json | 1,727.20 KB | 34,831.0 us | 14.84 |

Usage

The C# API is not stable yet. It currently mirrors the original C-style API closely, so it involves some unsafe code, including pointers.
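
Because the usage sample in this section relies on `fixed` and raw pointers, the consuming project must allow unsafe code. The following is a minimal sketch of an SDK-style project file. The console-app setup and target framework are assumptions; the relevant part is the AllowUnsafeBlocks property.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <!-- Lets the compiler accept 'unsafe' methods, pointers and 'fixed' -->
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
  </PropertyGroup>
</Project>
```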

Add the NuGet package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings (a .NET Standard 2.0 package for .NET Framework 4.x, .NET Core 2.x, etc.):

    dotnet add package SimdJsonSharp.Bindings

or

    dotnet add package SimdJsonSharp.Managed

The following sample parses a file and iterates over its numeric tokens:

    using System;
    using System.IO;
    using SimdJsonSharp;

    static class Sample
    {
        // Pointers and 'fixed' require an unsafe context (AllowUnsafeBlocks).
        public static unsafe void PrintNumbers(string somefile)
        {
            byte[] bytes = File.ReadAllBytes(somefile);
            fixed (byte* ptr = bytes) // pin bytes while we are working on them
            using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
            using (var iterator = doc.CreateIterator())
            {
                while (iterator.MoveForward())
                {
                    if (iterator.GetTokenType() == JsonTokenType.Number)
                        Console.WriteLine("integer: " + iterator.GetInteger());
                }
            }
        }
    }

UPD: In SimdJsonSharp.Bindings, type names are suffixed with 'N', e.g. ParsedJsonN.

As you can see, the API looks similar to Utf8JsonReader, which was introduced in .NET Core 3.0.
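
For comparison, here is the same token walk written with System.Text.Json's Utf8JsonReader. This sketch assumes .NET Core 3.0 or later and sums the numbers instead of printing them. The class and method names are hypothetical.

```csharp
using System;
using System.Text.Json;

static class Utf8ReaderComparison
{
    // Visits every token and sums the numeric ones.
    public static double SumNumbers(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        double sum = 0;
        while (reader.Read())                              // like iterator.MoveForward()
        {
            if (reader.TokenType == JsonTokenType.Number)  // like iterator.GetTokenType()
                sum += reader.GetDouble();
        }
        return sum;
    }
}
```

Utf8JsonReader is a ref struct, so like the pinned-pointer sample it has to stay on the stack (no async methods or lambdas capturing it).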

It is also possible to just validate JSON or minify it (remove whitespace, etc.):

    string someJson = ...;
    string minifiedJson = SimdJson.MinifyJson(someJson);
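
Minification means dropping the insignificant whitespace between tokens while leaving string contents untouched. The naive scalar sketch below shows why a minifier must track whether it is inside a string and whether the previous character was a backslash. It does no validation and uses none of SimdJsonSharp's SIMD techniques. The class and method names are hypothetical.

```csharp
using System.Text;

static class NaiveMinifier
{
    // Removes JSON whitespace (space, tab, CR, LF) outside string literals.
    // Assumes the input is already valid JSON; nothing is validated.
    public static string Minify(string json)
    {
        var sb = new StringBuilder(json.Length);
        bool inString = false;
        bool escaped = false;
        foreach (char c in json)
        {
            if (inString)
            {
                sb.Append(c);                       // string contents are copied verbatim
                if (escaped) escaped = false;       // this char was escaped by a backslash
                else if (c == '\\') escaped = true; // next char is escaped
                else if (c == '"') inString = false;
            }
            else if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            {
                continue;                           // whitespace between tokens is dropped
            }
            else
            {
                if (c == '"') inString = true;
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```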

Requirements

  • AVX2-enabled CPU
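
Because AVX2 is a hard requirement, an application may want to check for it at startup before choosing SimdJsonSharp. The sketch below uses Avx2.IsSupported from System.Runtime.Intrinsics.X86, which is available on .NET Core 3.0 or later but not on .NET Framework. ChooseParser and the fallback it names are hypothetical.

```csharp
using System.Runtime.Intrinsics.X86;

static class SimdSupport
{
    // True only when AVX2 instructions can be used on this machine and runtime.
    public static bool CanUseSimdJson() => Avx2.IsSupported;

    // Hypothetical helper: pick a parser label based on hardware support.
    public static string ChooseParser() =>
        CanUseSimdJson() ? "SimdJsonSharp" : "System.Text.Json.Utf8JsonReader";
}
```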
