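Using CSV parsing as an example: can string, char[] and stream sources share one high-performance reading and parsing abstraction?
This post is fairly long, so the results, which are also the goal, come first.
The core goal is to explore whether, in a specific scenario, different kinds of data sources can be unified behind one abstraction and still be fast enough: a single implementation that stays high-performance for every source type.
The benchmark parses a string holding 10,000 rows of CSV data according to the RFC 4180 CSV standard. The expectations are:
parsing the string directly should be faster than going through a StringReader;
compared with the widely used CsvHelper, performance should not be much worse.
The test code is as follows: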
[MemoryDiagnoser]
public class CsvTest
{
private const string testdata = """
a,b
1,2
3sss,3333
1,2
3sss,3333
1,2
/// ... 10,000 rows in total
""";
private CsvConfiguration config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
Mode = CsvMode.RFC4180,
};
[Benchmark]
public void CsvHelper_Read()
{
using var sr = new StringReader(testdata);
using var csv = new CsvHelper.CsvReader(sr, config);
var records = new List<string[]>();
csv.Read();
csv.ReadHeader();
while (csv.Read())
{
var record = new string[csv.ColumnCount];
for (var i = 0; i < record.Length; i++)
{
record[i] = csv.GetField(i);
}
records.Add(record);
}
//var d = records.ToArray();
}
[Benchmark]
public void RuQu_Read_Csv_StringReader()
{
using var sr = new StringReader(testdata);
using var reader = new RuQu.Csv.CsvReader(sr, fristIsHeader: true);
var d = reader.ToArray();
}
[Benchmark]
public void RuQu_Read_Csv_String()
{
using var reader = new RuQu.Csv.CsvReader(testdata, fristIsHeader: true);
var d = reader.ToArray();
}
}
Benchmark results:
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
13th Gen Intel Core i9-13900KF, 1 CPU, 32 logical and 24 physical cores
.NET SDK 8.0.200
[Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
CsvHelper_Read | 816.5 μs | 7.67 μs | 7.17 μs | 82.0313 | 81.0547 | 41.0156 | 1.2 MB |
RuQu_Read_Csv_StringReader | 406.1 μs | 1.83 μs | 1.53 μs | 62.5000 | 52.2461 | - | 1.13 MB |
RuQu_Read_Csv_String | 363.3 μs | 4.27 μs | 3.99 μs | 62.5000 | 52.2461 | - | 1.13 MB |
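So how is this performance achieved? Let's start from my initial line of thought.
Diversity of data types
As we all know, text data such as CSV can be carried by many different data types or storage forms.
For example: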
csv
|--- string "a,b\r\n1,2\r\n3,4"
|--- char[]
|--- byte[]
|--- MemoryStream
|--- NetworkStream
|--- ....
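So can we wrap these types in one abstraction, implement CSV parsing once on top of it, and still get high performance?
Classifying the data types
Based on their characteristics, the types fall into two groups.
Fixed-length arrays that need no encoding conversion
- string
- char[]
Sources that need encoding conversion and whose length in chars is not known up front
- byte[]
- MemoryStream
- NetworkStream
An abstraction designed for the latter, more complex group can certainly accommodate the former as well.
The foundation of high performance
Next, thinking about how a CSV parser is implemented, character comparison and searching are necessarily the first concern.
Today the first choice for this is ReadOnlySpan<T>.
It offers two major advantages for parsing.
Less data copying
A ReadOnlySpan instance is often used to reference the elements of an array or a portion of an array. Unlike an array, however, a ReadOnlySpan instance can point to managed memory, native memory, or memory managed on the stack.
Part of its implementation is shown below: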
public readonly ref struct ReadOnlySpan<T>
{
/// <summary>A byref or a native ptr.</summary>
internal readonly ref T _reference;
/// <summary>The number of elements this ReadOnlySpan contains.</summary>
private readonly int _length;
/// <summary>
/// Creates a new read-only span over the entirety of the target array.
/// </summary>
/// <param name="array">The target array.</param>
/// <remarks>Returns default when <paramref name="array"/> is null.</remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public ReadOnlySpan(T[]? array)
{
if (array == null)
{
this = default;
return; // returns default
}
_reference = ref MemoryMarshal.GetArrayDataReference(array);
_length = array.Length;
}
public override string ToString()
{
if (typeof(T) == typeof(char))
{
return new string(new ReadOnlySpan<char>(ref Unsafe.As<T, char>(ref _reference), _length));
}
return $"System.ReadOnlySpan<{typeof(T).Name}>[{_length}]";
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public ReadOnlySpan<T> Slice(int start, int length)
{
#if TARGET_64BIT
// See comment in Span<T>.Slice for how this works.
if ((ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)_length)
ThrowHelper.ThrowArgumentOutOfRangeException();
#else
if ((uint)start > (uint)_length || (uint)length > (uint)(_length - start))
ThrowHelper.ThrowArgumentOutOfRangeException();
#endif
return new ReadOnlySpan<T>(ref Unsafe.Add(ref _reference, (nint)(uint)start /* force zero-extension */), length);
}
// ... remaining members omitted
}
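As these three members show, a span works through reference (pointer-like) operations, so at the very small cost of a struct it lets us share access to an array's data or to a slice of it.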
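To make the "no copying" point concrete, here is a minimal sketch of my own (SpanDemo and FirstField are hypothetical names, not part of RuQu or the runtime). It slices the first CSV field out of a string as a view and only allocates when a string is finally requested.

```csharp
using System;

Console.WriteLine(SpanDemo.FirstField("3sss,3333", ','));  // prints 3sss

static class SpanDemo
{
    // Hypothetical helper: returns the text before the first separator.
    public static string FirstField(string line, char separator)
    {
        // AsSpan creates a view over the string's memory; nothing is copied.
        ReadOnlySpan<char> span = line.AsSpan();
        int i = span.IndexOf(separator);
        // Slicing only adjusts the reference and the length.
        ReadOnlySpan<char> field = i < 0 ? span : span[..i];
        // The single allocation happens here, when a string is materialized.
        return field.ToString();
    }
}
```

A parser can keep working on such slices and defer ToString until a field value really has to be stored.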
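Span has SIMD optimizations
Many span operations are SIMD-optimized.
SIMD (Single Instruction, Multiple Data) means that one instruction operates on multiple data items. It is an extension of the CPU's basic instruction set, mainly used to provide fine-grained parallelism, that is, parallel operations on many small data items. Take image processing: common pixel formats are RGB565, RGBA8888, YUV422 and so on, and what they share is that each component of a pixel is represented with at most 8 bits. A conventional scalar computation uses only the low 8 bits of a 32-bit or 64-bit register for such data, which is wasteful. If a 64-bit register is instead treated as eight 8-bit lanes, eight operations complete at once, so throughput can in principle rise by up to 8x.
Below is part of the span code: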
internal static partial class SpanHelpers // .Char
{
public static int IndexOf(ref char searchSpace, int searchSpaceLength, ref char value, int valueLength)
{
Debug.Assert(searchSpaceLength >= 0);
Debug.Assert(valueLength >= 0);
if (valueLength == 0)
return 0; // A zero-length sequence is always treated as "found" at the start of the search space.
int valueTailLength = valueLength - 1;
if (valueTailLength == 0)
{
// for single-char values use plain IndexOf
return IndexOfChar(ref searchSpace, value, searchSpaceLength);
}
nint offset = 0;
char valueHead = value;
int searchSpaceMinusValueTailLength = searchSpaceLength - valueTailLength;
if (Vector128.IsHardwareAccelerated && searchSpaceMinusValueTailLength >= Vector128<ushort>.Count)
{
goto SEARCH_TWO_CHARS;
}
ref byte valueTail = ref Unsafe.As<char, byte>(ref Unsafe.Add(ref value, 1));
int remainingSearchSpaceLength = searchSpaceMinusValueTailLength;
while (remainingSearchSpaceLength > 0)
{
// Do a quick search for the first element of "value".
// Using the non-packed variant as the input is short and would not benefit from the packed implementation.
int relativeIndex = NonPackedIndexOfChar(ref Unsafe.Add(ref searchSpace, offset), valueHead, remainingSearchSpaceLength);
if (relativeIndex < 0)
break;
remainingSearchSpaceLength -= relativeIndex;
offset += relativeIndex;
if (remainingSearchSpaceLength <= 0)
break; // The unsearched portion is now shorter than the sequence we're looking for. So it can't be there.
// Found the first element of "value". See if the tail matches.
if (SequenceEqual(
ref Unsafe.As<char, byte>(ref Unsafe.Add(ref searchSpace, offset + 1)),
ref valueTail,
(nuint)(uint)valueTailLength * 2))
{
return (int)offset; // The tail matched. Return a successful find.
}
remainingSearchSpaceLength--;
offset++;
}
return -1;
// Based on http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd "Algorithm 1: Generic SIMD" by Wojciech Mula
// Some details about the implementation can also be found in https://github.com/dotnet/runtime/pull/63285
SEARCH_TWO_CHARS:
if (Vector512.IsHardwareAccelerated && searchSpaceMinusValueTailLength - Vector512<ushort>.Count >= 0)
{
// Find the last unique (which is not equal to ch1) character
// the algorithm is fine if both are equal, just a little bit less efficient
ushort ch2Val = Unsafe.Add(ref value, valueTailLength);
nint ch1ch2Distance = (nint)(uint)valueTailLength;
while (ch2Val == valueHead && ch1ch2Distance > 1)
ch2Val = Unsafe.Add(ref value, --ch1ch2Distance);
Vector512<ushort> ch1 = Vector512.Create((ushort)valueHead);
Vector512<ushort> ch2 = Vector512.Create(ch2Val);
nint searchSpaceMinusValueTailLengthAndVector =
searchSpaceMinusValueTailLength - (nint)Vector512<ushort>.Count;
do
{
// Make sure we don't go out of bounds
Debug.Assert(offset + ch1ch2Distance + Vector512<ushort>.Count <= searchSpaceLength);
Vector512<ushort> cmpCh2 = Vector512.Equals(ch2, Vector512.LoadUnsafe(ref searchSpace, (nuint)(offset + ch1ch2Distance)));
Vector512<ushort> cmpCh1 = Vector512.Equals(ch1, Vector512.LoadUnsafe(ref searchSpace, (nuint)offset));
Vector512<byte> cmpAnd = (cmpCh1 & cmpCh2).AsByte();
// Early out: cmpAnd is all zeros
if (cmpAnd != Vector512<byte>.Zero)
{
goto CANDIDATE_FOUND;
}
LOOP_FOOTER:
offset += Vector512<ushort>.Count;
if (offset == searchSpaceMinusValueTailLength)
return -1;
// Overlap with the current chunk for trailing elements
if (offset > searchSpaceMinusValueTailLengthAndVector)
offset = searchSpaceMinusValueTailLengthAndVector;
continue;
// ... (the excerpt ends here; the rest of the method is omitted)
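None of this vectorization is visible at the call site. As a small sketch (DelimiterDemo and NextDelimiter are hypothetical names of mine), a CSV parser can find the end of an unquoted field with the public MemoryExtensions.IndexOfAny overload and leave the choice between scalar and SIMD code to the runtime:

```csharp
using System;

Console.WriteLine(DelimiterDemo.NextDelimiter("3sss,3333\r\n", ','));  // prints 4
Console.WriteLine(DelimiterDemo.NextDelimiter("3333\r\n1,2", ','));    // prints 4
Console.WriteLine(DelimiterDemo.NextDelimiter("3333", ','));           // prints -1

static class DelimiterDemo
{
    // Returns the index of the first separator, '\r' or '\n', or -1 if none is present.
    public static int NextDelimiter(ReadOnlySpan<char> data, char separator)
        => data.IndexOfAny(separator, '\r', '\n');
}
```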
Interface abstraction
Next, let's try to abstract:
public interface IReaderBuffer<T> : IDisposable where T : struct
{
public int ConsumedCount { get; }
public int Index { get; }
public ReadOnlySpan<T> Readed { get; }
public bool IsEOF { get; }
/// Marks data as consumed so that its space can be released
public void Consume(int count);
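/// Previews upcoming data without consuming it. These are methods rather than properties so that an implementation can load not-yet-read data into its buffer while serving the preview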
public bool Peek(int count, out ReadOnlySpan<T> data);
public bool Peek(out T data);
public bool PeekByOffset(int offset, out T data);
/// Reads the next block of data
public bool ReadNextBuffer(int count);
}
/// Marker interface for fixed-length sources, so that performance optimizations can target them
public interface IFixedReaderBuffer<T> : IReaderBuffer<T> where T : struct
{
}
Buffer implementation for string
Very simple: it is basically a thin wrapper over string's own methods.
public class StringReaderBuffer : IFixedReaderBuffer<char>
{
internal string _buffer;
internal int _offset;
internal int _consumedCount;
public StringReaderBuffer(string content)
{
_buffer = content;
}
public ReadOnlySpan<char> Readed
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _buffer.AsSpan(_offset);
}
public bool IsEOF
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _offset == _buffer.Length;
}
public int ConsumedCount
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _consumedCount;
}
public int Index
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _offset;
}
public void Consume(int count)
{
_offset += count;
_consumedCount += count;
}
public void Dispose()
{
}
public bool Peek(int count, out ReadOnlySpan<char> data)
{
if (_offset + count > _buffer.Length)
{
data = default;
return false;
}
data = _buffer.AsSpan(_offset, count);
return true;
}
public bool Peek(out char data)
{
if (_offset >= _buffer.Length)
{
data = default;
return false;
}
data = _buffer[_offset];
return true;
}
public bool PeekByOffset(int offset, out char data)
{
var o = _offset + offset;
if (o >= _buffer.Length)
{
data = default;
return false;
}
data = _buffer[o];
return true;
}
public bool ReadNextBuffer(int count) => false;
}
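char[] was listed among the fixed-length sources above but gets no buffer of its own in this post. Following the same pattern, a rough sketch could look like this. CharArrayReaderBuffer is a hypothetical class of mine, and the interface here is a trimmed copy included only so that the snippet compiles on its own.

```csharp
using System;

var buffer = new CharArrayReaderBuffer("a,b\r\n1,2".ToCharArray());
buffer.Consume(2);                                       // skip "a,"
Console.WriteLine(buffer.Peek(out char c) ? c : '?');    // prints b
Console.WriteLine(buffer.IsEOF);                         // prints False

// Trimmed copy of the interface from this post (assumption: only the members used here).
public interface IReaderBuffer<T> : IDisposable where T : struct
{
    int Index { get; }
    ReadOnlySpan<T> Readed { get; }
    bool IsEOF { get; }
    void Consume(int count);
    bool Peek(out T data);
    bool ReadNextBuffer(int count);
}

// Hypothetical char[]-backed buffer; not part of the original code.
public sealed class CharArrayReaderBuffer : IReaderBuffer<char>
{
    private readonly char[] _buffer;
    private int _offset;

    public CharArrayReaderBuffer(char[] content) => _buffer = content;

    public int Index => _offset;
    public ReadOnlySpan<char> Readed => _buffer.AsSpan(_offset);
    public bool IsEOF => _offset == _buffer.Length;
    public void Consume(int count) => _offset += count;

    public bool Peek(out char data)
    {
        if (_offset >= _buffer.Length)
        {
            data = default;
            return false;
        }
        data = _buffer[_offset];
        return true;
    }

    // All data is already in memory, so there is never a next block to load.
    public bool ReadNextBuffer(int count) => false;

    public void Dispose() { }
}
```

Because ReadNextBuffer always returns false, parsing loops written against the interface simply stop refilling, which is the same behavior StringReaderBuffer relies on.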
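Buffer implementation for TextReader
Wrapping TextReader here is mainly a way to avoid the complexity of character encodings.
The implementation is modeled on ReadBufferState in System.Text.Json.
It is not necessarily the best approach (better suggestions are welcome).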
public class TextReaderBuffer : IReaderBuffer<char>
{
internal char[] _buffer;
internal int _offset;
internal int _count;
internal int _maxCount;
internal int _consumedCount;
private TextReader _reader;
private bool _isFinalBlock;
private bool _isReaded;
public ReadOnlySpan<char> Readed
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get
{
if (!_isReaded)
{
ReadNextBuffer(1);
_isReaded = true;
}
return _buffer.AsSpan(_offset, _count - _offset);
}
}
public bool IsEOF
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _isFinalBlock && _offset == _count;
}
public int ConsumedCount
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _consumedCount;
}
public int Index
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => _offset;
}
public TextReaderBuffer(TextReader reader, int initialBufferSize)
{
if (initialBufferSize <= 0)
{
initialBufferSize = 256;
}
_buffer = ArrayPool<char>.Shared.Rent(initialBufferSize);
_consumedCount = _count = _offset = 0;
_reader = reader;
}
public void Consume(int count)
{
_offset += count;
_consumedCount += count;
}
/// Resizes or compacts the buffer so that more data can be read at once, reducing the array operations caused by moving data around
public void AdvanceBuffer(int count)
{
var remaining = _buffer.Length - _count + _offset;
if (remaining <= (_buffer.Length / 2) && _buffer.Length != int.MaxValue)
{
// We have less than half the buffer available, double the buffer size.
char[] oldBuffer = _buffer;
int oldMaxCount = _maxCount;
var newSize = (_buffer.Length < (int.MaxValue / 2)) ? _buffer.Length * 2 : int.MaxValue;
while (newSize < count)
{
newSize = (newSize < (int.MaxValue / 2)) ? newSize * 2 : int.MaxValue;
}
char[] newBuffer = ArrayPool<char>.Shared.Rent(newSize);
// Copy the unprocessed chars to the start of the new buffer.
// Array.Copy counts elements; Buffer.BlockCopy counts bytes and would move only half of the chars.
Array.Copy(oldBuffer, _offset, newBuffer, 0, _count - _offset);
_buffer = newBuffer;
// Clear and return the old buffer
new Span<char>(oldBuffer, 0, oldMaxCount).Clear();
ArrayPool<char>.Shared.Return(oldBuffer);
_maxCount = _count;
_count -= _offset;
_offset = 0;
}
else if (_offset != 0)
{
_count -= _offset;
// Shift the unprocessed chars to the beginning of the buffer to make more room.
// Array.Copy counts elements (Buffer.BlockCopy counts bytes) and handles overlapping ranges.
Array.Copy(_buffer, _offset, _buffer, 0, _count);
_offset = 0;
}
}
public void Dispose()
{
if (_buffer != null)
{
new Span<char>(_buffer, 0, _maxCount).Clear();
char[] toReturn = _buffer;
ArrayPool<char>.Shared.Return(toReturn);
_buffer = null!;
}
}
public bool Peek(int count, out ReadOnlySpan<char> data)
{
if (!_isReaded)
{
ReadNextBuffer(count);
_isReaded = true;
}
if (!_isFinalBlock && count + _offset > _count)
{
ReadNextBuffer(count);
}
if (_offset + count > _count)
{
data = default;
return false;
}
data = _buffer.AsSpan(_offset, count);
return true;
}
public bool Peek(out char data)
{
if (!_isReaded)
{
ReadNextBuffer(1);
_isReaded = true;
}
if (!_isFinalBlock && 1 + _offset > _count)
{
ReadNextBuffer(1);
}
if (_offset >= _count)
{
data = default;
return false;
}
data = _buffer[_offset];
return true;
}
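public bool PeekByOffset(int offset, out char data)
{
// Number of unread chars that must be buffered to reach the requested position.
var needed = offset + 1;
if (!_isReaded)
{
ReadNextBuffer(needed);
_isReaded = true;
}
if (!_isFinalBlock && _offset + needed > _count)
{
ReadNextBuffer(needed);
}
// ReadNextBuffer may have shifted the data, so compute the absolute index only now.
var o = _offset + offset;
if (o >= _count)
{
data = default;
return false;
}
data = _buffer[o];
return true;
}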
public bool ReadNextBuffer(int count)
{
if (!_isFinalBlock)
{
AdvanceBuffer(count);
do
{
int readCount = _reader.Read(_buffer.AsSpan(_count));
if (readCount == 0)
{
_isFinalBlock = true;
break;
}
_count += readCount;
}
while (_count < _buffer.Length);
if (_count > _maxCount)
{
_maxCount = _count;
}
return true;
}
return false;
}
}
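The growth step in AdvanceBuffer is easy to get subtly wrong, so here it is isolated as a small sketch. BufferGrowth, NextSize and Grow are hypothetical names of mine, and the sketch assumes the current size is greater than zero. It doubles the size until the request fits, rents a larger array from ArrayPool, and copies the used chars by element count.

```csharp
using System;
using System.Buffers;

char[] small = ArrayPool<char>.Shared.Rent(4);
"1,2".AsSpan().CopyTo(small);
char[] bigger = BufferGrowth.Grow(small, used: 3, required: 100);
Console.WriteLine(bigger.Length >= 100);       // prints True
Console.WriteLine(new string(bigger, 0, 3));   // prints 1,2
ArrayPool<char>.Shared.Return(bigger);

static class BufferGrowth
{
    // Doubles `current` until it can hold `required` chars, saturating at int.MaxValue.
    // Assumes current > 0.
    public static int NextSize(int current, int required)
    {
        int newSize = current < int.MaxValue / 2 ? current * 2 : int.MaxValue;
        while (newSize < required)
        {
            newSize = newSize < int.MaxValue / 2 ? newSize * 2 : int.MaxValue;
        }
        return newSize;
    }

    // Rents a larger array, moves the used chars over and returns the old array to the pool.
    public static char[] Grow(char[] old, int used, int required)
    {
        char[] grown = ArrayPool<char>.Shared.Rent(NextSize(old.Length, required));
        // Array.Copy takes element counts, which is what a char[] needs.
        Array.Copy(old, 0, grown, 0, used);
        ArrayPool<char>.Shared.Return(old, clearArray: true);
        return grown;
    }
}
```

Rent may hand back an array larger than requested, which is why the buffer class tracks its own _count instead of relying on the array length.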
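Parsing implementation for the RFC 4180 CSV standard
PS: It is not necessarily fully correct, since it has not been thoroughly tested. For reference only, haha.
As you can see, because the abstraction has to account for sources of unknown length, the code carries a certain amount of complexity.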
public class CsvReader : TextDataReader<string[]>
{
public CsvReader(string content, char separater = ',', bool fristIsHeader = false) : base(content)
{
Separater = separater;
HasHeader = fristIsHeader;
}
public CsvReader(TextReader reader, int bufferSize = 256, char separater = ',', bool fristIsHeader = false) : base(reader, bufferSize)
{
Separater = separater;
HasHeader = fristIsHeader;
}
public char Separater { get; private set; } = ',';
public bool HasHeader { get; private set; }
public string[] Header { get; private set; }
public int FieldCount { get; private set; }
public override bool MoveNext()
{
string[] row;
if (HasHeader && Header == null)
{
if (!ProcessFirstRow(out row))
{
throw new ParseException("Missing header");
}
Header = row;
}
var r = FieldCount == 0 ? ProcessFirstRow(out row) : ProcessRow(out row);
Current = row;
return r;
}
private bool ProcessFirstRow(out string[]? row)
{
var r = new List<string>();
var hasValue = false;
while (ProcessField(out var f))
{
r.Add(f);
hasValue = true;
}
reader.IngoreCRLF();
row = r.ToArray();
FieldCount = row.Length;
return hasValue;
}
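private bool TakeString(out string s)
{
if (reader.IsEOF)
{
throw new ParseException($"Expect some string end with '\"' at {reader.Index} but got eof");
}
// Offset (relative to the unread data) from which to search for the closing quote.
int pos = 0;
while (true)
{
var remaining = reader.Readed;
var i = remaining[pos..].IndexOf('"');
if (i < 0)
{
// No quote in the buffered data: remember how far we searched, then load more.
pos = remaining.Length;
if (!reader.ReadNextBuffer(remaining.Length))
{
break;
}
continue;
}
var end = pos + i;
if (reader.PeekByOffset(end + 1, out var n) && n == '"')
{
// A doubled quote ("") is an escaped quote: skip both chars and keep searching.
pos = end + 2;
continue;
}
// PeekByOffset may have refilled the buffer, so take a fresh span before slicing.
s = reader.Readed[..end].ToString().Replace("\"\"", "\"");
reader.Consume(end + 1);
// Skip the separator after the closing quote, as ProcessField does for unquoted fields.
if (reader.Peek(out var next) && next == Separater)
{
reader.Consume(1);
}
return true;
}
// The source ended before a closing quote was found: return what is left and consume it.
s = reader.Readed.ToString();
reader.Consume(s.Length);
return true;
}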
private bool ProcessField(out string? f)
{
if (!reader.Peek(out var c) || reader.IngoreCRLF())
{
f = null;
return false;
}
if (c == Separater)
{
f = string.Empty;
reader.Consume(1);
return true;
}
else if (c is '"')
{
/// Read a quoted field, which may contain escaped characters
reader.Consume(1);
return TakeString(out f);
}
else
{
/// Read a plain field that contains no escaping
var i = reader.IndexOfAny(Separater, '\r', '\n');
if (i == 0)
{
f = string.Empty;
}
else if (i > 0)
{
f = reader.Readed[..i].ToString();
reader.Consume(i);
}
else
{
f = reader.Readed.ToString();
reader.Consume(f.Length);
}
if (reader.Peek(out var cc) && cc == Separater)
{
reader.Consume(1);
}
return true;
}
}
private bool ProcessRow(out string[]? row)
{
row = new string[FieldCount];
for (int i = 0; i < FieldCount; i++)
{
if (!ProcessField(out var f))
{
reader.IngoreCRLF();
return false;
}
row[i] = f;
}
reader.IngoreCRLF();
return true;
}
}
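The quoting rule is the part of RFC 4180 that is easiest to get wrong: inside a quoted field, a doubled quote stands for one literal quote, and separators or line breaks inside the quotes belong to the value. As a standalone sketch of just that rule (QuotedField.Read is a hypothetical helper of mine that works on data already in memory, not RuQu's implementation):

```csharp
using System;

Console.WriteLine(QuotedField.Read("\"a \"\"b\"\", c\",next", out int consumed));  // prints a "b", c
Console.WriteLine(consumed);                                                         // prints 12

static class QuotedField
{
    // Reads one quoted field starting at input[0] == '"'.
    // Returns the unescaped value; `consumed` counts the chars up to and including the closing quote.
    public static string Read(ReadOnlySpan<char> input, out int consumed)
    {
        if (input.IsEmpty || input[0] != '"')
        {
            throw new FormatException("Expected an opening quote");
        }
        int pos = 1;
        while (true)
        {
            int i = input[pos..].IndexOf('"');
            if (i < 0)
            {
                throw new FormatException("Unterminated quoted field");
            }
            int quote = pos + i;
            if (quote + 1 < input.Length && input[quote + 1] == '"')
            {
                // Doubled quote: an escaped '"', so skip both chars and keep scanning.
                pos = quote + 2;
                continue;
            }
            consumed = quote + 1;
            // Collapse every escaped "" into a single ".
            return input[1..quote].ToString().Replace("\"\"", "\"");
        }
    }
}
```

The streaming version has to do the same scan while the buffer may be refilled underneath it, which is where most of the extra complexity comes from.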
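As for its performance, see the results at the very top.
It met expectations, so the hair lost over it was not wasted.
For the complete code, see https://github.com/fs7744/ruqu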