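Sdcb.FFmpeg: my open-source C# wrapper for FFmpeg

Foreword:

This post is based on my talk at .NET Conf China 2022 in December 2022. Project repository: https://github.com/sdcb/Sdcb.FFmpeg

The slides can be downloaded here: https://io.starworks.cc:88/cv-public/2022/.NET玩转音视频操作FFmpeg.pptx

The recording can be watched here (my talk starts at 3:19:00): https://bbs.csdn.net/topics/609897502

FFmpeg is a well-known audio and video processing tool that I use often at work and at home. I am also a .NET developer, and when I tried calling FFmpeg from C#, I found these options: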

Out-of-process invocation (driving the ffmpeg command line), for example:
- FFmpeg.NET
- MediaToolkit
- Xabe.FFmpeg

P/Invoke against the C API, for example:
- FFmpeg.AutoGen
- EmguFFmpeg
- Sdcb.FFmpeg

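The command-line approach has these pros and cons:

Pro: easy to learn, easy to get started, and no conflict with the GPL license
Con: it relies on inter-process interop, with state managed through redirected standard streams
Con: input and output go through files, so fine-grained control is hard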
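
To make the command-line trade-off concrete, here is a minimal sketch of the out-of-process approach using only System.Diagnostics. The class name FFmpegCli and its methods are hypothetical, and it assumes an ffmpeg executable is on the PATH. It is not how any of the libraries listed above are implemented; it only illustrates why state has to be recovered from a redirected stream.

```csharp
using System;
using System.Diagnostics;

public static class FFmpegCli
{
    // Builds the start info for: ffmpeg -y -i <input> <output>
    // Assumes an "ffmpeg" executable can be found on the PATH.
    public static ProcessStartInfo BuildStartInfo(string input, string output)
    {
        var psi = new ProcessStartInfo("ffmpeg")
        {
            RedirectStandardError = true, // ffmpeg writes its log and progress to stderr
            UseShellExecute = false,      // required for stream redirection
            CreateNoWindow = true,
        };
        psi.ArgumentList.Add("-y");       // overwrite the output file without asking
        psi.ArgumentList.Add("-i");
        psi.ArgumentList.Add(input);
        psi.ArgumentList.Add(output);
        return psi;
    }

    // Runs ffmpeg, forwards every stderr line to onLog, and returns the exit code.
    public static int Run(string input, string output, Action<string> onLog)
    {
        using var process = new Process { StartInfo = BuildStartInfo(input, output) };
        process.ErrorDataReceived += (_, e) => { if (e.Data != null) onLog(e.Data); };
        process.Start();
        process.BeginErrorReadLine();
        process.WaitForExit();
        return process.ExitCode;
    }
}
```

A call such as FFmpegCli.Run("in.mp4", "out.mp4", Console.WriteLine) only gives the caller text lines and an exit code; there is no access to individual frames or packets.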

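Calling the C API through P/Invoke solves some of the problems above. Its pros and cons are:

Pro: input and output can live in memory, and every frame can be controlled precisely
Pro: there is no cross-process overhead, so performance is easier to guarantee
Con: C API code is fairly complex
Con: the industry mostly uses FFmpeg.AutoGen, which mixes C pointers into C#, so the code can end up even more complex than the plain C API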

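What did I build?

Faced with these difficulties, I took the widely used open-source project FFmpeg.AutoGen as a starting point and built Sdcb.FFmpeg myself. It has these advantages:

Keeps the ability to call every C API directly, and stays cross-platform
Removed the ClangMacroParser dependency and rewrote it from scratch, so it parses more macros than the original
Changed native library loading from manual LoadLibrary to automatic [DllImport]; on .NET Core this loads the DLLs from NuGet packages automatically, which is closer to .NET community conventions
Removed all large binaries and their history from the repository; they are now downloaded automatically, which shrinks the repository
Simplified enum names, e.g. AVCodecID.AV_CODEC_ID_H264 -> AVCodecID.H264
Turned many C macros into C# enums, e.g. ffmpeg.AV_DICT_MATCH_CASE -> AV_DICT_READ.MatchCase
Beyond the low-level bindings, added a mid-level (class) wrapper and a high-level (helper) wrapper, such as CodecContext and MediaDictionary
Published NuGet packages of the native libraries, so a program can run without installing external dependencies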
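
As an illustration of the [DllImport] point above, the following is a minimal sketch of a declarative binding. It is not taken from the Sdcb.FFmpeg source: the class AvUtilNative and the helper DecodeVersion are hypothetical, and the library name "avutil-57" is an assumption based on FFmpeg 5.x shipping libavutil major version 57 on Windows. avutil_version() itself is a real libavutil function that returns the version packed as major << 16 | minor << 8 | micro.

```csharp
using System;
using System.Runtime.InteropServices;

public static class AvUtilNative
{
    // Assumed library name for FFmpeg 5.x on Windows (avutil-57.dll).
    private const string AvUtil = "avutil-57";

    // With [DllImport] the runtime locates and loads the native library on first call,
    // so no manual LoadLibrary/GetProcAddress code is needed.
    [DllImport(AvUtil, CallingConvention = CallingConvention.Cdecl)]
    public static extern uint avutil_version();

    // Unpacks the value returned by avutil_version() into its three components.
    public static (int Major, int Minor, int Micro) DecodeVersion(uint version) =>
        ((int)(version >> 16), (int)((version >> 8) & 0xFF), (int)(version & 0xFF));
}
```

If the native library is available, AvUtilNative.DecodeVersion(AvUtilNative.avutil_version()) would report the loaded libavutil version. Without the library, the P/Invoke call throws DllNotFoundException, while DecodeVersion works on its own.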

NuGet packages

FFmpeg 5.x:
- Sdcb.FFmpeg
- Sdcb.FFmpeg.runtime.windows-x64

FFmpeg 4.4.x:
- Sdcb.FFmpeg
- Sdcb.FFmpeg.runtime.windows-x64


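How do I use it on Linux/macOS?

On Linux you do not need the runtime NuGet packages. There are many Linux distributions, and most of them provide a library as common as FFmpeg through their package manager. On Ubuntu 22.04, for example, the FFmpeg 5.x shared libraries can be installed with:

apt update
apt install software-properties-common
add-apt-repository ppa:savoury1/ffmpeg4 -y
add-apt-repository ppa:savoury1/ffmpeg5 -y
apt update
apt install ffmpeg -y

For FFmpeg 4.x, the shared libraries can be installed with:

apt update
apt install software-properties-common
add-apt-repository ppa:savoury1/ffmpeg4 -y
apt update
apt install ffmpeg -y

On macOS, the shared libraries can be installed with:

brew install ffmpeg

A runtime NuGet package for Linux would usually be tied to specific libc-related libraries and would not be portable across distributions, and Linux generally has better solutions, so I did not build a Linux runtime package.

Do not get this wrong, though: Sdcb.FFmpeg is tested on Linux and runs well there. GitHub Actions test runs: https://github.com/sdcb/Sdcb.FFmpeg/actions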
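
When relying on system-installed libraries, it can help to check at startup whether they can be loaded at all. The sketch below uses the real .NET API System.Runtime.InteropServices.NativeLibrary; the class NativeProbe is hypothetical, and the library file name in the usage note is an assumption (FFmpeg 5.x normally installs libavcodec as libavcodec.so.59 on Linux).

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeProbe
{
    // Returns true if the given native library can be loaded by the runtime.
    public static bool IsAvailable(string libraryName)
    {
        if (NativeLibrary.TryLoad(libraryName, out IntPtr handle))
        {
            NativeLibrary.Free(handle); // we only probed; release the handle again
            return true;
        }
        return false;
    }
}
```

For example, NativeProbe.IsAvailable("libavcodec.so.59") could be used to print a friendlier hint ("install ffmpeg with apt") instead of letting the first FFmpeg call fail with DllNotFoundException.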

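Why did I start over?

I did not set out to start from scratch. At first I was inspired by the EmguFFmpeg project by Yu Hongwei (于宏伟), a developer in Beijing. I agreed that FFmpeg.AutoGen is hard to use, but I thought that depending on FFmpeg.AutoGen and adding a thin wrapper would save a lot of maintenance work. So during 2020~2021 I kept developing and maintaining this open-source project: Sdcb.FFmpegAPIWrapper. It was built entirely on FFmpeg.AutoGen, and it was basically complete at the time (I just never did much promotion, samples, or tutorials).

As the project progressed, I increasingly felt that depending directly on FFmpeg.AutoGen made the code too "heavy". For example, the same thing could be written two ways, the raw way and the "high-level" way (AVCodecID.AV_CODEC_ID_H264 and AVCodecID.H264 existing side by side), and users would very likely get lost. After a long period of indecision I finally decided to rework FFmpeg.AutoGen itself. The rework took about a year and led to what the project is today.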

Sdcb.FFmpeg in 6 examples

Example 1: Generating a video purely from code

Think of this example as the "Hello World" of FFmpeg. It needs the following NuGet packages:

Sdcb.FFmpeg 5.1.2
Sdcb.FFmpeg.runtime.windows-x64

It needs the following namespaces:

Sdcb.FFmpeg.Codecs
Sdcb.FFmpeg.Formats
Sdcb.FFmpeg.Raw
Sdcb.FFmpeg.Toolboxs.Extensions
Sdcb.FFmpeg.Toolboxs.Generators
Sdcb.FFmpeg.Utils

Full code:

// this example is based on Sdcb.FFmpeg 5.1.2
FFmpegLogger.LogWriter = (level, msg) => Console.Write(msg);

// allocate an mp4 output container and add a libx264 video stream to it
using FormatContext fc = FormatContext.AllocOutput(formatName: "mp4");
fc.VideoCodec = Codec.CommonEncoders.Libx264;
MediaStream vstream = fc.NewStream(fc.VideoCodec);

// configure the encoder: 800x600, yuv420p, time base 1/30
using CodecContext vcodec = new CodecContext(fc.VideoCodec)
{
    Width = 800,
    Height = 600,
    TimeBase = new AVRational(1, 30),
    PixelFormat = AVPixelFormat.Yuv420p,
    Flags = AV_CODEC_FLAG.GlobalHeader,
};
vcodec.Open(fc.VideoCodec);

// copy the encoder parameters to the output stream
vstream.Codecpar!.CopyFrom(vcodec);
vstream.TimeBase = vcodec.TimeBase;

// open the output file on the desktop and write the container header
string outputPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "muxing.mp4");
fc.DumpFormat(streamIndex: 0, outputPath, isOutput: true);
using IOContext io = IOContext.OpenWrite(outputPath);
fc.Pb = io;
fc.WriteHeader();

// generate frames -> convert to the encoder format -> encode -> write packets
VideoFrameGenerator.Yuv420pSequence(vcodec.Width, vcodec.Height, 600)
	.ConvertFrames(vcodec)
	.EncodeAllFrames(fc, null, vcodec)
	.WriteAll(fc);
fc.WriteTrailer();

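After running it, a file named muxing.mp4 should appear on the desktop. It is generated by the code above, and the video looks like the figure below:

Worth mentioning: I wrote VideoFrameGenerator.Yuv420pSequence, which takes a few parameters and returns IEnumerable<Frame> (or IEnumerable<Packet> in other examples). This pattern is very common in my project. It shows how concise and powerful C# is, and it also takes care of resource management and memory release.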
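
The resource-management side of this pattern can be sketched without FFmpeg at all. In the hypothetical example below, FakeFrame stands in for a native-backed frame and FrameSource.Sequence for a generator; neither is part of Sdcb.FFmpeg. The point is that a using declaration inside an iterator is released when enumeration ends, whether it runs to completion or the consumer breaks out early.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for an object that owns native memory.
public sealed class FakeFrame : IDisposable
{
    public int Index { get; set; }
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

public static class FrameSource
{
    // Yields the same reusable frame 'count' times, updating it in place.
    // The frame is disposed when the consumer finishes or abandons the foreach.
    public static IEnumerable<FakeFrame> Sequence(int count)
    {
        using FakeFrame frame = new FakeFrame();
        for (int i = 0; i < count; i++)
        {
            frame.Index = i;
            yield return frame;
        }
    }
}
```

Because one instance is reused, a consumer that wants to keep a frame beyond the current iteration has to copy it first. That is the trade-off of this design: no per-frame allocation, but frames are only valid while they are the current element.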

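Example 2: Compressing a video

This example shows how to compress a video to the following parameters, which are also the parameters under which the WeChat Windows desktop client does not re-encode a video:

Video codec: H264
Video bitrate: below 600kbps
Video resolution: not limited, but a long edge of 960 is recommended
Audio codec: AAC
Audio bitrate: 48kbps

It needs the following NuGet packages:

Sdcb.FFmpeg 5.1.2
Sdcb.FFmpeg.runtime.windows-x64

It needs the following namespaces:

Sdcb.FFmpeg.Codecs
Sdcb.FFmpeg.Common
Sdcb.FFmpeg.Filters
Sdcb.FFmpeg.Formats
Sdcb.FFmpeg.Raw
Sdcb.FFmpeg.Toolboxs
Sdcb.FFmpeg.Toolboxs.Extensions
Sdcb.FFmpeg.Toolboxs.FilterTools
Sdcb.FFmpeg.Toolboxs.Generators
Sdcb.FFmpeg.Utils
static Sdcb.FFmpeg.Raw.ffmpeg
System.Collections.Concurrent
System.Runtime.CompilerServices
System.Threading.Tasks

Full code: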

void Main()
{
	FFmpegLogger.LogLevel = LogLevel.Error;
	FFmpegLogger.LogWriter = (level, msg) => Console.Write(msg);
	Task.Run(() => A7r3VideoToWechat(@"Y:\a7r3\2022-12-12\C0060.MP4")).Wait();
}

void A7r3VideoToWechat(string mp4Path)
{
	using FormatContext inFc = FormatContext.OpenInputUrl(mp4Path);
	inFc.LoadStreamInfo();

	// prepare input stream/codec
	MediaStream inAudioStream = inFc.GetAudioStream();
	using CodecContext audioDecoder = new(Codec.FindDecoderById(inAudioStream.Codecpar!.CodecId));
	audioDecoder.FillParameters(inAudioStream.Codecpar);
	audioDecoder.Open();
	audioDecoder.ChannelLayout = (ulong)ffmpeg.av_get_default_channel_layout(audioDecoder.Channels);

	MediaStream inVideoStream = inFc.GetVideoStream();
	// h264_cuvid is the NVIDIA hardware H.264 decoder, so this line needs an NVIDIA GPU
	using CodecContext videoDecoder = new(Codec.FindDecoderByName("h264_cuvid"));
	videoDecoder.FillParameters(inVideoStream.Codecpar!);
	videoDecoder.Open();

	// dest file
	string destFile = Path.Combine(Path.GetDirectoryName(mp4Path)!, Path.GetFileNameWithoutExtension(mp4Path) + "_wechat.mp4");
	using FormatContext outFc = FormatContext.AllocOutput(fileName: destFile);

	// dest encoder and streams
	outFc.AudioCodec = Codec.CommonEncoders.AAC;
	MediaStream outAudioStream = outFc.NewStream(outFc.AudioCodec);
	using CodecContext audioEncoder = new(outFc.AudioCodec)
	{
		Channels = 1,
		SampleFormat = outFc.AudioCodec.Value.NegociateSampleFormat(AVSampleFormat.Fltp),
		SampleRate = outFc.AudioCodec.Value.NegociateSampleRates(48000),
		BitRate = 48000
	};
	audioEncoder.ChannelLayout = (ulong)ffmpeg.av_get_default_channel_layout(audioEncoder.Channels);
	audioEncoder.TimeBase = new AVRational(1, audioEncoder.SampleRate);
	audioEncoder.Open(outFc.AudioCodec);
	outAudioStream.Codecpar!.CopyFrom(audioEncoder);

	outFc.VideoCodec = Codec.FindEncoderByName("libx264");
	MediaStream outVideoStream = outFc.NewStream(outFc.VideoCodec);
	// scale to a width of 1920, keeping the aspect ratio (the recommendation above is a long edge of 960)
	using VideoFilterContext vfilter = VideoFilterContext.Create(inVideoStream, "scale=1920:-1");
	using CodecContext videoEncoder = new(outFc.VideoCodec)
	{
		Flags = AV_CODEC_FLAG.GlobalHeader,
		ThreadCount = Environment.ProcessorCount,
		ThreadType = ffmpeg.FF_THREAD_FRAME,
		BitRate = 595_000
	};
	vfilter.ConfigureEncoder(videoEncoder);
	var dict = new MediaDictionary
	{
		//["qp"] = "30",
		["tune"] = "zerolatency",
		["preset"] = "veryfast"
	};
	videoEncoder.Open(outFc.VideoCodec, dict);
	//dict.Dump();
	outVideoStream.Codecpar!.CopyFrom(videoEncoder);
	outVideoStream.TimeBase = videoEncoder.TimeBase;

	// begin write
	using IOContext io = IOContext.OpenWrite(destFile);
	outFc.Pb = io;
	outFc.WriteHeader();

	// stage 1: read and decode on a background thread (QueryCancelToken is provided by LINQPad)
	MediaThreadQueue<Frame> decodingQueue = inFc
		.ReadPackets(inVideoStream.Index, inAudioStream.Index)
		.DecodeAllPackets(inFc, audioDecoder, videoDecoder)
		.ToThreadQueue(cancellationToken: QueryCancelToken, boundedCapacity: 64);

	// stage 2: filter, convert and encode on another background thread
	MediaThreadQueue<Packet> encodingQueue = decodingQueue.GetConsumingEnumerable()
		.ApplyVideoFilters(vfilter)
		.ConvertAllFrames(audioEncoder, videoEncoder)
		.AudioFifo(audioEncoder)
		.EncodeAllFrames(outFc, audioEncoder, videoEncoder)
		.ToThreadQueue(cancellationToken: QueryCancelToken);

	// progress reporting: print the encoded position and queue lengths once per second
	CancellationTokenSource end = new();
	QueryCancelToken.Register(() => end.Cancel());
	Dictionary<int, PtsDts> ptsDts = new();
	Task.Run(async () =>
	{
		double totalDuration = Math.Max(inVideoStream.GetDurationInSeconds(), inAudioStream.GetDurationInSeconds());
		try
		{
			while (!end.IsCancellationRequested)
			{
				Log();
				await Task.Delay(1000, end.Token);
			}
		}
		finally
		{
			Log();
		}

		void Log() => Console.WriteLine($"{GetStatusText()}, dec/enc queue: {decodingQueue.Count}/{encodingQueue.Count}");
		string GetStatusText() => $"{(outVideoStream.TimeBase * ptsDts.GetValueOrDefault(outVideoStream.Index, PtsDts.Default).Dts).ToDouble():F2} of {totalDuration:F2}";
	});

	// stage 3: write encoded packets to the output file on the current thread
	encodingQueue.GetConsumingEnumerable()
		.RecordPtsDts(ptsDts)
		.WriteAll(outFc);
	end.Cancel();
	outFc.WriteTrailer();
}

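The result is shown in the figure (a file of over 500 MB compressed to 5 MB):

Worth mentioning: MediaThreadQueue<Frame> and MediaThreadQueue<Packet> are both built internally on C#'s BlockingCollection plus multithreading. This lets the decoding, encoding, and writing stages run in parallel, which improves throughput.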
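
The idea behind such a thread queue can be sketched with BlockingCollection alone. This is not the actual MediaThreadQueue implementation; ThreadQueueSketch and ToBlockingQueue are hypothetical names, and error propagation from the producer is deliberately left out to keep the sketch short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadQueueSketch
{
    // Enumerates 'source' on a background thread and hands the items over through a
    // bounded queue, so the producer blocks once it is 'boundedCapacity' items ahead.
    public static BlockingCollection<T> ToBlockingQueue<T>(
        IEnumerable<T> source, int boundedCapacity = 64, CancellationToken cancellationToken = default)
    {
        var queue = new BlockingCollection<T>(boundedCapacity);
        Task.Run(() =>
        {
            try
            {
                foreach (T item in source)
                {
                    queue.Add(item, cancellationToken); // blocks while the queue is full
                }
            }
            catch (OperationCanceledException)
            {
                // cancellation requested: stop producing
            }
            finally
            {
                queue.CompleteAdding(); // lets GetConsumingEnumerable() finish
            }
        });
        return queue;
    }
}
```

The consumer simply iterates queue.GetConsumingEnumerable(). The bounded capacity is what keeps memory in check: a fast decoder cannot run arbitrarily far ahead of a slower encoder.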

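Example 3: Creating a gif (a meme?)

Note: I set up a demo website to show this feature. Clicking the "Generate" button there produces a meme like this one:

I uploaded the complete Visual Studio sample to GitHub. It can be downloaded here: https://github.com/sdcb/ffmpeg-wjz-sorry-generator

The steps and key points are:

Decode the video
Convert each frame to the BGRA pixel format
Read the subtitles and draw them with Direct2D
Feed each frame into a video filter that converts it to the PAL8 format
Encode the PAL8 frames as a gif

Note that this demo uses Direct2D through this open-source project: Vortice.Windows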

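Example 4: Real-time desktop casting (remote desktop?)

This sends the screen content of one computer to another computer over the network in real time, with low network overhead. Use cases include real-time video calls, remote screen casting, and remote desktop control.

The code has two parts: the capture-encode-send side and the receive-decode-display side.

Full source of the capture-encode-send side

Required NuGet packages:

Sdcb.FFmpeg 4.4.3
Sdcb.FFmpeg.runtime.windows-x64 4.4.3
Sdcb.ScreenCapture

Full source code: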

// This example was initially written based on Sdcb.FFmpeg 4.4.3 & Sdcb.ScreenCapture
void Main()
{
	StartService(QueryCancelToken);
}

// listens on TCP port 5555 and serves every client on its own task
void StartService(CancellationToken cancellationToken = default)
{
	var tcpListener = new TcpListener(IPAddress.Any, 5555);
	cancellationToken.Register(() => tcpListener.Stop());
	tcpListener.Start();
	while (!cancellationToken.IsCancellationRequested)
	{
		TcpClient client = tcpListener.AcceptTcpClient();
		Task.Run(() => ServeClient(client, cancellationToken));
	}
}

void ServeClient(TcpClient tcpClient, CancellationToken cancellationToken = default)
{
	try
	{
		using var _ = tcpClient;
		using NetworkStream stream = tcpClient.GetStream();
		using BinaryWriter writer = new(stream);
		RectI screenSize = ScreenCapture.GetScreenSize(screenId: 0);
		RdpCodecParameter rcp = new(AVCodecID.H264, screenSize.Width, screenSize.Height, AVPixelFormat.Bgr0);
		using CodecContext cc = new(Codec.CommonEncoders.Libx264RGB)
		{
			Width = rcp.Width,
			Height = rcp.Height,
			PixelFormat = rcp.PixelFormat,
			TimeBase = new AVRational(1, 20),
		};
		cc.Open(null, new MediaDictionary
		{
			["crf"] = "30",
			["tune"] = "zerolatency",
			["preset"] = "veryfast"
		});

		// send the 16-byte codec header first, then one length-prefixed packet per encoded frame
		writer.Write(rcp.ToArray());
		using Frame source = new();
		foreach (Packet packet in ScreenCapture
			.CaptureScreenFrames(screenId: 0)
			.ToBgraFrame()
			.ConvertFrames(cc)
			.EncodeFrames(cc))
		{
			if (cancellationToken.IsCancellationRequested)
			{
				break;
			}
			writer.Write(packet.Data.Length);
			writer.Write(packet.Data.AsSpan());
		}
	}
	catch (IOException ex)
	{
		// Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
		// Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.
		ex.Dump();
	}
}

// keeps only the latest item (not used by the code above)
public class Filo<T> : IDisposable
{
	private T? Item { get; set; }
	private ManualResetEventSlim Notify { get; } = new ManualResetEventSlim();

	public void Update(T item)
	{
		Item = item;
		Notify.Set();
	}

	public IEnumerable<T> Consume(CancellationToken cancellationToken = default)
	{
		while (!cancellationToken.IsCancellationRequested)
		{
			Notify.Wait(cancellationToken);
			// reset the event so the next iteration waits for a new item instead of spinning
			Notify.Reset();
			yield return Item!;
		}
	}

	public void Dispose() => Notify.Dispose();
}

public static class BgraFrameExtensions
{
	// wraps each captured screen buffer in a reusable Frame without copying the pixels
	public static IEnumerable<Frame> ToBgraFrame(this IEnumerable<LockedBgraFrame> bgras)
	{
		using Frame frame = new Frame();
		foreach (LockedBgraFrame bgra in bgras)
		{
			frame.Width = bgra.Width;
			frame.Height = bgra.Height;
			frame.Format = (int)AVPixelFormat.Bgra;
			frame.Data[0] = bgra.DataPointer;
			frame.Linesize[0] = bgra.RowPitch;
			yield return frame;
		}
	}
}

record RdpCodecParameter(AVCodecID CodecId, int Width, int Height, AVPixelFormat PixelFormat)
{
	// serializes the four fields as little-endian int32 values (16 bytes in total)
	public byte[] ToArray()
	{
		byte[] data = new byte[16];
		Span<byte> span = data.AsSpan();
		BinaryPrimitives.WriteInt32LittleEndian(span, (int)CodecId);
		BinaryPrimitives.WriteInt32LittleEndian(span[4..], Width);
		BinaryPrimitives.WriteInt32LittleEndian(span[8..], Height);
		BinaryPrimitives.WriteInt32LittleEndian(span[12..], (int)PixelFormat);
		return data;
	}
}

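Worth mentioning: the Sdcb.ScreenCapture NuGet package is also mine. It is built on DXGI, records the screen with zero memory copies, and can capture at 60 frames per second with very low CPU usage. I may introduce this open-source project in a future post. GitHub: https://github.com/sdcb/Sdcb.ScreenCapture

Full source of the receive-decode-display side

Required NuGet packages:

Sdcb.FFmpeg 4.4.3
Sdcb.FFmpeg.runtime.windows-x64 4.4.3
FlysEngine.Desktop

Full source code: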

// This example was initially written based on Sdcb.FFmpeg 4.4.3 & FlysEngine.Desktop
#nullable enable
ManagedBgraFrame? managedFrame = null;
bool cancel = false;

unsafe void Main()
{
	using RenderWindow w = new();
	w.FormClosed += delegate { cancel = true; };
	Task decodingTask = Task.Run(() => DecodeThread(() => (3840, 2160)));

	// draw the most recently decoded frame on every render pass
	w.Draw += (_, ctx) =>
	{
		ctx.Clear(Colors.CornflowerBlue);
		if (managedFrame == null) return;

		ManagedBgraFrame frame = managedFrame.Value;
		fixed (byte* ptr = frame.Data)
		{
			//new System.Drawing.Bitmap(frame.Width, frame.Height, frame.RowPitch, System.Drawing.Imaging.PixelFormat.Format32bppPArgb, (IntPtr)ptr).DumpUnscaled();
			BitmapProperties1 props = new(new PixelFormat(Format.B8G8R8A8_UNorm, Vortice.DCommon.AlphaMode.Premultiplied));
			using ID2D1Bitmap bmp = ctx.CreateBitmap(new SizeI(frame.Width, frame.Height), (IntPtr)ptr, frame.RowPitch, props);
			ctx.UnitMode = UnitMode.Dips;
			ctx.DrawBitmap(bmp, 1.0f, InterpolationMode.NearestNeighbor);
		}
	};
	RenderLoop.Run(w, () => w.Render(1, Vortice.DXGI.PresentFlags.None));
}

async Task DecodeThread(Func<(int width, int height)> sizeAccessor)
{
	using TcpClient client = new TcpClient();
	await client.ConnectAsync(IPAddress.Loopback, 5555);
	using NetworkStream stream = client.GetStream();
	using BinaryReader reader = new(stream);

	// the first 16 bytes describe the codec, frame size and pixel format
	RdpCodecParameter rcp = RdpCodecParameter.FromSpan(reader.ReadBytes(16));
	using CodecContext cc = new(Codec.FindDecoderById(rcp.CodecId))
	{
		Width = rcp.Width,
		Height = rcp.Height,
		PixelFormat = rcp.PixelFormat,
	};
	cc.Open(null);

	foreach (var frame in reader
		.ReadPackets()
		.DecodePackets(cc)
		.ConvertVideoFrames(sizeAccessor, AVPixelFormat.Bgra)
		.ToManaged()
		)
	{
		if (cancel) break;
		managedFrame = frame;
	}
}

public static class FramesExtensions
{
	// copies each native frame into a managed byte[] so it can outlive the decoder's buffer
	public static IEnumerable<ManagedBgraFrame> ToManaged(this IEnumerable<Frame> bgraFrames, bool unref = true)
	{
		foreach (Frame frame in bgraFrames)
		{
			int rowPitch = frame.Linesize[0];
			int length = rowPitch * frame.Height;
			byte[] buffer = new byte[length];
			Marshal.Copy(frame.Data._0, buffer, 0, length);
			ManagedBgraFrame managed = new(buffer, length, length / frame.Height);
			if (unref) frame.Unref();
			yield return managed;
		}
	}
}

public record struct ManagedBgraFrame(byte[] Data, int Length, int RowPitch)
{
	public int Width => RowPitch / BytePerPixel;
	public int Height => Length / RowPitch;
	public const int BytePerPixel = 4;
}

public static class ReadPacketExtensions
{
	// reads length-prefixed packets; a length of 0 ends the stream
	public static IEnumerable<Packet> ReadPackets(this BinaryReader reader)
	{
		using Packet packet = new();
		while (true)
		{
			int packetSize = reader.ReadInt32();
			if (packetSize == 0) yield break;

			byte[] data = reader.ReadBytes(packetSize);
			// pin the managed buffer while the decoder reads from it
			GCHandle dataHandle = GCHandle.Alloc(data, GCHandleType.Pinned);
			try
			{
				packet.Data = new DataPointer(dataHandle.AddrOfPinnedObject(), packetSize);
				yield return packet;
			}
			finally
			{
				dataHandle.Free();
			}
		}
	}
}

record RdpCodecParameter(AVCodecID CodecId, int Width, int Height, AVPixelFormat PixelFormat)
{
	// parses the 16-byte header written by the sender (four little-endian int32 values)
	public static RdpCodecParameter FromSpan(ReadOnlySpan<byte> data)
	{
		return new RdpCodecParameter(
			CodecId: (AVCodecID)BinaryPrimitives.ReadInt32LittleEndian(data),
			Width: BinaryPrimitives.ReadInt32LittleEndian(data[4..]),
			Height: BinaryPrimitives.ReadInt32LittleEndian(data[8..]),
			PixelFormat: (AVPixelFormat)BinaryPrimitives.ReadInt32LittleEndian(data[12..]));
	}
}

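The two programs running together look like this:

The transmission latency is about 0.28 seconds. This is the video of my 4K display, encoded with libx264rgb and transmitted in the bgr0 pixel format. That is enough for online meeting demos, live screen casting, and remote control (at 1080p the latency should be even lower).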
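
The wire format shared by the two programs is simple: a 16-byte header, then a sequence of packets, each prefixed with its length as a 4-byte integer. The sketch below isolates the length-prefixed part using only BinaryWriter and BinaryReader over any stream. PacketFraming is a hypothetical helper, not part of Sdcb.FFmpeg, and the explicit WriteEnd marker is an assumption of this sketch: the receiver above treats a length of 0 as the end of the stream, but the sender shown never writes one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PacketFraming
{
    // Writes one packet as: int32 length (little-endian), then the payload bytes.
    public static void WritePacket(BinaryWriter writer, ReadOnlySpan<byte> payload)
    {
        writer.Write(payload.Length);
        writer.Write(payload);
    }

    // Writes a zero length, which the reader interprets as end of stream.
    public static void WriteEnd(BinaryWriter writer) => writer.Write(0);

    // Reads packets until the zero-length end marker is seen.
    public static IEnumerable<byte[]> ReadPackets(BinaryReader reader)
    {
        while (true)
        {
            int size = reader.ReadInt32();
            if (size == 0) yield break;
            byte[] data = reader.ReadBytes(size);
            if (data.Length != size) throw new EndOfStreamException("Truncated packet.");
            yield return data;
        }
    }
}
```

Length prefixes are needed because TCP is a byte stream with no message boundaries; without them the receiver could not tell where one encoded packet ends and the next begins.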

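Note that this source code uses my own open-source Direct2D wrapper engine, FlysEngine. You do not need to care about its details (installing the NuGet package is enough). If you are curious, I may introduce it in a future post; for now it is enough to know that it is only a thin wrapper over D3D11, DXGI, Direct2D, WIC, and DirectWrite.

Example 5: Receiving and displaying video from an RTSP camera

This program depends on the following NuGet packages:

FlysEngine.Desktop
Sdcb.FFmpeg 4.4.3
Sdcb.FFmpeg.runtime.windows-x64 4.4.3

Full code: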

#nullable enable
FFmpegBmp? ffBmp = null;
FFmpegBmp? lastFFbmp = null;
FFmpegLogger.LogWriter = (level, msg) => Console.Write(msg);
CancellationTokenSource cts = new();
using RenderWindow w = new();
// Util.GetPassword is a LINQPad helper; here it returns the stored RTSP URL
Task.Run(() => DecodeRTSP(Util.GetPassword("home-rtsp-ipc"), cts.Token));

w.Draw += (_, ctx) =>
{
	// skip drawing when there is no frame yet or the frame has not changed
	if (ffBmp == null) return;
	if (lastFFbmp == ffBmp) return;

	GCHandle handle = GCHandle.Alloc(ffBmp.Data, GCHandleType.Pinned);
	try
	{
		using ID2D1Bitmap bmp = ctx.CreateBitmap(new SizeI(ffBmp.Width, ffBmp.Height), handle.AddrOfPinnedObject(), ffBmp.RowPitch, new BitmapProperties(new Vortice.DCommon.PixelFormat(Format.B8G8R8A8_UNorm, Vortice.DCommon.AlphaMode.Premultiplied)));
		lastFFbmp = ffBmp;
		// center the image vertically in the window
		Size clientSize = ctx.Size;
		float top = (clientSize.Height - ffBmp.Height) / 2;
		ctx.Transform = Matrix3x2.CreateTranslation(0, top);
		ctx.DrawBitmap(bmp, 1.0f, InterpolationMode.Linear);
	}
	finally
	{
		handle.Free();
	}
};
w.FormClosing += delegate { cts.Cancel(); };
RenderLoop.Run(w, () => w.Render(1, Vortice.DXGI.PresentFlags.None));

void DecodeRTSP(string url, CancellationToken cancellationToken = default)
{
	using FormatContext fc = FormatContext.OpenInputUrl(url);
	fc.LoadStreamInfo();
	MediaStream videoStream = fc.GetVideoStream();

	// hevc_qsv is the Intel Quick Sync HEVC decoder, so this line needs supported Intel hardware
	using CodecContext videoDecoder = new CodecContext(Codec.FindDecoderByName("hevc_qsv"));
	videoDecoder.FillParameters(videoStream.Codecpar!);
	videoDecoder.Open();

	// decode, then scale each frame to the window width while keeping the aspect ratio
	foreach (Frame frame in fc
		.ReadPackets(videoStream.Index)
		.DecodePackets(videoDecoder)
		.ConvertVideoFrames(() => new(w.ClientSize.Width, w.ClientSize.Width * videoDecoder.Height / videoDecoder.Width), AVPixelFormat.Bgr0))
	{
		if (cancellationToken.IsCancellationRequested) break;

		try
		{
			// copy the pixels into managed memory before the native frame is released
			byte[] data = new byte[frame.Linesize[0] * frame.Height];
			Marshal.Copy(frame.Data._0, data, 0, data.Length);
			ffBmp = new FFmpegBmp(frame.Width, frame.Height, frame.Linesize[0], data);
		}
		finally
		{
			frame.Unref();
		}
	}
}

public record FFmpegBmp(int Width, int Height, int RowPitch, byte[] Data);

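The camera at my family home in the countryside is an RTSP camera. This is what the code above shows when it runs:

Example 6: Reading an RTSP stream and saving it as mp4/mov files

This example depends on the following NuGet packages:

Sdcb.FFmpeg 4.4.3
Sdcb.FFmpeg.runtime.windows-x64 4.4.3

Full code: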

// The example was initially written using Sdcb.FFmpeg 4.4.3
FFmpegLogger.LogWriter = (level, msg) => Console.Write(msg);
using FormatContext inFc = FormatContext.OpenInputUrl(Util.GetPassword("home-rtsp-ipc"));
inFc.LoadStreamInfo();
MediaStream inAudioStream = inFc.GetAudioStream();
MediaStream inVideoStream = inFc.GetVideoStream();

// last input timestamps seen; the next file subtracts them so its timestamps restart near zero
long gpts_v = 0, gpts_a = 0, gdts_v = 0, gdts_a = 0;

// each loop iteration writes one output file
while (!QueryCancelToken.IsCancellationRequested)
{
	using FormatContext outFc = FormatContext.AllocOutput(formatName: "mov");
	string dir = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "rtsp", DateTime.Now.ToString("yyyy-MM-dd"));
	Directory.CreateDirectory(dir);
	using IOContext io = IOContext.OpenWrite(Path.Combine(dir, $"{DateTime.Now:HHmmss}.mov"));
	outFc.Pb = io;

	// copy stream parameters from the input: packets are remuxed, not re-encoded
	MediaStream videoStream = outFc.NewStream(Codec.FindEncoderById(inVideoStream.Codecpar!.CodecId));
	videoStream.Codecpar!.CopyFrom(inVideoStream.Codecpar);
	videoStream.TimeBase = inVideoStream.RFrameRate.Inverse();
	videoStream.SampleAspectRatio = inVideoStream.SampleAspectRatio;

	MediaStream audioStream = outFc.NewStream(Codec.FindEncoderById(inAudioStream.Codecpar!.CodecId));
	audioStream.Codecpar!.CopyFrom(inAudioStream.Codecpar);
	audioStream.TimeBase = inAudioStream.TimeBase;
	audioStream.Codecpar.ChannelLayout = (ulong)ffmpeg.av_get_default_channel_layout(inAudioStream.Codecpar.Channels);

	outFc.WriteHeader();
	FilterPackets(inFc.ReadPackets(inAudioStream.Index, inVideoStream.Index), videoFrameCount: 60 * 20)
		.WriteAll(outFc);
	outFc.WriteTrailer();

	IEnumerable<Packet> FilterPackets(IEnumerable<Packet> packets, int videoFrameCount)
	{
		long pts_v = gpts_v, pts_a = gpts_a, dts_v = gdts_v, dts_a = gdts_a;
		long[] buffer = new long[200];
		long ithreshold = -1;
		int videoFrame = 0;

		foreach (Packet pkt in packets)
		{
			// map the input stream index to the matching output stream index
			pkt.StreamIndex = pkt.StreamIndex == inAudioStream.Index ?
					audioStream.Index :
					videoStream.Index;

			// StreamIndex was just remapped, so compare against the output audio stream
			if (pkt.StreamIndex == audioStream.Index)
			{
				// audio
				(gpts_a, gdts_a, pkt.Pts, pkt.Dts) = (pkt.Pts, pkt.Dts, pkt.Pts - pts_a, pkt.Dts - dts_a);
				pkt.RescaleTimestamp(inAudioStream.TimeBase, audioStream.TimeBase);
			}
			else
			{
				// video: record the sizes of the first 200 packets, then use 4x the median
				// size as the threshold for a large packet (likely a keyframe)
				if (videoFrame < buffer.Length)
				{
					buffer[videoFrame] = pkt.Data.Length;
					ithreshold = -1;
				}
				else if (videoFrame == buffer.Length)
				{
					ithreshold = buffer.Order().ToArray()[buffer.Length / 2] * 4;
				}

				// once enough frames are written, end this file at the next large packet
				if (videoFrame >= videoFrameCount && pkt.Data.Length > ithreshold)
				{
					break;
				}

				(gpts_v, gdts_v, pkt.Pts, pkt.Dts) = (pkt.Pts, pkt.Dts, pkt.Pts - pts_v, pkt.Dts - dts_v);
				pkt.RescaleTimestamp(inVideoStream.TimeBase, videoStream.TimeBase);
				videoFrame++;
			}
			yield return pkt;
		}
	}
}

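This program can run around the clock. It saves the RTSP camera's complete video and audio as files of roughly 1.5 minutes each in this folder on the desktop (as shown in the figure):

With this, it might even replace a dedicated video recorder~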
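
The file-splitting rule in the code above can be isolated into a small, library-free sketch. The names SegmentSplitter, ComputeThreshold, and ShouldSplit are hypothetical. The heuristic itself follows the example: sample some video packet sizes, take 4 times the median as the threshold for a "large" packet (most likely a keyframe), and start a new file at the first large packet once enough frames have been written.

```csharp
using System;
using System.Linq;

public static class SegmentSplitter
{
    // Threshold = factor x median of the sampled packet sizes.
    public static long ComputeThreshold(long[] sampleSizes, int factor = 4)
    {
        long[] sorted = sampleSizes.OrderBy(x => x).ToArray();
        return sorted[sorted.Length / 2] * factor;
    }

    // Split only after at least minFrames frames, and only on a large packet,
    // so that the next file is likely to begin with a keyframe.
    public static bool ShouldSplit(int videoFrame, int minFrames, long packetSize, long threshold)
        => videoFrame >= minFrames && packetSize > threshold;
}
```

This is a size-based guess rather than a real keyframe check. If the packet exposes a keyframe flag, testing that flag would be the more direct criterion.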

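Summary and outlook

I think there is a difference between making something and making it well. In the past, multimedia tooling in C# was mostly at the "it works" stage, which is fundamentally different from the experience that enthusiasts using node.js or Python enjoy. I hope this open-source project is a step toward making .NET a first-class citizen.

Maintaining open source is not easy. If you like the project, please give it a like and a star: https://github.com/sdcb/Sdcb.FFmpeg

I also want to set myself a goal: I hope to wrap FlyCV, libyuv, x264, and libaom-av1 in the future, and perhaps one day even build a .NET version of FFmpeg.

If you like this post, please follow my WeChat official account: 【DotNet骚操作】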