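The Fastest Radius-Independent Gaussian Blur on a Single Core and a Single Thread (with Complete C Code)
I have posted quite a few fast Gaussian blur algorithms before.
My rule of thumb: any algorithm that needs more than 1 second, single-core and single-threaded on a 2.2 GHz CPU, to process a 16-megapixel color image is not fast.
The algorithms I posted earlier all take more than 1 second on my 2.2 GHz CPU.
As is well known, there are many ways to implement a fast Gaussian blur: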
1.FIR (Finite impulse response)
https://zh.wikipedia.org/wiki/%E9%AB%98%E6%96%AF%E6%A8%A1%E7%B3%8A
2.SII (Stacked integral images)
http://dx.doi.org/10.1109/ROBOT.2010.5509400
http://arxiv.org/abs/1107.4958
3.Vliet-Young-Verbeek (Recursive filter)
http://dx.doi.org/10.1016/0165-1684(95)00020-E
http://dx.doi.org/10.1109/ICPR.1998.711192
4.DCT (Discrete Cosine Transform)
http://dx.doi.org/10.1109/78.295213
5.box (Box filter)
http://dx.doi.org/10.1109/TPAMI.1986.4767776
6.AM(Alvarez, Mazorra)
http://www.jstor.org/stable/2158018
7.Deriche (Recursive filter)
http://hal.inria.fr/docs/00/07/47/78/PDF/RR-1893.pdf
8.ebox (Extended Box)
http://dx.doi.org/10.1007/978-3-642-24785-9_38
9.IIR (Infinite Impulse Response)
https://software.intel.com/zh-cn/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions
10.FA (Fast Anisotropic)
http://mathinfo.univ-reims.fr/IMG/pdf/Fast_Anisotropic_Gquss_Filtering_-_GeusebroekECCV02.pdf
......
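There are many ways to implement a Gaussian blur, but for an algorithm what matters most is being simple and efficient.
In my own measurements so far, IIR gives a good balance of quality and performance, and it is radius-independent: the running time stays essentially the same regardless of the blur strength.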
Intel's official implementation:
IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions [PDF 513KB]
source: gaussian_blur.cpp [36KB]
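uses the SIMD (streaming) instructions of Intel processors, and its processing speed is extremely impressive.
I like my algorithms clean, simple, and efficient. In other words, I avoid any hardware-specific acceleration so that the code stays simple and fast and runs on different hardware.
So I rewrote and optimized Intel's code.
The result: on my 2.20 GHz CPU, single-core, single-threaded, and without SIMD instructions, a 16-megapixel color photo takes only about 700 ms.
As usual, a result image shows the effect most directly.
Some readers have asked before how this algorithm is implemented.
After some thought, I decided to share the code for reference and study.
Complete code: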
#include <math.h>   /* exp */
#include <stdlib.h> /* malloc, free */

// Computes the recursive-filter coefficients for the given sigma.
void CalGaussianCoeff(float sigma, float * a0, float * a1, float * a2, float * a3, float * b1, float * b2, float * cprev, float * cnext) {
    float alpha, lamma, k;
    if (sigma < 0.5f)
        sigma = 0.5f;
    alpha = (float)exp((0.726) * (0.726)) / sigma;
    lamma = (float)exp(-alpha);
    *b2 = (float)exp(-2 * alpha);
    k = (1 - lamma) * (1 - lamma) / (1 + 2 * alpha * lamma - (*b2));
    *a0 = k;
    *a1 = k * (alpha - 1) * lamma;
    *a2 = k * (alpha + 1) * lamma;
    *a3 = -k * (*b2);
    *b1 = -2 * lamma;
    *cprev = (*a0 + *a1) / (1 + *b1 + *b2);
    *cnext = (*a2 + *a3) / (1 + *b1 + *b2);
}

// Filters one image row forward and backward, then writes the result
// transposed into pColumn (stride = Channels * Height). Nwidth is unused.
void gaussianHorizontal(unsigned char * bufferPerLine, unsigned char * pRowInitial, unsigned char * pColumn, int Width, int Height, int Channels, int Nwidth, int a0a1, int a2a3, int b1b2, int cprev, int cnext) {
    int HeightStep = Channels*Height;
    int lastWidth = Width - 1;
    if (Channels == 3) {
        int prevOut[3];
        prevOut[0] = (pRowInitial[0] * cprev) >> 8;
        prevOut[1] = (pRowInitial[1] * cprev) >> 8;
        prevOut[2] = (pRowInitial[2] * cprev) >> 8;
        for (int x = 0; x < Width; ++x) {
            prevOut[0] = ((pRowInitial[0] * (a0a1)) - (prevOut[0] * (b1b2))) >> 16;
            prevOut[1] = ((pRowInitial[1] * (a0a1)) - (prevOut[1] * (b1b2))) >> 16;
            prevOut[2] = ((pRowInitial[2] * (a0a1)) - (prevOut[2] * (b1b2))) >> 16;
            bufferPerLine[0] = prevOut[0];
            bufferPerLine[1] = prevOut[1];
            bufferPerLine[2] = prevOut[2];
            bufferPerLine += Channels;
            pRowInitial += Channels;
        }
        pRowInitial -= Channels;
        pColumn += HeightStep * lastWidth;
        bufferPerLine -= Channels;
        prevOut[0] = (pRowInitial[0] * cnext) >> 8;
        prevOut[1] = (pRowInitial[1] * cnext) >> 8;
        prevOut[2] = (pRowInitial[2] * cnext) >> 8;
        for (int x = lastWidth; x >= 0; --x) {
            prevOut[0] = ((pRowInitial[0] * (a2a3)) - (prevOut[0] * (b1b2))) >> 16;
            prevOut[1] = ((pRowInitial[1] * (a2a3)) - (prevOut[1] * (b1b2))) >> 16;
            prevOut[2] = ((pRowInitial[2] * (a2a3)) - (prevOut[2] * (b1b2))) >> 16;
            bufferPerLine[0] += prevOut[0];
            bufferPerLine[1] += prevOut[1];
            bufferPerLine[2] += prevOut[2];
            pColumn[0] = bufferPerLine[0];
            pColumn[1] = bufferPerLine[1];
            pColumn[2] = bufferPerLine[2];
            pRowInitial -= Channels;
            pColumn -= HeightStep;
            bufferPerLine -= Channels;
        }
    } else if (Channels == 4) {
        int prevOut[4];
        prevOut[0] = (pRowInitial[0] * cprev) >> 8;
        prevOut[1] = (pRowInitial[1] * cprev) >> 8;
        prevOut[2] = (pRowInitial[2] * cprev) >> 8;
        prevOut[3] = (pRowInitial[3] * cprev) >> 8;
        for (int x = 0; x < Width; ++x) {
            prevOut[0] = ((pRowInitial[0] * (a0a1)) - (prevOut[0] * (b1b2))) >> 16;
            prevOut[1] = ((pRowInitial[1] * (a0a1)) - (prevOut[1] * (b1b2))) >> 16;
            prevOut[2] = ((pRowInitial[2] * (a0a1)) - (prevOut[2] * (b1b2))) >> 16;
            prevOut[3] = ((pRowInitial[3] * (a0a1)) - (prevOut[3] * (b1b2))) >> 16;
            bufferPerLine[0] = prevOut[0];
            bufferPerLine[1] = prevOut[1];
            bufferPerLine[2] = prevOut[2];
            bufferPerLine[3] = prevOut[3];
            bufferPerLine += Channels;
            pRowInitial += Channels;
        }
        pRowInitial -= Channels;
        pColumn += HeightStep * lastWidth;
        bufferPerLine -= Channels;
        prevOut[0] = (pRowInitial[0] * cnext) >> 8;
        prevOut[1] = (pRowInitial[1] * cnext) >> 8;
        prevOut[2] = (pRowInitial[2] * cnext) >> 8;
        prevOut[3] = (pRowInitial[3] * cnext) >> 8;
        for (int x = lastWidth; x >= 0; --x) {
            prevOut[0] = ((pRowInitial[0] * a2a3) - (prevOut[0] * b1b2)) >> 16;
            prevOut[1] = ((pRowInitial[1] * a2a3) - (prevOut[1] * b1b2)) >> 16;
            prevOut[2] = ((pRowInitial[2] * a2a3) - (prevOut[2] * b1b2)) >> 16;
            prevOut[3] = ((pRowInitial[3] * a2a3) - (prevOut[3] * b1b2)) >> 16;
            bufferPerLine[0] += prevOut[0];
            bufferPerLine[1] += prevOut[1];
            bufferPerLine[2] += prevOut[2];
            bufferPerLine[3] += prevOut[3];
            pColumn[0] = bufferPerLine[0];
            pColumn[1] = bufferPerLine[1];
            pColumn[2] = bufferPerLine[2];
            pColumn[3] = bufferPerLine[3];
            pRowInitial -= Channels;
            pColumn -= HeightStep;
            bufferPerLine -= Channels;
        }
    } else if (Channels == 1) {
        int prevOut = (pRowInitial[0] * cprev) >> 8;
        for (int x = 0; x < Width; ++x) {
            prevOut = ((pRowInitial[0] * (a0a1)) - (prevOut * (b1b2))) >> 16;
            bufferPerLine[0] = prevOut;
            bufferPerLine += Channels;
            pRowInitial += Channels;
        }
        pRowInitial -= Channels;
        pColumn += HeightStep*lastWidth;
        bufferPerLine -= Channels;
        prevOut = (pRowInitial[0] * cnext) >> 8;
        for (int x = lastWidth; x >= 0; --x) {
            prevOut = ((pRowInitial[0] * a2a3) - (prevOut * b1b2)) >> 16;
            bufferPerLine[0] += prevOut;
            pColumn[0] = bufferPerLine[0];
            pRowInitial -= Channels;
            pColumn -= HeightStep;
            bufferPerLine -= Channels;
        }
    }
}

// Filters one row of the transposed cache (one image column) forward and
// backward, then writes the result into the output (stride = Channels * Width).
void gaussianVertical(unsigned char * bufferPerLine, unsigned char * pRowInitial, unsigned char * pColInitial, int Height, int Width, int Channels, int a0a1, int a2a3, int b1b2, int cprev, int cnext) {
    int WidthStep = Channels*Width;
    int lastHeight = Height - 1;
    if (Channels == 3) {
        int prevOut[3];
        prevOut[0] = (pRowInitial[0] * cprev) >> 8;
        prevOut[1] = (pRowInitial[1] * cprev) >> 8;
        prevOut[2] = (pRowInitial[2] * cprev) >> 8;
        for (int y = 0; y < Height; y++) {
            prevOut[0] = ((pRowInitial[0] * a0a1) - (prevOut[0] * b1b2)) >> 16;
            prevOut[1] = ((pRowInitial[1] * a0a1) - (prevOut[1] * b1b2)) >> 16;
            prevOut[2] = ((pRowInitial[2] * a0a1) - (prevOut[2] * b1b2)) >> 16;
            bufferPerLine[0] = prevOut[0];
            bufferPerLine[1] = prevOut[1];
            bufferPerLine[2] = prevOut[2];
            bufferPerLine += Channels;
            pRowInitial += Channels;
        }
        pRowInitial -= Channels;
        bufferPerLine -= Channels;
        pColInitial += WidthStep * lastHeight;
        prevOut[0] = (pRowInitial[0] * cnext) >> 8;
        prevOut[1] = (pRowInitial[1] * cnext) >> 8;
        prevOut[2] = (pRowInitial[2] * cnext) >> 8;
        for (int y = lastHeight; y >= 0; y--) {
            prevOut[0] = ((pRowInitial[0] * a2a3) - (prevOut[0] * b1b2)) >> 16;
            prevOut[1] = ((pRowInitial[1] * a2a3) - (prevOut[1] * b1b2)) >> 16;
            prevOut[2] = ((pRowInitial[2] * a2a3) - (prevOut[2] * b1b2)) >> 16;
            bufferPerLine[0] += prevOut[0];
            bufferPerLine[1] += prevOut[1];
            bufferPerLine[2] += prevOut[2];
            pColInitial[0] = bufferPerLine[0];
            pColInitial[1] = bufferPerLine[1];
            pColInitial[2] = bufferPerLine[2];
            pRowInitial -= Channels;
            pColInitial -= WidthStep;
            bufferPerLine -= Channels;
        }
    } else if (Channels == 4) {
        int prevOut[4];
        prevOut[0] = (pRowInitial[0] * cprev) >> 8;
        prevOut[1] = (pRowInitial[1] * cprev) >> 8;
        prevOut[2] = (pRowInitial[2] * cprev) >> 8;
        prevOut[3] = (pRowInitial[3] * cprev) >> 8;
        for (int y = 0; y < Height; y++) {
            prevOut[0] = ((pRowInitial[0] * a0a1) - (prevOut[0] * b1b2)) >> 16;
            prevOut[1] = ((pRowInitial[1] * a0a1) - (prevOut[1] * b1b2)) >> 16;
            prevOut[2] = ((pRowInitial[2] * a0a1) - (prevOut[2] * b1b2)) >> 16;
            prevOut[3] = ((pRowInitial[3] * a0a1) - (prevOut[3] * b1b2)) >> 16;
            bufferPerLine[0] = prevOut[0];
            bufferPerLine[1] = prevOut[1];
            bufferPerLine[2] = prevOut[2];
            bufferPerLine[3] = prevOut[3];
            bufferPerLine += Channels;
            pRowInitial += Channels;
        }
        pRowInitial -= Channels;
        bufferPerLine -= Channels;
        pColInitial += WidthStep*lastHeight;
        prevOut[0] = (pRowInitial[0] * cnext) >> 8;
        prevOut[1] = (pRowInitial[1] * cnext) >> 8;
        prevOut[2] = (pRowInitial[2] * cnext) >> 8;
        prevOut[3] = (pRowInitial[3] * cnext) >> 8;
        for (int y = lastHeight; y >= 0; y--) {
            prevOut[0] = ((pRowInitial[0] * a2a3) - (prevOut[0] * b1b2)) >> 16;
            prevOut[1] = ((pRowInitial[1] * a2a3) - (prevOut[1] * b1b2)) >> 16;
            prevOut[2] = ((pRowInitial[2] * a2a3) - (prevOut[2] * b1b2)) >> 16;
            prevOut[3] = ((pRowInitial[3] * a2a3) - (prevOut[3] * b1b2)) >> 16;
            bufferPerLine[0] += prevOut[0];
            bufferPerLine[1] += prevOut[1];
            bufferPerLine[2] += prevOut[2];
            bufferPerLine[3] += prevOut[3];
            pColInitial[0] = bufferPerLine[0];
            pColInitial[1] = bufferPerLine[1];
            pColInitial[2] = bufferPerLine[2];
            pColInitial[3] = bufferPerLine[3];
            pRowInitial -= Channels;
            pColInitial -= WidthStep;
            bufferPerLine -= Channels;
        }
    } else if (Channels == 1) {
        int prevOut = 0;
        prevOut = (pRowInitial[0] * cprev) >> 8;
        for (int y = 0; y < Height; y++) {
            prevOut = ((pRowInitial[0] * a0a1) - (prevOut * b1b2)) >> 16;
            bufferPerLine[0] = prevOut;
            bufferPerLine += Channels;
            pRowInitial += Channels;
        }
        pRowInitial -= Channels;
        bufferPerLine -= Channels;
        pColInitial += WidthStep*lastHeight;
        prevOut = (pRowInitial[0] * cnext) >> 8;
        for (int y = lastHeight; y >= 0; y--) {
            prevOut = ((pRowInitial[0] * a2a3) - (prevOut * b1b2)) >> 16;
            bufferPerLine[0] += prevOut;
            pColInitial[0] = bufferPerLine[0];
            pRowInitial -= Channels;
            pColInitial -= WidthStep;
            bufferPerLine -= Channels;
        }
    }
}

// Author's blog: http://tntmonks.cnblogs.com/ Please credit the source when reposting.
// Note: the default argument below is C++ syntax; remove "= 2.0f" to compile as plain C.
void GaussianBlurFilter(unsigned char * inputBuffer, unsigned char * outputBuffer, int Width, int Height, int Channels, float gaussianSigma = 2.0f) {
    float a0, a1, a2, a3, b1, b2, cprev, cnext;
    CalGaussianCoeff(gaussianSigma, &a0, &a1, &a2, &a3, &b1, &b2, &cprev, &cnext);
    // Convert the coefficients to fixed point: 8 fractional bits for the
    // edge values, 16 fractional bits for the recurrence.
    int icprev = cprev * 256;
    int icnext = cnext * 256;
    int a0a1 = (a0 + a1) * 65536;
    int a2a3 = (a2 + a3) * 65536;
    int b1b2 = (b1 + b2) * 65536;
    int bufferSizePerLine = (Width > Height ? Width : Height) * Channels;
    unsigned char * bufferPerLine = (unsigned char*)malloc(bufferSizePerLine);
    unsigned char * cacheData = (unsigned char*)malloc(Height * Width * Channels);
    if (bufferPerLine == NULL || cacheData == NULL) {
        // Allocation failed: release whatever was allocated and give up.
        free(bufferPerLine);
        free(cacheData);
        return;
    }
    // Pass 1: blur every row, storing the result transposed in cacheData.
    int WidthStep = Width * Channels;
    for (int y = 0; y < Height; ++y) {
        unsigned char * pRowInitial = inputBuffer + WidthStep * y;
        unsigned char * pColumnInitial = cacheData + y * Channels;
        gaussianHorizontal(bufferPerLine, pRowInitial, pColumnInitial, Width, Height, Channels, Width, a0a1, a2a3, b1b2, icprev, icnext);
    }
    // Pass 2: blur every column (a contiguous row of cacheData) into the output.
    int HeightStep = Height*Channels;
    for (int x = 0; x < Width; ++x) {
        unsigned char * pColInitial = outputBuffer + x*Channels;
        unsigned char * pRowInitial = cacheData + HeightStep * x;
        gaussianVertical(bufferPerLine, pRowInitial, pColInitial, Height, Width, Channels, a0a1, a2a3, b1b2, icprev, icnext);
    }
    free(bufferPerLine);
    free(cacheData);
}
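The integer multiply-and-shift pattern used throughout the code above can be looked at in isolation. The following is a minimal sketch, not part of the original code: the helper names `to_q16` and `q16_step` are made up for illustration, and it assumes that the numerator stays non-negative so that the right shift behaves like a plain division by 65536.

```c
#include <assert.h>

/* Hypothetical helper: converts a real coefficient to fixed point with
 * 16 fractional bits (scaled by 65536), truncating toward zero in the
 * same way as the float-to-int conversions in GaussianBlurFilter. */
int to_q16(float coeff) {
    return (int)(coeff * 65536.0f);
}

/* Hypothetical helper: one step of the recurrence
 *     y = a * x - b * y_prev
 * with a and b given in 16-bit fixed point. x is an 8-bit sample.
 * Assumption: x * a_q16 - prev * b_q16 is non-negative. */
int q16_step(int x, int prev, int a_q16, int b_q16) {
    assert(x >= 0 && x <= 255);
    return (x * a_q16 - prev * b_q16) >> 16;
}
```

For example, with a = 0.25 and b = -0.75 (so a / (1 + b) = 1), a sample of 200 with a previous output of 200 yields 200 again: the steady state is preserved without any floating-point arithmetic in the inner loop.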
Usage:
GaussianBlurFilter(input image data, output image data, width, height, channel count, strength)
Note: 1, 3, and 4 channels are supported.
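Judging from how the code computes `WidthStep = Width * Channels`, the input and output buffers appear to be tightly packed, row-major, with interleaved channels and no row padding. Here is a small sketch of that layout; `pixel_offset` and `make_gradient_image` are hypothetical helper names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: byte offset of channel c of pixel (x, y) in a
 * tightly packed, interleaved buffer (the layout inferred from the code). */
size_t pixel_offset(int x, int y, int c, int width, int channels) {
    assert(x >= 0 && x < width && y >= 0 && c >= 0 && c < channels);
    return ((size_t)y * (size_t)width + (size_t)x) * (size_t)channels + (size_t)c;
}

/* Hypothetical helper: allocates a test image whose brightness ramps from
 * 0 at the left edge to 255 at the right edge. The caller frees it. */
unsigned char *make_gradient_image(int width, int height, int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    size_t size = (size_t)width * (size_t)height * (size_t)channels;
    unsigned char *img = (unsigned char *)malloc(size);
    if (img == NULL)
        return NULL;
    int denom = (width > 1) ? (width - 1) : 1;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < channels; ++c)
                img[pixel_offset(x, y, c, width, channels)] =
                    (unsigned char)((x * 255) / denom);
    return img;
}
```

A buffer built this way, together with an output buffer of the same size, can then be passed as the first two arguments, for example GaussianBlurFilter(input, output, width, height, channels, 2.0f).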
For background on IIR, see the Baidu Baike entry "IIR数字滤波器" (IIR digital filter)
http://baike.baidu.com/view/3088994.htm
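Of all the martial arts under heaven, only speed cannot be beaten.
This post is only meant to get the discussion started. If you have related questions or requests, feel free to email me.
My email address is: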
gaozhihan@vip.qq.com
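A side note:
Many readers keep recommending OpenCV. OpenCV is indeed very powerful, but if you want more room to grow and more creative freedom,
you still need to implement the most basic algorithms yourself, one step at a time; a solid foundation is the prerequisite for building anything on top of it.
For now I treat OpenCV only as a reference library, and I do not think it is suitable for the vast majority of commercial projects.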