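Face Beautification with an OpenCL Bilateral Filter
OpenCL is a framework for parallel, heterogeneous computing. Many vendors support it, including Intel, AMD and NVIDIA, although NVIDIA only supports it up to version 1.2 and has mainly concentrated on its own CUDA. I have not used CUDA, but my personal impression is that CUDA is somewhat better than OpenCL, while OpenCL covers more hardware: CPUs, GPUs, DSPs, FPGAs and other chips can all support OpenCL. An OpenCL kernel is very similar to a pixel shader in D3D.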
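1. Bilateral filter principle
For the principle of the bilateral filter, see Rachel-Zhang's blog post "双边滤波器的原理及实现" (Principle and Implementation of the Bilateral Filter). Quoting from that post, the principle is as follows:
The bilateral filter is a filter that removes noise while preserving edges. It can do so because the filter is built from two functions: one derives the filter coefficients from the geometric (spatial) distance, the other derives them from the difference in pixel values. Two filters it can be compared with are the Gaussian low-pass filter (http://en.wikipedia.org/wiki/Gaussian_filter) and the α-trimmed mean filter (which discards the smallest and largest α percent of the values and takes the mean of the remaining pixels).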
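In a bilateral filter, the value of each output pixel is a weighted combination of the values of the pixels in its neighbourhood:
$$ g(i,j) = \frac{\sum_{k,l} f(k,l)\, w(i,j,k,l)}{\sum_{k,l} w(i,j,k,l)} $$
The weight coefficient w(i,j,k,l) is the product of the domain kernel and the range kernel. It therefore accounts for differences in both the spatial domain and the value range, whereas the Gaussian filter considers only the spatial difference and the α-trimmed mean filter considers only the value difference.
This article implements the bilateral filter in OpenCL based on this formula and uses it for face beautification.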
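To make the weight concrete, here is a minimal C++ sketch of w(i,j,k,l). It assumes the usual Gaussian form for both kernels, w = exp(-((i-k)^2 + (j-l)^2) / (2 σ_d^2) - (f(i,j) - f(k,l))^2 / (2 σ_r^2)), which is also what the kernel in section 2 computes. The function name `bilateralWeight` and its parameter names are illustrative and do not come from the original code.

```cpp
#include <cassert>
#include <cmath>

// Weight between the centre pixel (i, j) with value fij and a neighbour
// (k, l) with value fkl. Pixel values are assumed to be normalised to [0, 1].
// Hypothetical helper for illustration only.
double bilateralWeight(int i, int j, int k, int l,
                       double fij, double fkl,
                       double sigmaD, double sigmaR)
{
    assert(sigmaD > 0.0 && sigmaR > 0.0);
    const double dx = static_cast<double>(i - k);
    const double dy = static_cast<double>(j - l);
    // Domain (spatial) term: falls off with geometric distance.
    const double spatial = -(dx * dx + dy * dy) / (2.0 * sigmaD * sigmaD);
    // Range term: falls off with the difference in pixel value.
    const double diff = fij - fkl;
    const double range = -(diff * diff) / (2.0 * sigmaR * sigmaR);
    // Product of the two Gaussians equals the exponential of the summed exponents.
    return std::exp(spatial + range);
}
```

A neighbour that is both close and similar in value gets a weight near 1, while a neighbour across a strong edge gets a weight near 0, which is why edges survive the smoothing.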
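2. Kernel function
The skin-smoothing algorithm follows http://www.zealfilter.com/portal.php?mod=view&aid=138. I removed its skin-colour detection step because it did not work well. I had also planned a sharpening pass, but the result looks good without it, so I have left it out for now and will add sharpening when I study the next part of OpenCL.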
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

kernel void bilateralBlur(read_only image2d_t src, write_only image2d_t dst)
{
    int x = (int)get_global_id(0);
    int y = (int)get_global_id(1);
    // The global size is rounded up to a multiple of the work-group size,
    // so work-items that fall outside the image exit early.
    if (x >= get_image_width(src) || y >= get_image_height(src))
        return;

    // NOTE: the window size was lost from this listing; 11 is an assumed value.
    int ksize = 11;
    float sigma_d = 3.0f;   // spatial (domain) sigma, in pixels
    float sigma_r = 0.1f;   // range sigma, on normalized [0, 1] pixel values
    float alpha = 0.2f;     // share of the original pixel kept in the final blend

    float4 fij = read_imagef(src, sampler, (int2)(x, y));
    float4 fkl;
    float dkl;
    float4 rkl;
    float4 wkl;

    float4 numerator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
    // The weight sum starts at zero so the result is a true weighted mean.
    // (The listing started it at 1.0f, which adds an extra unit weight and darkens the output.)
    float4 denominator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
    for (int K = -ksize / 2; K <= ksize / 2; K++)
    {
        for (int L = -ksize / 2; L <= ksize / 2; L++)
        {
            fkl = read_imagef(src, sampler, (int2)(x + K, y + L));

            // Exponents of the domain kernel and of the per-channel range kernel
            dkl = -(K*K + L*L) / (2 * sigma_d*sigma_d);
            rkl.x = -(fij.x - fkl.x)*(fij.x - fkl.x) / (2 * sigma_r*sigma_r);
            rkl.y = -(fij.y - fkl.y)*(fij.y - fkl.y) / (2 * sigma_r*sigma_r);
            rkl.z = -(fij.z - fkl.z)*(fij.z - fkl.z) / (2 * sigma_r*sigma_r);

            wkl.x = exp(dkl + rkl.x);
            wkl.y = exp(dkl + rkl.y);
            wkl.z = exp(dkl + rkl.z);

            numerator.x += fkl.x * wkl.x;
            numerator.y += fkl.y * wkl.y;
            numerator.z += fkl.z * wkl.z;

            denominator.x += wkl.x;
            denominator.y += wkl.y;
            denominator.z += wkl.z;
        }
    }

    float4 gij = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
    if (denominator.x > 0 && denominator.y > 0 && denominator.z > 0)
    {
        gij.x = numerator.x / denominator.x;
        gij.y = numerator.y / denominator.y;
        gij.z = numerator.z / denominator.z;

        // After the bilateral filter, blend the result with the original pixel
        gij.x = fij.x*alpha + gij.x*(1.0f - alpha);
        gij.y = fij.y*alpha + gij.y*(1.0f - alpha);
        gij.z = fij.z*alpha + gij.z*(1.0f - alpha);
    }

    write_imagef(dst, (int2)(x, y), gij);
}
The kernel is essentially the mathematical formula written out as code, so it is very simple.
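When checking a kernel like this, a plain CPU version of the same formula can be handy for comparing results on small images. The following is a minimal single-channel sketch, not the author's code: the function name `bilateralReference`, the row-major float layout and the parameter list are assumptions made for illustration. Border pixels are clamped to the image edge, which is meant to mirror the CLK_ADDRESS_CLAMP_TO_EDGE sampler.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for one channel of a row-major image with values in [0, 1].
// Hypothetical helper: it follows the same weighted-mean formula and the same
// final alpha blend as the kernel, but runs serially on the host.
std::vector<float> bilateralReference(const std::vector<float>& src,
                                      int width, int height, int ksize,
                                      float sigmaD, float sigmaR, float alpha)
{
    assert(width > 0 && height > 0 && ksize > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> dst(src.size());
    const int half = ksize / 2;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float fij = src[y * width + x];
            float num = 0.0f;
            float den = 0.0f;
            for (int K = -half; K <= half; ++K) {
                for (int L = -half; L <= half; ++L) {
                    // Clamp neighbour coordinates to the image border.
                    const int xx = std::min(std::max(x + K, 0), width - 1);
                    const int yy = std::min(std::max(y + L, 0), height - 1);
                    const float fkl = src[yy * width + xx];
                    const float d = -static_cast<float>(K * K + L * L)
                                    / (2.0f * sigmaD * sigmaD);
                    const float r = -(fij - fkl) * (fij - fkl)
                                    / (2.0f * sigmaR * sigmaR);
                    const float w = std::exp(d + r);
                    num += fkl * w;
                    den += w;
                }
            }
            // den is at least 1 because the centre pixel always has weight 1.
            const float g = num / den;
            dst[y * width + x] = fij * alpha + g * (1.0f - alpha);
        }
    }
    return dst;
}
```

A flat image should come back unchanged, and a hard step edge should stay sharp when sigmaR is small; both make quick sanity checks before comparing against the device output.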
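3. Host-side code
OpenCL code is split into host code and device code. The kernel runs on the parallel device, while the host is usually better suited to serial, logic-heavy code; the device is better suited to computation such as convolution. In a computer, the CPU is normally the host and the GPU the device. In practice the CPU can also act as a device, because Intel supports OpenCL as well. This article uses the CPU as the host and the GPU as the device.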
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <sstream>
#include <malloc.h>
#include <string.h>
#include <opencv2/opencv.hpp>
#include <CL/cl.h>
//---------- Query OpenCL platform and device information ---------
void DisplayPlatformInfo(
cl_platform_id id,
cl_platform_info name,
std::string str)
{
cl_int errNum;
std::size_t paramValueSize;
// First call: ask only for the size of the value
errNum = clGetPlatformInfo(
id,
name,
0,
NULL,
&paramValueSize);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
return;
} char * info = (char *)alloca(sizeof(char) * paramValueSize);
errNum = clGetPlatformInfo(
id,
name,
paramValueSize,
info,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
return;
} std::cout << "\t" << str << ":\t" << info << std::endl;
} template<typename T>
void appendBitfield(T info, T value, std::string name, std::string & str)
{
if (info & value)
{
if (str.length() > 0)
{
str.append(" | ");
}
str.append(name);
}
} ///
// Display information for a particular device.
// As different calls to clGetDeviceInfo may return
// values of different types a template is used.
// As some values returned are arrays of values, a templated class is
// used so it can be specialized for this case, see below.
//
template <typename T>
class InfoDevice
{
public:
static void display(
cl_device_id id,
cl_device_info name,
std::string str)
{
cl_int errNum;
std::size_t paramValueSize;
// First call: ask only for the size of the value
errNum = clGetDeviceInfo(
id,
name,
0,
NULL,
&paramValueSize);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
return;
} T * info = (T *)alloca(sizeof(T) * paramValueSize);
errNum = clGetDeviceInfo(
id,
name,
paramValueSize,
info,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
return;
} // Handle a few special cases
switch (name)
{
case CL_DEVICE_TYPE:
{
std::string deviceType; appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_CPU,
"CL_DEVICE_TYPE_CPU",
deviceType); appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_GPU,
"CL_DEVICE_TYPE_GPU",
deviceType); appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_ACCELERATOR,
"CL_DEVICE_TYPE_ACCELERATOR",
deviceType); appendBitfield<cl_device_type>(
*(reinterpret_cast<cl_device_type*>(info)),
CL_DEVICE_TYPE_DEFAULT,
"CL_DEVICE_TYPE_DEFAULT",
deviceType); std::cout << "\t\t" << str << ":\t" << deviceType << std::endl;
}
break;
case CL_DEVICE_SINGLE_FP_CONFIG:
{
std::string fpType; appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_DENORM,
"CL_FP_DENORM",
fpType); appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_INF_NAN,
"CL_FP_INF_NAN",
fpType); appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_ROUND_TO_NEAREST,
"CL_FP_ROUND_TO_NEAREST",
fpType); appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_ROUND_TO_ZERO,
"CL_FP_ROUND_TO_ZERO",
fpType); appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_ROUND_TO_INF,
"CL_FP_ROUND_TO_INF",
fpType); appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_FMA,
"CL_FP_FMA",
fpType);
#ifdef CL_FP_SOFT_FLOAT
appendBitfield<cl_device_fp_config>(
*(reinterpret_cast<cl_device_fp_config*>(info)),
CL_FP_SOFT_FLOAT,
"CL_FP_SOFT_FLOAT",
fpType);
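#endif
std::cout << "\t\t" << str << ":\t" << fpType << std::endl;
}
break; // without this break the case falls through and prints the cache-type bits under the same label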
case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:
{
std::string memType; appendBitfield<cl_device_mem_cache_type>(
*(reinterpret_cast<cl_device_mem_cache_type*>(info)),
CL_NONE,
"CL_NONE",
memType);
appendBitfield<cl_device_mem_cache_type>(
*(reinterpret_cast<cl_device_mem_cache_type*>(info)),
CL_READ_ONLY_CACHE,
"CL_READ_ONLY_CACHE",
memType); appendBitfield<cl_device_mem_cache_type>(
*(reinterpret_cast<cl_device_mem_cache_type*>(info)),
CL_READ_WRITE_CACHE,
"CL_READ_WRITE_CACHE",
memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
case CL_DEVICE_LOCAL_MEM_TYPE:
{
std::string memType; appendBitfield<cl_device_local_mem_type>(
*(reinterpret_cast<cl_device_local_mem_type*>(info)),
CL_LOCAL,
"CL_LOCAL",
memType); appendBitfield<cl_device_local_mem_type>(
*(reinterpret_cast<cl_device_local_mem_type*>(info)),
CL_GLOBAL,
"CL_GLOBAL",
memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
case CL_DEVICE_EXECUTION_CAPABILITIES:
{
std::string memType; appendBitfield<cl_device_exec_capabilities>(
*(reinterpret_cast<cl_device_exec_capabilities*>(info)),
CL_EXEC_KERNEL,
"CL_EXEC_KERNEL",
memType); appendBitfield<cl_device_exec_capabilities>(
*(reinterpret_cast<cl_device_exec_capabilities*>(info)),
CL_EXEC_NATIVE_KERNEL,
"CL_EXEC_NATIVE_KERNEL",
memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
case CL_DEVICE_QUEUE_PROPERTIES:
{
std::string memType;
// Queue properties are reported as cl_command_queue_properties
appendBitfield<cl_command_queue_properties>(
*(reinterpret_cast<cl_command_queue_properties*>(info)),
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
"CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
memType);
appendBitfield<cl_command_queue_properties>(
*(reinterpret_cast<cl_command_queue_properties*>(info)),
CL_QUEUE_PROFILING_ENABLE,
"CL_QUEUE_PROFILING_ENABLE",
memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl;
}
break;
default:
std::cout << "\t\t" << str << ":\t" << *info << std::endl;
break;
}
}
}; ///
// Simple trait class used to wrap base types.
//
template <typename T>
class ArrayType
{
public:
static bool isChar() { return false; }
}; ///
// Specialized for the char (i.e. null terminated string case).
//
template<>
class ArrayType<char>
{
public:
static bool isChar() { return true; }
}; ///
// Specialized instance of class InfoDevice for array types.
//
template <typename T>
class InfoDevice<ArrayType<T> >
{
public:
static void display(
cl_device_id id,
cl_device_info name,
std::string str)
{
cl_int errNum;
std::size_t paramValueSize;
// First call: ask only for the size of the value
errNum = clGetDeviceInfo(
id,
name,
0,
NULL,
&paramValueSize);
if (errNum != CL_SUCCESS)
{
std::cerr
<< "Failed to find OpenCL device info "
<< str
<< "."
<< std::endl;
return;
} T * info = (T *)alloca(sizeof(T) * paramValueSize);
errNum = clGetDeviceInfo(
id,
name,
paramValueSize,
info,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr
<< "Failed to find OpenCL device info "
<< str
<< "."
<< std::endl;
return;
} if (ArrayType<T>::isChar())
{
std::cout << "\t" << str << ":\t" << info << std::endl;
}
else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES)
{
cl_uint maxWorkItemDimensions; errNum = clGetDeviceInfo(
id,
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
sizeof(cl_uint),
&maxWorkItemDimensions,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr
<< "Failed to find OpenCL device info "
<< "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS."
<< std::endl;
return;
} std::cout << "\t" << str << ":\t";
for (cl_uint i = 0; i < maxWorkItemDimensions; i++)
{
std::cout << info[i] << " ";
}
std::cout << std::endl;
}
}
}; ///
// Enumerate platforms and display information about them
// and their associated devices.
//
void displayInfo(void)
{
cl_int errNum;
cl_uint numPlatforms;
cl_platform_id * platformIds;
cl_context context = NULL; // First, query the total number of platforms
errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
if (errNum != CL_SUCCESS || numPlatforms <= 0)
{
std::cerr << "Failed to find any OpenCL platform." << std::endl;
return;
}
// Next, allocate memory for the installed platforms, and query
// to get the list.
platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms);
// Second call: fill in the list of platform IDs
errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find any OpenCL platforms." << std::endl;
return;
} std::cout << "Number of platforms: \t" << numPlatforms << std::endl;
// Iterate through the list of platforms displaying associated information
for (cl_uint i = 0; i < numPlatforms; i++) {
// First we display information associated with the platform
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_PROFILE,
"CL_PLATFORM_PROFILE");
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_VERSION,
"CL_PLATFORM_VERSION");
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_VENDOR,
"CL_PLATFORM_VENDOR");
DisplayPlatformInfo(
platformIds[i],
CL_PLATFORM_EXTENSIONS,
"CL_PLATFORM_EXTENSIONS"); // Now query the set of devices associated with the platform
cl_uint numDevices;
errNum = clGetDeviceIDs(
platformIds[i],
CL_DEVICE_TYPE_ALL,
0,
NULL,
&numDevices);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL devices." << std::endl;
return;
} cl_device_id * devices = (cl_device_id *)alloca(sizeof(cl_device_id) * numDevices);
errNum = clGetDeviceIDs(
platformIds[i],
CL_DEVICE_TYPE_ALL,
numDevices,
devices,
NULL);
if (errNum != CL_SUCCESS)
{
std::cerr << "Failed to find OpenCL devices." << std::endl;
return;
} std::cout << "\tNumber of devices: \t" << numDevices << std::endl;
// Iterate through each device, displaying associated information
for (cl_uint j = 0; j < numDevices; j++)
{
InfoDevice<cl_device_type>::display(
devices[j],
CL_DEVICE_TYPE,
"CL_DEVICE_TYPE"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_VENDOR_ID,
"CL_DEVICE_VENDOR_ID"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_COMPUTE_UNITS,
"CL_DEVICE_MAX_COMPUTE_UNITS"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
"CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS"); InfoDevice<ArrayType<size_t> >::display(
devices[j],
CL_DEVICE_MAX_WORK_ITEM_SIZES,
"CL_DEVICE_MAX_WORK_ITEM_SIZES"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_MAX_WORK_GROUP_SIZE,
"CL_DEVICE_MAX_WORK_GROUP_SIZE"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE");
#ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
"CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_INT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
"CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF");
#endif
InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_CLOCK_FREQUENCY,
"CL_DEVICE_MAX_CLOCK_FREQUENCY"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_ADDRESS_BITS,
"CL_DEVICE_ADDRESS_BITS"); InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_MAX_MEM_ALLOC_SIZE,
"CL_DEVICE_MAX_MEM_ALLOC_SIZE"); InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_IMAGE_SUPPORT,
"CL_DEVICE_IMAGE_SUPPORT"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_READ_IMAGE_ARGS,
"CL_DEVICE_MAX_READ_IMAGE_ARGS"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
"CL_DEVICE_MAX_WRITE_IMAGE_ARGS"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE2D_MAX_WIDTH,
"CL_DEVICE_IMAGE2D_MAX_WIDTH"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE2D_MAX_WIDTH,
"CL_DEVICE_IMAGE2D_MAX_WIDTH"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE2D_MAX_HEIGHT,
"CL_DEVICE_IMAGE2D_MAX_HEIGHT"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE3D_MAX_WIDTH,
"CL_DEVICE_IMAGE3D_MAX_WIDTH"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE3D_MAX_HEIGHT,
"CL_DEVICE_IMAGE3D_MAX_HEIGHT"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_IMAGE3D_MAX_DEPTH,
"CL_DEVICE_IMAGE3D_MAX_DEPTH"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_SAMPLERS,
"CL_DEVICE_MAX_SAMPLERS"); InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_MAX_PARAMETER_SIZE,
"CL_DEVICE_MAX_PARAMETER_SIZE"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MEM_BASE_ADDR_ALIGN,
"CL_DEVICE_MEM_BASE_ADDR_ALIGN"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE,
"CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE"); InfoDevice<cl_device_fp_config>::display(
devices[j],
CL_DEVICE_SINGLE_FP_CONFIG,
"CL_DEVICE_SINGLE_FP_CONFIG"); InfoDevice<cl_device_mem_cache_type>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
"CL_DEVICE_GLOBAL_MEM_CACHE_TYPE"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
"CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE"); InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
"CL_DEVICE_GLOBAL_MEM_CACHE_SIZE"); InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_GLOBAL_MEM_SIZE,
"CL_DEVICE_GLOBAL_MEM_SIZE"); InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
"CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE"); InfoDevice<cl_uint>::display(
devices[j],
CL_DEVICE_MAX_CONSTANT_ARGS,
"CL_DEVICE_MAX_CONSTANT_ARGS"); InfoDevice<cl_device_local_mem_type>::display(
devices[j],
CL_DEVICE_LOCAL_MEM_TYPE,
"CL_DEVICE_LOCAL_MEM_TYPE"); InfoDevice<cl_ulong>::display(
devices[j],
CL_DEVICE_LOCAL_MEM_SIZE,
"CL_DEVICE_LOCAL_MEM_SIZE"); InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_ERROR_CORRECTION_SUPPORT,
"CL_DEVICE_ERROR_CORRECTION_SUPPORT");
#ifdef CL_DEVICE_HOST_UNIFIED_MEMORY
InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_HOST_UNIFIED_MEMORY,
"CL_DEVICE_HOST_UNIFIED_MEMORY");
#endif
InfoDevice<std::size_t>::display(
devices[j],
CL_DEVICE_PROFILING_TIMER_RESOLUTION,
"CL_DEVICE_PROFILING_TIMER_RESOLUTION"); InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_ENDIAN_LITTLE,
"CL_DEVICE_ENDIAN_LITTLE"); InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_AVAILABLE,
"CL_DEVICE_AVAILABLE"); InfoDevice<cl_bool>::display(
devices[j],
CL_DEVICE_COMPILER_AVAILABLE,
"CL_DEVICE_COMPILER_AVAILABLE"); InfoDevice<cl_device_exec_capabilities>::display(
devices[j],
CL_DEVICE_EXECUTION_CAPABILITIES,
"CL_DEVICE_EXECUTION_CAPABILITIES"); InfoDevice<cl_command_queue_properties>::display(
devices[j],
CL_DEVICE_QUEUE_PROPERTIES,
"CL_DEVICE_QUEUE_PROPERTIES"); InfoDevice<cl_platform_id>::display(
devices[j],
CL_DEVICE_PLATFORM,
"CL_DEVICE_PLATFORM"); InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_NAME,
"CL_DEVICE_NAME"); InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_VENDOR,
"CL_DEVICE_VENDOR"); InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DRIVER_VERSION,
"CL_DRIVER_VERSION"); InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_PROFILE,
"CL_DEVICE_PROFILE"); InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_VERSION,
"CL_DEVICE_VERSION");
#ifdef CL_DEVICE_OPENCL_C_VERSION
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_OPENCL_C_VERSION,
"CL_DEVICE_OPENCL_C_VERSION");
#endif
InfoDevice<ArrayType<char> >::display(
devices[j],
CL_DEVICE_EXTENSIONS,
"CL_DEVICE_EXTENSIONS"); std::cout << std::endl << std::endl;
}
}
}
//----------- End of the code that queries and displays OpenCL device information ------------------
cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)
{
cl_int errNum;
cl_program program; std::ifstream kernelFile(fileName, std::ios::in);
if (!kernelFile.is_open())
{
std::cerr << "Failed to open file for reading: " << fileName << std::endl;
return NULL;
} std::ostringstream oss;
oss << kernelFile.rdbuf(); std::string srcStdStr = oss.str();
const char *srcStr = srcStdStr.c_str();
program = clCreateProgramWithSource(context, 1,
(const char**)&srcStr,
NULL, NULL);
if (program == NULL)
{
std::cerr << "Failed to create CL program from source." << std::endl;
return NULL;
}
errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (errNum != CL_SUCCESS)
{
// Determine the reason for the error
// NOTE: the buffer size was lost from this listing; 16384 is an assumed value.
char buildLog[16384];
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
sizeof(buildLog), buildLog, NULL);
std::cerr << "Error in kernel: " << std::endl;
std::cerr << buildLog;
clReleaseProgram(program);
return NULL;
} return program;
} void Cleanup(cl_context context, cl_command_queue commandQueue,
cl_program program, cl_kernel kernel, cl_mem imageObjects[])
{
for (int i = 0; i < 2; i++) // imageObjects holds the input image and the output image
{
if (imageObjects[i] != 0)
clReleaseMemObject(imageObjects[i]);
}
if (commandQueue != 0)
clReleaseCommandQueue(commandQueue);
if (kernel != 0)
clReleaseKernel(kernel);
if (program != 0)
clReleaseProgram(program);
if (context != 0)
clReleaseContext(context);
}
cl_mem LoadImage(cl_context context, const char *fileName, int &width, int &height)
{
cv::Mat image1 = cv::imread(fileName);
width = image1.cols;
height = image1.rows;
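char *buffer = new char[width * height * 4]; // 4 bytes per pixel for CL_RGBA / CL_UNORM_INT8
int w = 0;
// Rows are copied bottom-up; the read-back loop in main uses the same order.
for (int v = height - 1; v >= 0; v--)
{
for (int u = 0; u < width; u++)
{
// NOTE: the channel indices were lost from this listing; 0, 1, 2 is assumed.
buffer[w++] = image1.at<cv::Vec3b>(v, u)[0];
buffer[w++] = image1.at<cv::Vec3b>(v, u)[1];
buffer[w++] = image1.at<cv::Vec3b>(v, u)[2];
w++; // skip the unused fourth channel
}
}
// Create OpenCL image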
cl_image_format clImageFormat;
clImageFormat.image_channel_order = CL_RGBA;
clImageFormat.image_channel_data_type = CL_UNORM_INT8; cl_int errNum;
cl_mem clImage;
clImage = clCreateImage2D(context,
CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
&clImageFormat,
width,
height,
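0, // row pitch; 0 means tightly packed rows
buffer,
&errNum);
delete[] buffer; // CL_MEM_COPY_HOST_PTR copies the data when the image is created
if (errNum != CL_SUCCESS)
{
std::cerr << "Error creating CL image object" << std::endl;
return 0;
}
return clImage;
}
size_t RoundUp(int groupSize, int globalSize)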
{
int r = globalSize % groupSize;
if (r == 0)
{
return globalSize;
}
else
{
return globalSize + groupSize - r;
}
}
int main(int argc, char** argv)
{
cl_context context = 0;
cl_command_queue commandQueue = 0;
cl_program program = 0;
cl_device_id device = 0;
cl_kernel kernel = 0;
cl_mem imageObjects[2] = { 0, 0 };
cl_int errNum;
// Print information about every OpenCL platform and device
displayInfo();
cl_uint numplatforms;
errNum = clGetPlatformIDs(0, NULL, &numplatforms);
if (errNum != CL_SUCCESS || numplatforms <= 0){
printf("No OpenCL platform found\n");
return 1;
}
cl_platform_id * platformIds;
platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id)*numplatforms);
errNum = clGetPlatformIDs(numplatforms, platformIds, NULL);
if (errNum != CL_SUCCESS){
printf("No OpenCL platform found\n");
return 1;
}
printf("Number of platforms: %u\n", numplatforms);
// Select the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE
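cl_uint numDevices = 0, index_platform = 0, index_device = 0;
cl_device_id *devicesIds = NULL;
std::size_t paramValueSize = 0; // largest CL_DEVICE_MAX_WORK_GROUP_SIZE seen so far
for (cl_uint i = 0; i < numplatforms; i++){
errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
if (errNum != CL_SUCCESS || numDevices <= 0){
printf("Platform %u: no GPU device found\n", i);
continue;
}
devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id)*numDevices);
errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
if (errNum != CL_SUCCESS ){
printf("Platform %u: failed to get device IDs\n", i);
continue;
}
for (cl_uint j = 0; j < numDevices; j++){
std::size_t tmpSize = 0;
errNum = clGetDeviceInfo(devicesIds[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &tmpSize, NULL);
if (errNum != CL_SUCCESS){
std::cerr << "Failed to find OpenCL device info " << std::endl;
continue;
}
if (tmpSize >= paramValueSize){
paramValueSize = tmpSize; // remember the best value so later devices are compared against it
index_platform = i;
index_device = j;
}
}
}
// Query the devices of the selected platform again: after the loop, devicesIds and
// numDevices describe the last platform visited, which may not be the selected one.
errNum = clGetDeviceIDs(platformIds[index_platform], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
if (errNum != CL_SUCCESS || numDevices == 0){
std::cerr << "No GPU device found on the selected platform" << std::endl;
return 1;
}
devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id)*numDevices);
errNum = clGetDeviceIDs(platformIds[index_platform], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
if (errNum != CL_SUCCESS){
std::cerr << "Failed to get device IDs of the selected platform" << std::endl;
return 1;
}
cl_context_properties contextProperties[] ={
CL_CONTEXT_PLATFORM,
(cl_context_properties)platformIds[index_platform],
0 }; // the property list must end with 0
context = clCreateContext(contextProperties, numDevices, devicesIds, NULL, NULL, &errNum);
if (errNum != CL_SUCCESS){
std::cerr << "Failed to Create Context " << std::endl;
return 1;
}
device = devicesIds[index_device];
// Create a command-queue on the selected device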
// on the created context
commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &errNum);
if (commandQueue == NULL) {
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// Make sure the device supports images, otherwise exit
cl_bool imageSupport = CL_FALSE;
clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &imageSupport, NULL);
if (imageSupport != CL_TRUE) {
std::cerr << "OpenCL device does not support images." << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// Load input image from file and load it into
// an OpenCL image object
int width, height;
char src0[] = "test.png";
imageObjects[0] = LoadImage(context, src0, width, height);
if (imageObjects[0] == 0) {
std::cerr << "Error loading: " << std::string(src0) << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// Create output image object
cl_image_format clImageFormat;
clImageFormat.image_channel_order = CL_RGBA;
clImageFormat.image_channel_data_type = CL_UNORM_INT8;
imageObjects[1] = clCreateImage2D(context,
CL_MEM_WRITE_ONLY,
&clImageFormat,
width,
height,
0,
NULL,
&errNum);
if (errNum != CL_SUCCESS){
std::cerr << "Error creating CL output image object." << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// Create OpenCL program
program = CreateProgram(context, device, "bilateralBlur.cl");
if (program == NULL) {
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// Create OpenCL kernel
kernel = clCreateKernel(program, "bilateralBlur", NULL);
if (kernel == NULL) {
std::cerr << "Failed to create kernel" << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// Set the kernel arguments: argument 0 is the source image, argument 1 the destination
errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);
errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);
if (errNum != CL_SUCCESS) {
std::cerr << "Error setting kernel arguments." << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
// NOTE: the work-group dimensions were lost from this listing; 16 x 16 is an assumed value.
size_t localWorkSize[2] = { 16, 16 };
size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),
RoundUp(localWorkSize[1], height) };
cl_event prof_event;
// Queue the kernel up for execution over a 2-dimensional range
errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,
globalWorkSize, localWorkSize,
0, NULL, &prof_event);
if (errNum != CL_SUCCESS)
{
std::cerr << "Error queuing kernel for execution." << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
clFinish(commandQueue);
errNum = clWaitForEvents(1, &prof_event);
if (errNum)
{
printf("clWaitForEvents() failed for bilateralBlur kernel. (%d)\n", errNum);
return EXIT_FAILURE;
}
cl_ulong ev_start_time = (cl_ulong)0;
cl_ulong ev_end_time = (cl_ulong)0;
size_t return_bytes;
// Measured from the moment the command was queued, so the figure includes queueing delay
errNum = clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong), &ev_start_time, &return_bytes);
errNum |= clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,sizeof(cl_ulong), &ev_end_time, &return_bytes);
if (errNum)
{
printf("clGetEventProfilingInfo() failed for kernel. (%d)\n", errNum);
return EXIT_FAILURE;
}
double run_time = (double)(ev_end_time - ev_start_time);
printf("Image dimensions: %d x %d pixels, Image type = CL_RGBA, CL_UNORM_INT8\n", width, height);
printf("Work Timer:%lfms\n", run_time / 1000000.0); // profiling counters are in nanoseconds
clReleaseEvent(prof_event);
// Read the output buffer back to the Host
char *buffer = new char[width * height * 4];
size_t origin[3] = { 0, 0, 0 };
size_t region[3] = { (size_t)width, (size_t)height, 1 };
errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,
origin, region, 0, 0, buffer,
0, NULL, NULL);
if (errNum != CL_SUCCESS) {
std::cerr << "Error reading result buffer." << std::endl;
Cleanup(context, commandQueue, program, kernel, imageObjects);
system("pause"); return 1;
}
std::cout << std::endl;
std::cout << "Executed program successfully." << std::endl;
// Save the image out to disk
const char *saveImage = "output.jpg";
//std::cout << buffer << std::endl;
cv::Mat imageColor = cv::imread(src0);
cv::Mat imageColor2;
imageColor2.create(imageColor.rows, imageColor.cols, imageColor.type());
int w = 0;
// Unpack in the same bottom-up order that LoadImage used when packing
for (int v = imageColor2.rows - 1; v >= 0; v--) {
for (int u = 0; u < imageColor2.cols; u++) {
// NOTE: the channel indices were lost from this listing; 0, 1, 2 is assumed.
imageColor2.at<cv::Vec3b>(v, u)[0] = buffer[w++];
imageColor2.at<cv::Vec3b>(v, u)[1] = buffer[w++];
imageColor2.at<cv::Vec3b>(v, u)[2] = buffer[w++];
w++; // skip the unused fourth channel
}
}
cv::imshow("Original image", imageColor);
cv::imshow("After smoothing", imageColor2);
cv::imwrite(saveImage, imageColor2);
cv::waitKey(0);
delete[] buffer;
Cleanup(context, commandQueue, program, kernel, imageObjects);
return 0;
}
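The work-size arithmetic in main can be checked separately from OpenCL. The sketch below restates the rounding rule of RoundUp with size_t arguments and counts how many padded work-items the kernel's bounds check has to reject. The names `roundUpToMultiple` and `paddedWorkItems` are illustrative, and a square work-group of a given side length is an assumption, since the original work-group dimensions did not survive in the listing.

```cpp
#include <cassert>
#include <cstddef>

// Same rule as RoundUp in the host code: the smallest multiple of
// groupSize that is greater than or equal to globalSize.
std::size_t roundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    assert(groupSize > 0);
    const std::size_t r = globalSize % groupSize;
    return r == 0 ? globalSize : globalSize + groupSize - r;
}

// Number of work-items launched beyond the image area for a square
// work-group of side `local` (hypothetical helper). These are the
// work-items that return early at the bounds check in the kernel.
std::size_t paddedWorkItems(std::size_t local, std::size_t width, std::size_t height)
{
    const std::size_t globalW = roundUpToMultiple(local, width);
    const std::size_t globalH = roundUpToMultiple(local, height);
    return globalW * globalH - width * height;
}
```

For example, a 500 x 300 image with a side length of 16 is launched as 512 x 304 work-items, so 5648 of them do nothing. That overhead is usually small compared with the cost of the filter window itself.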
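This host program uses a little OpenCV, mainly to read the image; any other way of loading the image would work just as well. In fact, OpenCV itself has an ocl module, apparently an OpenCL extension contributed to OpenCV by AMD, which contains OpenCL implementations of many commonly used OpenCV functions, including the bilateral filter and the adaptive bilateral filter.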
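Since the image loader is interchangeable, the only part that matters to OpenCL is how the pixels are packed. The following sketch shows that packing without OpenCV: a top-down, tightly packed 3-channel image becomes a bottom-up 4-byte-per-pixel buffer, and the inverse restores it. The function names and the use of std::vector are assumptions for illustration; unlike the host code, this sketch also sets the fourth byte to 255 rather than leaving it uninitialised.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs a top-down 3-channel image into a bottom-up 4-channel buffer,
// the layout the host code hands to clCreateImage2D. Hypothetical helper.
std::vector<unsigned char> packFourChannel(const std::vector<unsigned char>& src,
                                           int width, int height)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<unsigned char> out(static_cast<std::size_t>(width) * height * 4);
    std::size_t w = 0;
    for (int v = height - 1; v >= 0; --v) {          // last row first
        for (int u = 0; u < width; ++u) {
            const std::size_t s = (static_cast<std::size_t>(v) * width + u) * 3;
            out[w++] = src[s];
            out[w++] = src[s + 1];
            out[w++] = src[s + 2];
            out[w++] = 255;                           // opaque fourth channel
        }
    }
    return out;
}

// Inverse of packFourChannel: drops the fourth channel and restores
// the top-down row order.
std::vector<unsigned char> unpackFourChannel(const std::vector<unsigned char>& packed,
                                             int width, int height)
{
    assert(width > 0 && height > 0);
    assert(packed.size() == static_cast<std::size_t>(width) * height * 4);
    std::vector<unsigned char> out(static_cast<std::size_t>(width) * height * 3);
    std::size_t w = 0;
    for (int v = height - 1; v >= 0; --v) {
        for (int u = 0; u < width; ++u) {
            const std::size_t d = (static_cast<std::size_t>(v) * width + u) * 3;
            out[d] = packed[w++];
            out[d + 1] = packed[w++];
            out[d + 2] = packed[w++];
            ++w;                                      // skip the fourth channel
        }
    }
    return out;
}
```

Because packing and unpacking use the same row and channel order, the round trip is lossless, which is why the host program does not need to care whether the source stores pixels as BGR or RGB.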
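This program selects the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE. Choosing the best OpenCL device should really weigh several factors together. On my machine, judged by CL_DEVICE_MAX_WORK_GROUP_SIZE alone, the CPU would appear to be the best OpenCL device, and many of the CPU's reported parameters are indeed higher than the GPU's, yet the actual run time on the CPU is several times that on the GPU. So I am not sure which parameters should decide which OpenCL device is best, and I would welcome advice from anyone who knows.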
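One way to see why the work-group size alone is a poor criterion is to compare it with another crude estimate on the numbers from the device listing in section 4. The sketch below is only an illustration, not a recommended selection rule: `DeviceSummary` and both functions are hypothetical helpers, and the product of compute units and clock frequency ignores SIMD width, memory bandwidth and much else. Timing the real kernel on each device remains the more reliable test.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of a few clGetDeviceInfo values; not an OpenCL type.
struct DeviceSummary {
    std::string name;
    unsigned computeUnits;          // CL_DEVICE_MAX_COMPUTE_UNITS
    unsigned clockMHz;              // CL_DEVICE_MAX_CLOCK_FREQUENCY
    std::size_t maxWorkGroupSize;   // CL_DEVICE_MAX_WORK_GROUP_SIZE
};

// The criterion discussed above: the largest work-group size wins.
std::size_t pickByWorkGroupSize(const std::vector<DeviceSummary>& devices)
{
    assert(!devices.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < devices.size(); ++i) {
        if (devices[i].maxWorkGroupSize > devices[best].maxWorkGroupSize)
            best = i;
    }
    return best;
}

// A rough alternative: compute units multiplied by clock frequency.
std::size_t pickByComputeEstimate(const std::vector<DeviceSummary>& devices)
{
    assert(!devices.empty());
    std::size_t best = 0;
    unsigned long long bestScore = 0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const unsigned long long score =
            static_cast<unsigned long long>(devices[i].computeUnits) * devices[i].clockMHz;
        if (score > bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

With the listed values (GPU: 24 compute units at 1050 MHz, work-group size 256; CPU: 4 compute units at 2500 MHz, work-group size 8192), the first function picks the CPU and the second picks the GPU, which matches the observation that the GPU ran the kernel faster on this machine.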
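Also, this program is really quite simple: only a little over 300 lines do the actual work. The device-information code is only there to show which OpenCL devices the machine has and what their properties are, and the displayInfo(); call in main can simply be commented out.
As for obtaining the OpenCL library files, they are available from Intel, NVIDIA, AMD and others. The OpenCL headers and lib files I use were copied out of NVIDIA's CUDA, and you can simply use mine as well.
4. Results
(1) Hardware information
(2) OpenCL device information printed to the console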
Number of platforms: 2
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 2.0
CL_PLATFORM_VENDOR: Intel(R) Corporation
CL_PLATFORM_EXTENSIONS: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Number of devices: 2
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_VENDOR_ID: 32902
CL_DEVICE_MAX_COMPUTE_UNITS: 24
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 256 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 1
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1050
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 390280806
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 128
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 16
CL_DEVICE_MAX_PARAMETER_SIZE: 1024
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF
CL_DEVICE_SINGLE_FP_CONFIG: CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 524288
CL_DEVICE_GLOBAL_MEM_SIZE: 1561123226
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 8
CL_DEVICE_LOCAL_MEM_TYPE:
CL_DEVICE_LOCAL_MEM_SIZE: 65536
CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
CL_DEVICE_HOST_UNIFIED_MEMORY: 1
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 83
CL_DEVICE_ENDIAN_LITTLE: 1
CL_DEVICE_AVAILABLE: 1
CL_DEVICE_COMPILER_AVAILABLE: 1
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_PLATFORM: 00DEC488
CL_DEVICE_NAME: Intel(R) HD Graphics 520
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: 20.19.15.4364
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 2.0
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 2.0
CL_DEVICE_EXTENSIONS: cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_ctz cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_fp16 cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_VENDOR_ID: 32902
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 8192 8192 8192
CL_DEVICE_MAX_WORK_GROUP_SIZE: 8192
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 32
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 16
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2500
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536838144
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 480
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 480
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 480
CL_DEVICE_MAX_PARAMETER_SIZE: 3840
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST
CL_DEVICE_SINGLE_FP_CONFIG: CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 262144
CL_DEVICE_GLOBAL_MEM_SIZE: 2147352576
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 131072
CL_DEVICE_MAX_CONSTANT_ARGS: 480
CL_DEVICE_LOCAL_MEM_TYPE: CL_LOCAL | CL_GLOBAL
CL_DEVICE_LOCAL_MEM_SIZE: 32768
CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
CL_DEVICE_HOST_UNIFIED_MEMORY: 1
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 395
CL_DEVICE_ENDIAN_LITTLE: 1
CL_DEVICE_AVAILABLE: 1
CL_DEVICE_COMPILER_AVAILABLE: 1
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_PLATFORM: 00DEC488
CL_DEVICE_NAME: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: 5.2.0.10094
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 2.0 (Build 10094)
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 2.0
CL_DEVICE_EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.44
CL_PLATFORM_VENDOR: NVIDIA Corporation
CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Number of devices: 1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_VENDOR_ID: 4318
CL_DEVICE_MAX_COMPUTE_UNITS: 3
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 1024 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1241
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536870912
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_WIDTH: 4096
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 4096
CL_DEVICE_IMAGE3D_MAX_DEPTH: 4096
CL_DEVICE_MAX_SAMPLERS: 32
CL_DEVICE_MAX_PARAMETER_SIZE: 4352
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 4096
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF | CL_FP_FMA
CL_DEVICE_SINGLE_FP_CONFIG: CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 128
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 49152
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 9
CL_DEVICE_LOCAL_MEM_TYPE:
CL_DEVICE_LOCAL_MEM_SIZE: 49152
CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
CL_DEVICE_HOST_UNIFIED_MEMORY: 0
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
CL_DEVICE_ENDIAN_LITTLE: 1
CL_DEVICE_AVAILABLE: 1
CL_DEVICE_COMPILER_AVAILABLE: 1
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_PLATFORM: 00E30580
CL_DEVICE_NAME: GeForce 940MX
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 369.30
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 1.2 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Number of platforms: 2
Image dimensions: 273 x 415 pixels, Image type = CL_RGBA, CL_UNORM_INT8
Work Timer:3.422816ms
Executed program succesfully.
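A 273 x 415 image takes less than 4 ms to process.
(3) Bilateral filter results
The effect is quite noticeable. However, because there is no skin-tone detection and no final sharpening step, and because of how the parameters are set, even my friend said the skin smoothing is overdone and looks fake. So on the algorithm side, this implementation still needs work.
In addition, there should still be room to optimize the speed of this algorithm.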
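Source code: http://download.csdn.net/download/qq_33892166/9761287
If the source code reports the error "Error queuing kernel for execution.", try changing size_t localWorkSize[2] = { 32, 32 }; to size_t localWorkSize[2] = { 16, 16 };. A 32 x 32 work-group contains 1024 work-items, which can exceed the maximum work-group size some devices allow, while 16 x 16 needs only 256.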
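Rather than hard-coding the work-group size, the host program could derive it from the limit the device reports. The sketch below is an illustration, not the original source: `choose_local_size` and `round_up` are hypothetical helpers, and `max_wg` is assumed to be the value obtained from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE` (or from `CL_DEVICE_MAX_WORK_GROUP_SIZE`).

```c
#include <stddef.h>

/* Hypothetical helper: start from 32 x 32 and halve the larger side
 * until the number of work-items per group fits within max_wg. */
void choose_local_size(size_t max_wg, size_t local[2])
{
    local[0] = 32;
    local[1] = 32;
    while (local[0] * local[1] > max_wg && (local[0] > 1 || local[1] > 1)) {
        if (local[1] >= local[0])
            local[1] /= 2;
        else
            local[0] /= 2;
    }
}

/* Hypothetical helper: round n up to the next multiple of m, so that the
 * global work size is evenly divisible by the local work size. */
size_t round_up(size_t n, size_t m)
{
    return ((n + m - 1) / m) * m;
}
```

With a limit of 256 this yields 16 x 16, and the 273 x 415 test image would get a global size of 288 x 416. The extra work-items are harmless because the kernel already returns early when `x` or `y` falls outside the image. Each dimension should also stay within the per-dimension limits reported by `CL_DEVICE_MAX_WORK_ITEM_SIZES`.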