CUDA C Programming Guide: Online Tutorial Study Notes, Part 2

▶ Using texture memory

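● Texture memory is used through one of two APIs, the Object API and the Reference API. A texture object is created at run time by the Object API, and the texture it reads from is specified at creation. A texture reference is created at compile time by the Reference API, but its texture is specified only at run time, when the reference is bound to that texture.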

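● Different texture references may be bound to the same texture or to textures that overlap in memory. A texture can be any region of CUDA linear memory or a CUDA array.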

● Arrays of one to three dimensions can serve as texture memory. An element of such an array is called a texel (texture element).

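● The texel type is restricted to the basic integer types and single-precision float, plus the 1-, 2- and 4-component vector types built from them (for example uchar4 or float2); double is not supported.
● The read mode is either cudaReadModeNormalizedFloat or cudaReadModeElementType. With the former, a 16-bit integer texel is divided by 0x7fff (signed) or 0xffff (unsigned) on read, so that the full integer range maps linearly onto [-1.0, 1.0] (signed) or [0.0, 1.0] (unsigned); 8-bit integers are treated the same way, with divisors 0x7f and 0xff. The conversion applies only to 8-bit and 16-bit integer texels. With the latter, no conversion takes place.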
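The normalization done by cudaReadModeNormalizedFloat can be imitated on the host. The sketch below is an illustration only, not the hardware path: the helper names `normalize_unsigned` and `normalize_signed` are made up for these notes, and clamping the most negative signed value to -1.0 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned texel of `bits` width -> [0.0, 1.0]. */
static float normalize_unsigned(uint32_t v, int bits)
{
    assert(bits == 8 || bits == 16);      /* only 8- and 16-bit texels are normalized */
    uint32_t max = (1u << bits) - 1u;     /* 0xff or 0xffff */
    assert(v <= max);
    return (float)v / (float)max;
}

/* Hypothetical helper: signed texel of `bits` width -> [-1.0, 1.0]. */
static float normalize_signed(int32_t v, int bits)
{
    assert(bits == 8 || bits == 16);
    int32_t max = (1 << (bits - 1)) - 1;  /* 0x7f or 0x7fff */
    assert(v >= -max - 1 && v <= max);
    float f = (float)v / (float)max;
    return f < -1.0f ? -1.0f : f;         /* clamp the most negative value (assumption) */
}
```

For example, `normalize_unsigned(0xff, 8)` gives 1.0, the case the Programming Guide itself mentions for an unsigned 8-bit texel.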

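● Textures are addressed with floating-point coordinates. For a one-dimensional array of length N, the default (unnormalized) coordinates are floats in [0.0, N-1]; normalized coordinates are floats in [0.0, 1-1/N]. Each dimension of a 2D or 3D array follows the same rule.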

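● Addressing mode. Coordinates outside the array may be used (out-of-range access); the addressing mode defines what such a fetch returns. In the default mode, cudaAddressModeClamp, an out-of-range coordinate is clamped to the boundary value in each dimension; in border mode, cudaAddressModeBorder, an out-of-range fetch returns 0. With normalized coordinates, wrap mode and mirror mode are also available. In wrap mode, cudaAddressModeWrap (picture the left and right edges joined), a coordinate x is converted to frac(x) = x - floor(x) (the tutorial omits the minus sign). In mirror mode, cudaAddressModeMirror (picture left ~ right → right ~ left → left ~ right), x is converted to frac(x) if floor(x) is even, or to 1 - frac(x) if floor(x) is odd.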
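The wrap, mirror and clamp rules are easy to check on the CPU. This is a minimal sketch under stated assumptions: the function names are invented for these notes, `floor_f` is a small stand-in for floor() that is only valid for moderate magnitudes, and clamp is shown on an integer texel index rather than on the coordinate itself.

```c
#include <assert.h>

/* floor() for moderate-sized floats, written out to avoid the math library. */
static float floor_f(float x)
{
    float t = (float)(long)x;         /* the cast truncates toward zero */
    return (t > x) ? t - 1.0f : t;
}

/* Wrap mode: x' = frac(x) = x - floor(x). */
static float address_wrap(float x)
{
    return x - floor_f(x);
}

/* Mirror mode: frac(x) if floor(x) is even, 1 - frac(x) if it is odd. */
static float address_mirror(float x)
{
    float fl = floor_f(x);
    float frac = x - fl;
    return ((long)fl % 2 == 0) ? frac : 1.0f - frac;
}

/* Clamp mode, applied to a texel index into an array of n elements. */
static int address_clamp(int i, int n)
{
    assert(n > 0);
    if (i < 0) return 0;
    if (i > n - 1) return n - 1;
    return i;
}
```

With these definitions a normalized coordinate of 1.25 wraps to 0.25 and mirrors to 0.75, which matches the "edges joined" and "reflected back" pictures in the text.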

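● Filtering mode determines how the array values at integer positions are turned into the value returned for a floating-point coordinate. Nearest-point sampling, cudaFilterModePoint, returns the texel closest to the coordinate and can return integer values (if the texture itself is of integer type). Linear filtering, cudaFilterModeLinear, interpolates between the two nearest texels along each dimension: linear (1D, 2 texels), bilinear (2D, 4 texels) or trilinear (3D, 8 texels); it can only return floating-point values.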
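A one-dimensional CPU sketch makes the difference between the two filtering modes concrete. It assumes unnormalized coordinates and clamp addressing; the function names are hypothetical. The half-texel shift in the linear case follows my reading of the texture fetching appendix of the Programming Guide, and the hardware's low-precision fixed-point weight is ignored here.

```c
#include <assert.h>

static int clamp_idx(int i, int n)
{
    return i < 0 ? 0 : (i > n - 1 ? n - 1 : i);
}

/* floor() for moderate-sized floats, written out to avoid the math library. */
static float floor_lin(float x)
{
    float t = (float)(long)x;
    return (t > x) ? t - 1.0f : t;
}

/* Point filtering: return the texel that contains x, tex[floor(x)]. */
static float fetch_point_1d(const float *tex, int n, float x)
{
    assert(tex != 0 && n > 0);
    return tex[clamp_idx((int)floor_lin(x), n)];
}

/* Linear filtering: texel centers sit at i + 0.5, so shift by 0.5 first,
 * then blend the two neighbouring texels by the fractional part. */
static float fetch_linear_1d(const float *tex, int n, float x)
{
    assert(tex != 0 && n > 0);
    float xb = x - 0.5f;
    float fl = floor_lin(xb);
    float alpha = xb - fl;                 /* interpolation weight in [0, 1) */
    int i = (int)fl;
    float a = tex[clamp_idx(i, n)];
    float b = tex[clamp_idx(i + 1, n)];
    return (1.0f - alpha) * a + alpha * b;
}
```

For the array {0, 10, 20, 30}, a fetch at x = 1.0 lies halfway between the centers of texels 0 and 1, so linear filtering blends them equally, while point filtering at the same coordinate returns texel 1 unchanged.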

● Using the Texture Object API.

■ Relevant structure definitions and API functions.

 // texture_types.h
 struct __device_builtin__ cudaTextureDesc
 {
     enum cudaTextureAddressMode addressMode[3];     // addressing mode per dimension; ignored when cudaResourceDesc::resType == cudaResourceTypeLinear
     enum cudaTextureFilterMode  filterMode;         // filtering mode; ignored when cudaResourceDesc::resType == cudaResourceTypeLinear
     enum cudaTextureReadMode    readMode;           // read mode
     int                         sRGB;               // perform sRGB->linear conversion during texture read
     float                       borderColor[4];     // texture border color
     int                         normalizedCoords;   // whether normalized coordinates are used
     unsigned int                maxAnisotropy;      // limit to the anisotropy ratio
     enum cudaTextureFilterMode  mipmapFilterMode;   // mipmap filter mode
     float                       mipmapLevelBias;    // offset applied to the supplied mipmap level
     float                       minMipmapLevelClamp;// lower end of the mipmap level range to clamp access to
     float                       maxMipmapLevelClamp;// upper end of the mipmap level range to clamp access to
 };

 enum __device_builtin__ cudaTextureAddressMode
 {
     cudaAddressModeWrap = 0,
     cudaAddressModeClamp = 1,
     cudaAddressModeMirror = 2,
     cudaAddressModeBorder = 3
 };

 enum __device_builtin__ cudaTextureFilterMode
 {
     cudaFilterModePoint = 0,
     cudaFilterModeLinear = 1
 };

 enum __device_builtin__ cudaTextureReadMode
 {
     cudaReadModeElementType = 0,
     cudaReadModeNormalizedFloat = 1
 };

 typedef __device_builtin__ unsigned long long cudaTextureObject_t;

 // driver_types.h
 enum __device_builtin__ cudaChannelFormatKind
 {
     cudaChannelFormatKindSigned = 0,    // signed integer channels
     cudaChannelFormatKindUnsigned = 1,  // unsigned integer channels
     cudaChannelFormatKindFloat = 2,     // floating-point channels
     cudaChannelFormatKindNone = 3       // no channel format
 };

 struct __device_builtin__ cudaChannelFormatDesc
 {
     int                        x;   // bit depth of channel 0
     int                        y;   // bit depth of channel 1
     int                        z;   // bit depth of channel 2
     int                        w;   // bit depth of channel 3
     enum cudaChannelFormatKind f;   // channel format kind
 };

 typedef struct cudaArray *cudaArray_t;
 typedef struct cudaMipmappedArray *cudaMipmappedArray_t;

 enum __device_builtin__ cudaResourceType
 {
     cudaResourceTypeArray = 0x00,           // CUDA array resource
     cudaResourceTypeMipmappedArray = 0x01,  // mipmapped array resource
     cudaResourceTypeLinear = 0x02,          // linear memory resource
     cudaResourceTypePitch2D = 0x03          // pitched 2D resource
 };

 struct __device_builtin__ cudaResourceDesc
 {
     enum cudaResourceType resType;              // resource type
     union
     {
         struct                                  // CUDA array
         {
             cudaArray_t array;
         } array;
         struct                                  // mipmapped array
         {
             cudaMipmappedArray_t mipmap;
         } mipmap;
         struct                                  // 1D linear memory
         {
             void *devPtr;                       // device pointer, aligned to cudaDeviceProp::textureAlignment
             struct cudaChannelFormatDesc desc;  // texel format
             size_t sizeInBytes;                 // size of the array in bytes
         } linear;
         struct                                  // pitched 2D memory
         {
             void *devPtr;                       // device pointer, aligned to cudaDeviceProp::textureAlignment
             struct cudaChannelFormatDesc desc;  // texel format
             size_t width;                       // number of columns (elements per row)
             size_t height;                      // number of rows
             size_t pitchInBytes;                // row pitch in bytes
         } pitch2D;
     } res;
 };

 // cuda_runtime_api.h
 extern __host__ struct cudaChannelFormatDesc CUDARTAPI cudaCreateChannelDesc(int x, int y, int z, int w, enum cudaChannelFormatKind f);
 extern __host__ cudaError_t CUDARTAPI cudaMallocArray(cudaArray_t *array, const struct cudaChannelFormatDesc *desc, size_t width, size_t height __dv(0), unsigned int flags __dv(0));
 extern __host__ cudaError_t CUDARTAPI cudaMemcpyToArray(cudaArray_t dst, size_t wOffset, size_t hOffset, const void *src, size_t count, enum cudaMemcpyKind kind);
 extern __host__ cudaError_t CUDARTAPI cudaCreateTextureObject(cudaTextureObject_t *pTexObject, const struct cudaResourceDesc *pResDesc, const struct cudaTextureDesc *pTexDesc, const struct cudaResourceViewDesc *pResViewDesc);
 extern __host__ cudaError_t CUDARTAPI cudaDestroyTextureObject(cudaTextureObject_t texObject);

■ A complete sample. It initializes a 32×32 matrix, uses a texture to translate and rotate it, and prints the transformed matrix.

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <cuda_runtime.h>
 #include "device_launch_parameters.h"

 #define DEGRE_TO_RADIAN(x) ((x) * 3.1416f / 180)
 #define CEIL(x,y) (((x) + (y) - 1) / (y))    // ceiling division: number of blocks needed

 // simple linear transformation
 __global__ void transformKernel(float* output, cudaTextureObject_t texObj, int width, int height, float theta)
 {
     // element handled by this thread
     unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
     unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;
     if (idx >= (unsigned int)width || idy >= (unsigned int)height)
         return;                               // guard threads that fall outside the matrix

     // normalize the coordinates and move the origin to the center
     float u = idx / (float)width - 0.5f;
     float v = idy / (float)height - 0.5f;

     // rotate, then move the origin back
     float tu = u * __cosf(theta) - v * __sinf(theta) + 0.5f;
     float tv = v * __cosf(theta) + u * __sinf(theta) + 0.5f;

     //printf("\n(%2d,%2d,%2d,%2d)->(%f,%f,%f)",
     //    blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y, tu, tv,tex2D<float>(texObj, tu, tv));

     // read from texture memory and write to global memory
     output[idy * width + idx] = tex2D<float>(texObj, tu, tv);
 }

 int main()
 {
     // basic data
     int i;
     float *h_data, *d_data;
     int width = 32;
     int height = 32;
     float angle = DEGRE_TO_RADIAN(30);        // the original angle did not survive in these notes; 30 degrees is a placeholder
     int size = sizeof(float)*width*height;
     h_data = (float *)malloc(size);
     cudaMalloc((void **)&d_data, size);
     for (i = 0; i < width*height; i++)
         h_data[i] = (float)i;
     printf("\n\n");
     for (i = 0; i < width*height; i++)
     {
         printf("%6.1f ", h_data[i]);
         if ((i + 1) % width == 0)
             printf("\n");
     }

     // allocate a CUDA array and copy the data into it
     cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
     cudaArray* cuArray;
     cudaMallocArray(&cuArray, &channelDesc, width, height);
     cudaMemcpyToArray(cuArray, 0, 0, h_data, size, cudaMemcpyHostToDevice);   // deprecated in newer toolkits; cudaMemcpy2DToArray is the replacement

     // describe the texture resource
     struct cudaResourceDesc resDesc;
     memset(&resDesc, 0, sizeof(resDesc));
     resDesc.resType = cudaResourceTypeArray;
     resDesc.res.array.array = cuArray;

     // set the texture object parameters
     struct cudaTextureDesc texDesc;
     memset(&texDesc, 0, sizeof(texDesc));
     texDesc.addressMode[0] = cudaAddressModeWrap;
     texDesc.addressMode[1] = cudaAddressModeWrap;
     texDesc.filterMode = cudaFilterModeLinear;
     texDesc.readMode = cudaReadModeElementType;
     texDesc.normalizedCoords = 1;

     // create the texture object
     cudaTextureObject_t texObj = 0;
     cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);

     // launch the kernel
     dim3 dimBlock(16, 16);                    // block size did not survive in these notes; 16x16 is assumed
     dim3 dimGrid(CEIL(width, dimBlock.x), CEIL(height, dimBlock.y));
     transformKernel << <dimGrid, dimBlock >> > (d_data, texObj, width, height, angle);
     cudaDeviceSynchronize();

     // copy the result back and print it
     cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost);
     printf("\n\n");
     for (i = 0; i < width*height; i++)
     {
         printf("%6.1f ", h_data[i]);
         if ((i + 1) % width == 0)
             printf("\n");
     }

     // clean up
     cudaDestroyTextureObject(texObj);
     cudaFreeArray(cuArray);
     cudaFree(d_data);
     free(h_data);
     getchar();
     return 0;
 }
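To sanity-check the kernel without a GPU, the coordinate arithmetic can be repeated on the host. The sketch below is only a reference for the math, under stated assumptions: `ceil_div` and `rotate_coords` are names invented here, and the cosine and sine are passed in as parameters so that no math library is needed.

```c
#include <assert.h>

/* Ceiling division, the operation needed to size the grid. */
static unsigned int ceil_div(unsigned int x, unsigned int y)
{
    assert(y > 0);
    return (x + y - 1) / y;
}

/* Same coordinate math as transformKernel for one element (idx, idy). */
static void rotate_coords(int idx, int idy, int width, int height,
                          float cos_t, float sin_t, float *tu, float *tv)
{
    assert(width > 0 && height > 0 && tu != 0 && tv != 0);
    float u = idx / (float)width  - 0.5f;   /* normalize, move origin to the center */
    float v = idy / (float)height - 0.5f;
    *tu = u * cos_t - v * sin_t + 0.5f;     /* rotate, move origin back */
    *tv = v * cos_t + u * sin_t + 0.5f;
}
```

With a rotation of zero (cos = 1, sin = 0) the function must return the plain normalized coordinates, which is a quick way to confirm that the translate, rotate, translate-back sequence is wired correctly.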

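● Using the Texture Reference API. (Texture references are deprecated in recent toolkits and were removed in CUDA 12, so this part applies to older versions.)

■ Some attributes of a texture reference are immutable and must be known at compile time, so they are specified when the reference is declared. A texture reference can only be declared statically at global (file) scope and cannot be passed to a function as an argument. It is declared with the texture template: DataType is the texel type; Type is the kind of texture reference, one of 7 values, defaulting to cudaTextureType1D; ReadMode is the read mode, defaulting to cudaReadModeElementType. The remaining attributes can be changed dynamically from the host at run time.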

 texture<DataType, Type, ReadMode> texRef;

 // cuda_texture_types.h
 template<class T, int texType = cudaTextureType1D, enum cudaTextureReadMode mode = cudaReadModeElementType>
 struct __device_builtin_texture_type__ texture : public textureReference
 {
 #if !defined(__CUDACC_RTC__)
     __host__ texture(int norm = 0, enum cudaTextureFilterMode  fMode = cudaFilterModePoint, enum cudaTextureAddressMode aMode = cudaAddressModeClamp)
     {
         normalized = norm;
         filterMode = fMode;
         addressMode[0] = aMode;
         addressMode[1] = aMode;
         addressMode[2] = aMode;
         channelDesc = cudaCreateChannelDesc<T>();
         sRGB = 0;
     }
     __host__ texture(int norm, enum cudaTextureFilterMode   fMode, enum cudaTextureAddressMode  aMode, struct cudaChannelFormatDesc desc)
     {
         normalized = norm;
         filterMode = fMode;
         addressMode[0] = aMode;
         addressMode[1] = aMode;
         addressMode[2] = aMode;
         channelDesc = desc;
         sRGB = 0;
     }
 #endif
 };

 //texture_types.h
 #define cudaTextureType1D              0x01
 #define cudaTextureType2D              0x02
 #define cudaTextureType3D              0x03
 #define cudaTextureTypeCubemap         0x0C
 #define cudaTextureType1DLayered       0xF1
 #define cudaTextureType2DLayered       0xF2
 #define cudaTextureTypeCubemapLayered  0xFC

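■ Relevant structure definitions and API functions. Before use, a texture reference must be bound with cudaBindTexture(), cudaBindTexture2D() or cudaBindTextureToArray() to an array of the matching dimensionality. The dimensionality and data type of the texture reference must match those of the array; otherwise the behavior is undefined. When finished, unbind it with cudaUnbindTexture().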

 // texture_types.h
 struct __device_builtin__ textureReference
 {
     int                          normalized;            // whether normalized coordinates are used
     enum cudaTextureFilterMode   filterMode;            // filtering mode
     enum cudaTextureAddressMode  addressMode[3];        // addressing mode per dimension
     struct cudaChannelFormatDesc channelDesc;           // texel format; its element type matches the DataType given when declaring the texture
     int                          sRGB;                  // perform sRGB->linear conversion during texture read
     unsigned int                 maxAnisotropy;         // limit to the anisotropy ratio
     enum cudaTextureFilterMode   mipmapFilterMode;      // mipmap filter mode
     float                        mipmapLevelBias;       // offset applied to the supplied mipmap level
     float                        minMipmapLevelClamp;   // lower end of the mipmap level range to clamp access to
     float                        maxMipmapLevelClamp;   // upper end of the mipmap level range to clamp access to
     int                          __cudaReserved[15];    // reserved; the size was lost in these notes, 15 is what I recall from the CUDA 9 header
 };

 // cuda_runtime_api.h
 extern __host__ cudaError_t CUDARTAPI cudaBindTexture(size_t *offset, const struct textureReference *texref, const void *devPtr, const struct cudaChannelFormatDesc *desc, size_t size __dv(UINT_MAX));
 extern __host__ cudaError_t CUDARTAPI cudaBindTexture2D(size_t *offset, const struct textureReference *texref, const void *devPtr, const struct cudaChannelFormatDesc *desc, size_t width, size_t height, size_t pitch);
 extern __host__ cudaError_t CUDARTAPI cudaBindTextureToArray(const struct textureReference *texref, cudaArray_const_t array, const struct cudaChannelFormatDesc *desc);
 extern __host__ cudaError_t CUDARTAPI cudaUnbindTexture(const struct textureReference *texref);
 extern __host__ cudaError_t CUDARTAPI cudaGetTextureReference(const struct textureReference **texref, const void *symbol);

■ Sample code binding a 2D texture reference to a 2D array in pitched linear memory

 // setup
 texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;
 ...
 int width, height;
 size_t pitch;
 float *d_data;
 cudaMallocPitch((void **)&d_data, &pitch, sizeof(float)*width, height);

 // method 1: low-level API
 const textureReference* texRefPtr;
 cudaGetTextureReference(&texRefPtr, &texRef);
 cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
 size_t offset;
 cudaBindTexture2D(&offset, texRefPtr, d_data, &channelDesc, width, height, pitch);

 // method 2: high-level API (an alternative to method 1, so the declarations are repeated)
 cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
 size_t offset;
 cudaBindTexture2D(&offset, texRef, d_data, channelDesc, width, height, pitch);
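The pitch returned by cudaMallocPitch is a row stride in bytes, which may be larger than width * sizeof(float). The following sketch shows the usual address computation for element (row, col); `pitched_elem` is a hypothetical helper written for these notes, and it is exercised here on an ordinary host buffer rather than on device memory.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element (row, col) in a pitched 2D allocation of floats.
 * pitch is the distance between the starts of consecutive rows, in bytes. */
static float *pitched_elem(void *base, size_t pitch, size_t row, size_t col)
{
    assert(base != NULL);
    assert(pitch >= (col + 1) * sizeof(float));   /* the column must fit inside one row */
    char *row_start = (char *)base + row * pitch; /* step over rows in bytes */
    return (float *)row_start + col;              /* then over columns in elements */
}
```

The row offset is computed on a `char *` because the pitch is in bytes; only the column offset uses float-sized steps.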

■ Sample code binding a 2D texture reference to a CUDA array

 // setup
 texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;
 //...
 cudaArray* cuArray;
 cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
 cudaMallocArray(&cuArray, &channelDesc, width, height);

 // method 1: low-level API
 const textureReference* texRefPtr;
 cudaGetTextureReference(&texRefPtr, &texRef);
 memset(&channelDesc, 0, sizeof(cudaChannelFormatDesc));
 cudaGetChannelDesc(&channelDesc, cuArray);          // query the channel format of the array
 cudaBindTextureToArray(texRefPtr, cuArray, &channelDesc);

 // method 2: high-level API
 cudaBindTextureToArray(texRef, cuArray);

■ A complete sample, with the same functionality as the texture object sample above.

 #include <stdio.h>
 #include <stdlib.h>
 #include <cuda_runtime.h>
 #include "device_launch_parameters.h"

 #define DEGRE_TO_RADIAN(x) ((x) * 3.1416f / 180)
 #define CEIL(x,y) (((x) + (y) - 1) / (y))    // ceiling division: number of blocks needed

 // declare the texture reference
 texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;

 // simple linear transformation
 __global__ void transformKernel(float* output, int width, int height, float theta)
 {
     // element handled by this thread
     unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
     unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;
     if (idx >= (unsigned int)width || idy >= (unsigned int)height)
         return;                               // guard threads that fall outside the matrix

     // normalize the coordinates and move the origin to the center
     float u = idx / (float)width - 0.5f;
     float v = idy / (float)height - 0.5f;

     // rotate, then move the origin back
     float tu = u * __cosf(theta) - v * __sinf(theta) + 0.5f;
     float tv = v * __cosf(theta) + u * __sinf(theta) + 0.5f;

     //printf("\n(%2d,%2d,%2d,%2d)->(%f,%f,%f)",
     //    blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y, tu, tv,tex2D(texRef, tu, tv));

     // read from texture memory and write to global memory
     output[idy * width + idx] = tex2D(texRef, tu, tv);
 }

 int main()
 {
     // basic data
     int i;
     float *h_data, *d_data;
     int width = 32;
     int height = 32;
     float angle = DEGRE_TO_RADIAN(30);        // the original angle did not survive in these notes; 30 degrees is a placeholder
     int size = sizeof(float)*width*height;
     h_data = (float *)malloc(size);
     cudaMalloc((void **)&d_data, size);
     for (i = 0; i < width*height; i++)
         h_data[i] = (float)i;
     printf("\n\n");
     for (i = 0; i < width*height; i++)
     {
         printf("%6.1f ", h_data[i]);
         if ((i + 1) % width == 0)
             printf("\n");
     }

     // allocate a CUDA array and copy the data into it
     cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
     cudaArray* cuArray;
     cudaMallocArray(&cuArray, &channelDesc, width, height);
     cudaMemcpyToArray(cuArray, 0, 0, h_data, size, cudaMemcpyHostToDevice);   // deprecated in newer toolkits; cudaMemcpy2DToArray is the replacement

     // set the texture reference parameters; note the difference from the texture object version
     texRef.addressMode[0] = cudaAddressModeWrap;
     texRef.addressMode[1] = cudaAddressModeWrap;
     texRef.filterMode = cudaFilterModeLinear;
     texRef.normalized = 1;

     // bind the texture reference
     cudaBindTextureToArray(texRef, cuArray, channelDesc);

     // launch the kernel
     dim3 dimBlock(16, 16);                    // block size did not survive in these notes; 16x16 is assumed
     dim3 dimGrid(CEIL(width, dimBlock.x), CEIL(height, dimBlock.y));
     transformKernel << <dimGrid, dimBlock >> > (d_data, width, height, angle);
     cudaDeviceSynchronize();

     // copy the result back and print it
     cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost);
     printf("\n\n");
     for (i = 0; i < width*height; i++)
     {
         printf("%6.1f ", h_data[i]);
         if ((i + 1) % width == 0)
             printf("\n");
     }

     // clean up
     cudaUnbindTexture(texRef);
     cudaFreeArray(cuArray);
     cudaFree(d_data);
     free(h_data);
     getchar();
     return 0;
 }

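▶ Half-precision floating point.

■ Per the guide, CUDA C has no native data type matching the half-precision format; such data can be stored in a 16-bit short type and converted to and from single precision with intrinsic functions when a computation needs it. (Newer toolkits wrap the 16 bits in the __half type from cuda_fp16.h, as in the excerpt below.)

■ These functions can only be used in device code; equivalent functions for host code can be found in the OpenEXR library.

■ During texture fetches, half-precision values are promoted to single precision by default.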

 // cuda_fp16.h

 __CUDA_FP16_DECL__ float __half2float(const __half h)

 {

     float val;

     asm volatile("{  cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(h.x));

     return val;

 }

 __CUDA_FP16_DECL__ __half2 __float2half2_rn(const float f)

 {

     __half2 val;

     asm("{.reg .f16 low;\n"

         "  cvt.rn.f16.f32 low, %1;\n"

         "  mov.b32 %0, {low,low};}\n" : "=r"(val.x) : "f"(f));

     return val;

 }
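Since the conversion intrinsics are device-only, a host-side decoder is sometimes handy for checking data. The sketch below decodes the standard IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits) with integer operations; `half_bits_to_float` is a hypothetical name, not a CUDA or OpenEXR function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side counterpart of __half2float: decode binary16 bits. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float out;

    if (exp == 0) {
        /* zero or subnormal: value = mant * 2^-24, exactly representable in float */
        out = (float)mant * 5.9604644775390625e-8f;
        return sign ? -out : out;
    }
    if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);            /* infinity or NaN */
    else
        bits = sign | ((exp + 112u) << 23) | (mant << 13);   /* rebias exponent: 15 -> 127 */

    assert(sizeof(bits) == sizeof(out));
    memcpy(&out, &bits, sizeof(out));                        /* reinterpret the bit pattern */
    return out;
}
```

For instance, the bit pattern 0x3C00 decodes to 1.0 and 0x7BFF to 65504, the largest finite half-precision value.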

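▶ Layered textures

● Called a texture array in Direct3D and an array texture in OpenGL.

● A layered texture consists of several layers, all of them textures of the same dimensionality, size and data type; in effect it adds one more dimension that is indexed by an integer. One-dimensional and two-dimensional layered textures are supported.

● A 1D layered texture is addressed with an integer index and one floating-point coordinate; a 2D layered texture with an integer index and two floating-point coordinates.

● A layered texture can only be a CUDA array created by cudaMalloc3DArray() with the cudaArrayLayered flag, and it is fetched with tex1DLayered() or tex2DLayered(). Filtering happens only within a layer, never across layers.

● The book 《CUDA专家手册》 (The CUDA Handbook) covers layered textures in somewhat more detail; I will come back and fill this in after reading it.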
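The "integer layer plus float coordinate" addressing can be pictured with a CPU stand-in. This is a sketch under assumptions, not how the hardware stores things: the real layout of a CUDA array is opaque, so storing the layers back to back in a flat buffer is an assumption made for illustration, and `fetch_1d_layered` is a made-up name that mimics a point-sampled, clamped tex1DLayered fetch.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for a tex1DLayered point fetch.
 * Assumed layout: `layers` rows of `width` texels each, stored contiguously. */
static float fetch_1d_layered(const float *data, int width, int layers,
                              float x, int layer)
{
    assert(data != 0 && width > 0 && layers > 0);
    assert(layer >= 0 && layer < layers);   /* the layer index is an integer and is never filtered */
    int i = (int)x;                         /* point sampling; x >= 0 assumed */
    if (i < 0) i = 0;
    if (i > width - 1) i = width - 1;       /* clamping stays inside the chosen layer */
    return data[layer * width + i];
}
```

Note that an out-of-range x is clamped within the selected layer and never spills into the next one, which mirrors the rule that filtering and addressing do not cross layers.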

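▶ Cubemap textures

● A special kind of 2D layered texture made of six square 2D textures of identical size, representing the six faces of a cube.

● An ordered triple of floats (x, y, z) selects the face (layer index) and the coordinates on that face, case by case according to a table in the Programming Guide (the table is not reproduced in these notes). With m the magnitude of the dominant component and (s, t) the face-local pair given by that table, the face coordinates are ((s / m + 1) / 2, (t / m + 1) / 2).

● A cubemap texture can only be a CUDA array created by cudaMalloc3DArray() with the cudaArrayCubemap flag, and it is fetched with texCubemap().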
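The face selection can be sketched in a few lines of C. Treat this as an assumption-laden illustration: the face numbering (0 to 5 for +x, -x, +y, -y, +z, -z) and the choice of s and t per face follow the OpenGL cubemap convention, which I believe is what the guide's table uses but have not verified line by line; `cubemap_lookup` is a hypothetical name, and ties between components are resolved in favor of x, then y.

```c
#include <assert.h>

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Hypothetical helper: map a direction (x, y, z) to a cube face (0..5)
 * and to face coordinates (*u, *v) in [0, 1]. */
static int cubemap_lookup(float x, float y, float z, float *u, float *v)
{
    float ax = abs_f(x), ay = abs_f(y), az = abs_f(z);
    float m, s, t;
    int face;

    assert(u != 0 && v != 0);
    assert(ax > 0.0f || ay > 0.0f || az > 0.0f);   /* direction must be non-zero */

    if (ax >= ay && ax >= az) {          /* x is the dominant component */
        m = ax; t = -y;
        if (x >= 0.0f) { face = 0; s = -z; } else { face = 1; s = z; }
    } else if (ay >= az) {               /* y is the dominant component */
        m = ay; s = x;
        if (y >= 0.0f) { face = 2; t = z; } else { face = 3; t = -z; }
    } else {                             /* z is the dominant component */
        m = az; t = -y;
        if (z >= 0.0f) { face = 4; s = x; } else { face = 5; s = -x; }
    }
    *u = (s / m + 1.0f) / 2.0f;          /* map [-1, 1] to [0, 1] */
    *v = (t / m + 1.0f) / 2.0f;
    return face;
}
```

A direction along an axis, such as (1, 0, 0), lands in the center (0.5, 0.5) of the corresponding face, which is a convenient first check.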

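▶ Cubemap layered textures

● A kind of layered texture whose layers are cubemap textures of identical size. It is addressed with an integer index and an ordered triple of floats, which together select the layer, the face and the coordinates on that face.

● A cubemap layered texture can only be a CUDA array created by cudaMalloc3DArray() with both the cudaArrayLayered and cudaArrayCubemap flags, and it is fetched with texCubemapLayered(). Filtering happens only within a layer, never across layers.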

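▶ Texture gather.

● The function tex2Dgather() extracts specific content from a 2D texture: per the Programming Guide, it returns one component (selected by the last argument) from each of the four texels that bilinear filtering would use. I have not fully understood it yet.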

 // texture_fetch_functions.h
 template <typename T>
 static __device__ typename __nv_tex2dgather_ret<T>::type tex2Dgather(texture<T, cudaTextureType2D, cudaReadModeElementType>, float, float, int = 0) {  }

▶ A condensed texture_types.h. Everything in it is covered elsewhere in these notes.

 #if !defined(__TEXTURE_TYPES_H__)

 #define __TEXTURE_TYPES_H__

 #include "driver_types.h"

 #define cudaTextureType1D              0x01

 #define cudaTextureType2D              0x02

 #define cudaTextureType3D              0x03

 #define cudaTextureTypeCubemap         0x0C

 #define cudaTextureType1DLayered       0xF1

 #define cudaTextureType2DLayered       0xF2

 #define cudaTextureTypeCubemapLayered  0xFC

 // CUDA texture address modes

 enum __device_builtin__ cudaTextureAddressMode

 {

     cudaAddressModeWrap   = 0,    // Wrapping address mode

     cudaAddressModeClamp  = 1,    // Clamp to edge address mode

     cudaAddressModeMirror = 2,    // Mirror address mode

     cudaAddressModeBorder = 3     // Border address mode

 };

 // CUDA texture filter modes

 enum __device_builtin__ cudaTextureFilterMode

 {

     cudaFilterModePoint  = 0,     // Point filter mode

     cudaFilterModeLinear = 1      // Linear filter mode

 };

 // CUDA texture read modes

 enum __device_builtin__ cudaTextureReadMode

 {

     cudaReadModeElementType     = 0,  // Read texture as specified element type

     cudaReadModeNormalizedFloat = 1   // Read texture as normalized float

 };

 // CUDA texture reference

 struct __device_builtin__ textureReference

 {

     // Indicates whether texture reads are normalized or not

     int                          normalized;

     // Texture filter mode

     enum cudaTextureFilterMode   filterMode;

     // Texture address mode for up to 3 dimensions

     enum cudaTextureAddressMode  addressMode[3];

     // Channel descriptor for the texture reference

     struct cudaChannelFormatDesc channelDesc;

     // Perform sRGB->linear conversion during texture read

     int                          sRGB;

     // Limit to the anisotropy ratio

     unsigned int                 maxAnisotropy;

     // Mipmap filter mode

     enum cudaTextureFilterMode   mipmapFilterMode;

     // Offset applied to the supplied mipmap level

     float                        mipmapLevelBias;

     // Lower end of the mipmap level range to clamp access to

     float                        minMipmapLevelClamp;

     // Upper end of the mipmap level range to clamp access to

     float                        maxMipmapLevelClamp;

     int                          __cudaReserved[15];   // size lost in these notes; 15 is what I recall from the CUDA 9 header

 };

 // CUDA texture descriptor

 struct __device_builtin__ cudaTextureDesc

 {

     // Texture address mode for up to 3 dimensions

     enum cudaTextureAddressMode addressMode[3];

     // Texture filter mode

     enum cudaTextureFilterMode  filterMode;

     // Texture read mode

     enum cudaTextureReadMode    readMode;

     // Perform sRGB->linear conversion during texture read

     int                         sRGB;

     // Texture Border Color

     float                       borderColor[4];

     // Indicates whether texture reads are normalized or not

     int                         normalizedCoords;

     // Limit to the anisotropy ratio

     unsigned int                maxAnisotropy;

     // Mipmap filter mode

     enum cudaTextureFilterMode  mipmapFilterMode;

     // Offset applied to the supplied mipmap level

     float                       mipmapLevelBias;

     // Lower end of the mipmap level range to clamp access to

     float                       minMipmapLevelClamp;

     // Upper end of the mipmap level range to clamp access to

     float                       maxMipmapLevelClamp;

 };

 // An opaque value that represents a CUDA texture object

 typedef __device_builtin__ unsigned long long cudaTextureObject_t;

 #endif
