A tiny program to benchmark image transpose algorithms
Here is the code:
- #include <stdio.h>
- #include <xmmintrin.h>
- #include <windows.h>
- typedef __m128 Vec;
- typedef unsigned long long value_t;
- // Returns the current value of the high-resolution performance counter, in ticks.
- __forceinline value_t now()
- {
-     LARGE_INTEGER n;
-     QueryPerformanceCounter(&n);
-     return n.QuadPart;
- }
- // Naive transpose: dst_img ends up with src_w rows of src_h elements each.
- inline void img_transpose(
-     Vec *dst_img,
-     Vec *src_img,
-     const int src_w,
-     const int src_h)
- {
- #pragma omp parallel for
-     for (int j = 0; j < src_w; ++j)
-     {
-         for (int i = 0; i < src_h; ++i)
-         {
-             dst_img[j * src_h + i] = src_img[i * src_w + j];
-         }
-     }
- }
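- // Blocked (tiled) transpose: processes one block_size x block_size tile at a time.
- inline void img_transpose_block(
-     Vec *dst_img,
-     Vec *src_img,
-     const int src_w,
-     const int src_h)
- {
-     // The tile size is missing from this listing; 16 is an assumed placeholder.
-     const int block_size = 16;
- #pragma omp parallel for
-     for (int j = 0; j < src_w; j += block_size)
-     {
-         for (int i = 0; i < src_h; i += block_size)
-         {
-             // Clamp the tile to the image bounds (the min macro comes from windows.h).
-             const int nsize = min(j + block_size, src_w);
-             const int msize = min(i + block_size, src_h);
-             for (int n = j; n < nsize; ++n)
-             {
-                 for (int m = i; m < msize; ++m)
-                 {
-                     dst_img[n * src_h + m] = src_img[m * src_w + n];
-                 }
-             }
-         }
-     }
- }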
- int main(int argc, char *argv[])
- {
-     //// performance benchmark ////
-     // The image size and repeat count are missing from this listing;
-     // 1024 x 1024 and 10 rounds are assumed placeholders.
-     const int w = 1024;
-     const int h = 1024;
-     const int rounds = 10;
-     Vec *a = new Vec [w * h];
-     Vec *b = new Vec [w * h];
-     value_t start_time, end_time;
-     LARGE_INTEGER freq;
-     QueryPerformanceFrequency(&freq);
-     double ms_per_tick = 1000.0 / (double)freq.QuadPart;
-     // Each round transposes a into b and then back into a.
-     start_time = now();
-     for (int t = 0; t < rounds; ++t)
-     {
-         img_transpose(b, a, w, h);
-         img_transpose(a, b, h, w);
-     }
-     end_time = now();
-     printf("img_transpose: %f ms\n", (double)(end_time - start_time) * ms_per_tick);
-     start_time = now();
-     for (int t = 0; t < rounds; ++t)
-     {
-         img_transpose_block(b, a, w, h);
-         img_transpose_block(a, b, h, w);
-     }
-     end_time = now();
-     printf("img_transpose_block: %f ms\n", (double)(end_time - start_time) * ms_per_tick);
-     delete [] a;
-     delete [] b;
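-     //// algorithm validation ////
-     // The test image size and lane indices are missing from this listing;
-     // 37 x 53 and lanes 0 and 1 are assumed placeholders.
-     const int width = 37;
-     const int height = 53;
-     Vec *src_img = new Vec [width * height];
-     Vec *dst_img = new Vec [height * width];
-     // Tag every source pixel with its own (column, row) position.
-     for (int j = 0; j < height; ++j)
-     {
-         for (int i = 0; i < width; ++i)
-         {
-             src_img[j * width + i].m128_i32[0] = i;
-             src_img[j * width + i].m128_i32[1] = j;
-         }
-     }
-     img_transpose_block(dst_img, src_img, width, height);
-     // After the transpose, row j / column i of dst must hold source column j, row i.
-     for (int j = 0; j < width; ++j)
-     {
-         for (int i = 0; i < height; ++i)
-         {
-             int pi = dst_img[j * height + i].m128_i32[0];
-             int pj = dst_img[j * height + i].m128_i32[1];
-             if (pi != j || pj != i)
-             {
-                 printf("Algorithm is wrong!!!\n");
-                 goto END_OF_PROGRAM;
-             }
-         }
-     }
- END_OF_PROGRAM:
-     // Release the validation buffers, which the listing otherwise leaks.
-     delete [] src_img;
-     delete [] dst_img;
-     printf("All done\n");
-     return 0;
- }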
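
The listing above relies on MSVC- and Windows-specific pieces (`windows.h`, `QueryPerformanceCounter`, the `m128_i32` union member). As a rough portable sketch of the same two loops, the version below makes several assumptions that are not in the original: a hypothetical `Pixel` struct stands in for `Vec`, `std::chrono::steady_clock` stands in for the performance counter, the tile size is passed in as a `block` parameter, and the OpenMP pragmas are left out. The helper names `transpose_naive`, `transpose_blocked`, `validate_blocked` and `time_ms` are illustrative only.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical 16-byte pixel standing in for the original __m128-based Vec.
struct Pixel {
    int c[4];
};

// Naive transpose: dst holds src_w rows of src_h pixels each.
void transpose_naive(std::vector<Pixel>& dst, const std::vector<Pixel>& src,
                     int src_w, int src_h)
{
    for (int j = 0; j < src_w; ++j)
        for (int i = 0; i < src_h; ++i)
            dst[static_cast<std::size_t>(j) * src_h + i] =
                src[static_cast<std::size_t>(i) * src_w + j];
}

// Blocked transpose: walks the image in block x block tiles, clamping the
// last tile in each direction to the image bounds. `block` is an assumed
// parameter; the original tile size is not known.
void transpose_blocked(std::vector<Pixel>& dst, const std::vector<Pixel>& src,
                       int src_w, int src_h, int block)
{
    for (int j = 0; j < src_w; j += block) {
        const int n_end = std::min(j + block, src_w);
        for (int i = 0; i < src_h; i += block) {
            const int m_end = std::min(i + block, src_h);
            for (int n = j; n < n_end; ++n)
                for (int m = i; m < m_end; ++m)
                    dst[static_cast<std::size_t>(n) * src_h + m] =
                        src[static_cast<std::size_t>(m) * src_w + n];
        }
    }
}

// Tags each source pixel with its (column, row), transposes it with the
// blocked routine, and checks that every pixel landed in the swapped position.
bool validate_blocked(int width, int height, int block)
{
    const std::size_t count = static_cast<std::size_t>(width) * height;
    std::vector<Pixel> src(count), dst(count);
    for (int j = 0; j < height; ++j)
        for (int i = 0; i < width; ++i) {
            src[static_cast<std::size_t>(j) * width + i].c[0] = i;
            src[static_cast<std::size_t>(j) * width + i].c[1] = j;
        }
    transpose_blocked(dst, src, width, height, block);
    for (int j = 0; j < width; ++j)
        for (int i = 0; i < height; ++i) {
            const Pixel& p = dst[static_cast<std::size_t>(j) * height + i];
            if (p.c[0] != j || p.c[1] != i)
                return false;
        }
    return true;
}

// Times `rounds` round trips (transpose, then transpose back) on a w x h
// image and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F transpose, int w, int h, int rounds)
{
    const std::size_t count = static_cast<std::size_t>(w) * h;
    std::vector<Pixel> a(count), b(count);
    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < rounds; ++t) {
        transpose(b, a, w, h);
        transpose(a, b, h, w);
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling `validate_blocked` with a size that is not a multiple of `block` exercises the clamped edge tiles. `time_ms(transpose_naive, w, h, rounds)` times the naive version directly; to time `transpose_blocked`, wrap it in a lambda that fixes its `block` argument. Which variant is faster depends on the image size, tile size and cache hierarchy, so the numbers are only meaningful when measured on the target machine.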