Here is the code:

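#include <stdio.h>
#include <xmmintrin.h>
#include <windows.h>

typedef __m128 Vec;
typedef unsigned long long value_t;

// Read the high-resolution performance counter (in ticks).
__forceinline value_t now()
{
    LARGE_INTEGER n;
    QueryPerformanceCounter(&n);
    return n.QuadPart;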
}

// Naive transpose: element (row i, column j) of src_img goes to (row j, column i) of dst_img.
inline void img_transpose(
    Vec *dst_img,
    Vec *src_img,
    const int src_w,
    const int src_h)
{
#pragma omp parallel for
    for (int j = 0; j < src_w; ++j)
        for (int i = 0; i < src_h; ++i)
            dst_img[j * src_h + i] = src_img[i * src_w + j];
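}

// Blocked transpose: walk the image tile by tile so that both the reads
// and the writes of one tile stay close together in memory.
inline void img_transpose_block(
    Vec *dst_img,
    Vec *src_img,
    const int src_w,
    const int src_h)
{
#pragma omp parallel for
    for (int j = 0; j < src_w; j += /* block size */)
    {
        for (int i = 0; i < src_h; i += /* block size */)
        {
            // Clamp the tile to the image borders.
            const int nsize = min(j + /* block size */, src_w);
            const int msize = min(i + /* block size */, src_h);

            for (int n = j; n < nsize; ++n)
                for (int m = i; m < msize; ++m)
                    dst_img[n * src_h + m] = src_img[m * src_w + n];
        }
    }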
}

int main(int argc, char *argv[])
{
    //// performance benchmark ////

    const int w = /* image width */;
    const int h = /* image height */;
    Vec *a = new Vec [w * h];
    Vec *b = new Vec [w * h];
    value_t start_time, end_time;

    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    double ms_per_tick = 1000.0 / (double)freq.QuadPart;

    start_time = now();
    for (int t = 0; t < /* repeat count */; ++t)
    {
        img_transpose(b, a, w, h);
        img_transpose(a, b, h, w);
    }
    end_time = now();
    printf("img_transpose: %f ms\n", (double)(end_time - start_time) * ms_per_tick);

    start_time = now();
    for (int t = 0; t < /* repeat count */; ++t)
    {
        img_transpose_block(b, a, w, h);
        img_transpose_block(a, b, h, w);
    }
    end_time = now();
    printf("img_transpose_block: %f ms\n", (double)(end_time - start_time) * ms_per_tick);

    delete [] a;
    delete [] b;

    //// algorithm validation ////

    const int width = /* test width */;
    const int height = /* test height */;
    Vec *src_img = new Vec [width * height];
    Vec *dst_img = new Vec [height * width];

    // Tag every source pixel with its own (column, row) position.
    for (int j = 0; j < height; ++j)
    {
        for (int i = 0; i < width; ++i)
        {
            src_img[j * width + i].m128_i32[0] = i;
            src_img[j * width + i].m128_i32[1] = j;
        }
    }

    img_transpose_block(dst_img, src_img, width, height);

    // dst_img at (row j, column i) must hold the source pixel from (row i, column j).
    for (int j = 0; j < width; ++j)
    {
        for (int i = 0; i < height; ++i)
        {
            int pi = dst_img[j * height + i].m128_i32[0];
            int pj = dst_img[j * height + i].m128_i32[1];

            if (pi != j || pj != i)
                printf("Algorithm is wrong!!!\n");
        }
    }

    delete [] src_img;
    delete [] dst_img;

    printf("All done\n");

    return 0;
}
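
For readers who are not on Windows/MSVC, here is a minimal portable sketch of the same two transposes. It is not the listing above: the names `transpose_naive`, `transpose_blocked` and `kTile` are hypothetical, the tile edge of 16 is only an assumed default (the block size used in the listing is not known), and it works on a generic element type in a `std::vector` instead of `__m128`. Source and destination must be different buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed default tile edge; not taken from the original listing.
constexpr int kTile = 16;

// Naive transpose: src is src_h rows by src_w columns (row-major),
// dst becomes src_w rows by src_h columns.
template <typename T>
void transpose_naive(std::vector<T>& dst, const std::vector<T>& src,
                     int src_w, int src_h)
{
    assert(src.size() == static_cast<std::size_t>(src_w) * src_h);
    dst.resize(src.size());
    for (int j = 0; j < src_w; ++j)
        for (int i = 0; i < src_h; ++i)
            dst[static_cast<std::size_t>(j) * src_h + i] =
                src[static_cast<std::size_t>(i) * src_w + j];
}

// Blocked transpose: same result, but the image is visited in
// tile x tile squares, clamped at the right and bottom borders.
template <typename T>
void transpose_blocked(std::vector<T>& dst, const std::vector<T>& src,
                       int src_w, int src_h, int tile = kTile)
{
    assert(tile > 0);
    assert(src.size() == static_cast<std::size_t>(src_w) * src_h);
    dst.resize(src.size());
    for (int j = 0; j < src_w; j += tile) {
        const int n_end = std::min(j + tile, src_w);
        for (int i = 0; i < src_h; i += tile) {
            const int m_end = std::min(i + tile, src_h);
            for (int n = j; n < n_end; ++n)
                for (int m = i; m < m_end; ++m)
                    dst[static_cast<std::size_t>(n) * src_h + m] =
                        src[static_cast<std::size_t>(m) * src_w + n];
        }
    }
}
```

A quick sanity check is to run both functions on a small non-square image whose sides are not multiples of the tile edge and compare the results, then transpose back and compare with the input. Which tile edge is fastest depends on the element size and the cache, so it is worth measuring a few values instead of trusting the default.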
