Reposted from: http://ariasprado.name/2011/11/30/profiling-application-llc-cache-misses-under-linux-using-perf-events.html

In this post we will see how to do some profiling under Ubuntu Linux using Perf Events, which has been present in the kernel since version 2.6.31 [1, 2]. In particular, we will estimate the rate of Last Level Cache (LLC) misses of a Java application.

Some GIS applications are hungry for computing power; applications that process LiDAR data are one example, because the volume of input data is usually huge. Efficient use of the processor caches can reduce execution time. Given the high penalty of processor cache misses, identifying the parts of an application that cause too many cache misses is very important.

1. Installation of Perf Events

Fortunately, Ubuntu Linux offers Perf Events (PE) as binary packages. With the apt-get command, installation is straightforward:

$ sudo apt-get install linux-tools-common linux-tools-2.6.38-13

Two notes about installation. First, before attempting installation, check that the kernel you are using is recent enough: Perf Events [note 1] has been available since Linux version 2.6.31. Second, install the version of the linux-tools package that matches your kernel version.

2. The Java test application

Below is a simple Java application designed to cause many LLC misses.

The constructor creates a square matrix and populates it with random double values.

The method calculateSum() computes the sum of all the numbers stored in the matrix; it is called fifty times. Since addition is commutative, traversing the matrix by rows or by columns yields the same result, apart from possible floating-point rounding differences; the boolean parameter traverseByRows selects the traversal mode.

public class LLCMissesTest {

    public static final int DEFAULT_MATRIX_SIZE = 7500;

    protected double[][] matrix;

    // Creates an n x n matrix filled with random doubles.
    public LLCMissesTest(int n) {
        matrix = new double[n][n];
        for (int i = 0; i < n; i = i + 1) {
            for (int j = 0; j < n; j = j + 1) {
                matrix[i][j] = Math.random();
            }
        }
    }

    // Sums all elements, traversing by rows (true) or by columns (false).
    public double calculateSum(boolean traverseByRows) {
        double sum = 0.0;
        int n = matrix.length;
        for (int i = 0; i < n; i = i + 1) {
            for (int j = 0; j < n; j = j + 1) {
                if (traverseByRows) {
                    sum = sum + matrix[i][j];
                } else {
                    sum = sum + matrix[j][i];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        final int NUM_ITERATIONS = 50;
        LLCMissesTest lmt = new LLCMissesTest(DEFAULT_MATRIX_SIZE);
        // Set to false to traverse the matrix by columns.
        boolean traverseByRows = true;
        for (int i = 0; i < NUM_ITERATIONS; i = i + 1) {
            System.out.printf("i = %d, traverseByRows = %b: total = %f\n", i, traverseByRows, lmt.calculateSum(traverseByRows));
        }
    }
}
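In the class above the traversal mode is hard-coded in main(), so switching modes means editing and recompiling. As a minimal sketch, not part of the original program, the mode could instead be read from the first command-line argument. The class TraverseModeArg and the helper parseTraverseByRows() are hypothetical names introduced only for this illustration:

```java
// Hypothetical helper, not part of the original LLCMissesTest class.
class TraverseModeArg {

    // Returns the traversal mode given on the command line.
    // With no argument it defaults to true (traverse by rows).
    static boolean parseTraverseByRows(String[] args) {
        if (args.length == 0) {
            return true;
        }
        // Boolean.parseBoolean yields true only for "true" (ignoring case).
        return Boolean.parseBoolean(args[0]);
    }

    public static void main(String[] args) {
        boolean traverseByRows = parseTraverseByRows(args);
        System.out.println("traverseByRows = " + traverseByRows);
    }
}
```

If main() of LLCMissesTest were adapted along these lines, the same compiled class could be run once with the argument true and once with false.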

What can we expect from this class? When calculateSum() traverses the matrix by columns (that is, with the parameter traverseByRows set to false), both the number of LLC load events and the number of LLC load misses should be much higher than when traversing by rows. This is because Java stores matrices by rows (row-major order), with no guarantee that two consecutive rows are actually contiguous in memory [note 2].
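The storage model described above can be illustrated with a small sketch: a Java double[][] is an array of references, and each row is an array object of its own. The class name RowLayoutDemo and its helper are assumptions made for this example only; the code shows that rows are distinct objects, but it says nothing about where the JVM actually places them in memory.

```java
// Illustration only: each row of a double[][] is a separate array object.
class RowLayoutDemo {

    // Returns true when no two rows of m refer to the same array object.
    static boolean rowsAreDistinctObjects(double[][] m) {
        for (int i = 0; i < m.length; i++) {
            for (int k = i + 1; k < m.length; k++) {
                if (m[i] == m[k]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] m = new double[3][4];
        System.out.println("rows are distinct objects: " + rowsAreDistinctObjects(m));

        // A row can be replaced on its own, even by an array of another length,
        // which would be impossible if the matrix were one contiguous block.
        m[1] = new double[7];
        System.out.println("row lengths: " + m[0].length + ", " + m[1].length + ", " + m[2].length);
    }
}
```

When traversing by columns, each consecutive access matrix[j][i] therefore goes through a different row object, while traversing by rows stays inside one row array for n consecutive accesses.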

3. Counting LLC-loads and LLC-load-misses events

Once Perf Events is installed we can count, among others, LLC-loads and LLC-load-misses events. The list of available pre-defined events can be obtained by executing

$ perf list

According to the man page the returned list items are actually "the symbolic event types which can be selected in the various perf commands with the -e option" [note 3].

We have counted the number of LLC-loads and LLC-load-misses events by using perf's command stat:

$ perf stat -e LLC-loads,LLC-load-misses java LLCMissesTest

This measurement was done twice: the first time with the variable traverseByRows (declared in main()) set to true, the second time with it set to false. The results are shown in the table below:

matrix size   traverse by rows   LLC-loads events   LLC-load-misses events   time (seconds)   load-misses / loads ratio
7,500         true                    367,944,413               15,522,099             8.63                       4.22%
7,500         false                10,467,824,326            1,288,872,561            84.13                      12.31%

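It can be seen that when traversing the matrix by columns, the number of LLC-loads and LLC-load-misses events increases by one to two orders of magnitude (roughly 28 times more LLC-loads and 83 times more LLC-load-misses), and the execution time grows almost tenfold as a result.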
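The ratio column and the growth factors can be recomputed from the raw counts in the table. The following sketch does only that arithmetic; the class name LlcRatios and its two helper methods are names chosen for this example.

```java
// Recomputes the derived figures from the event counts reported in the table.
class LlcRatios {

    // Miss ratio as a percentage: 100 * load-misses / loads.
    static double missRatioPercent(long loadMisses, long loads) {
        return 100.0 * loadMisses / loads;
    }

    // How many times larger the by-columns value is than the by-rows value.
    static double factor(double byColumns, double byRows) {
        return byColumns / byRows;
    }

    public static void main(String[] args) {
        long rowLoads = 367944413L;
        long rowMisses = 15522099L;
        long colLoads = 10467824326L;
        long colMisses = 1288872561L;

        System.out.printf("miss ratio by rows:    %.2f%%%n", missRatioPercent(rowMisses, rowLoads));
        System.out.printf("miss ratio by columns: %.2f%%%n", missRatioPercent(colMisses, colLoads));
        System.out.printf("LLC-loads factor:       %.1f%n", factor(colLoads, rowLoads));
        System.out.printf("LLC-load-misses factor: %.1f%n", factor(colMisses, rowMisses));
        System.out.printf("time factor:            %.1f%n", factor(84.13, 8.63));
    }
}
```

The two miss ratios reproduce the 4.22% and 12.31% given in the table.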

The main hardware features were:

  • processor: 2 x Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
  • cache size: 3,072 kilobytes
  • bogomips: 5,852.47
  • RAM size: 3,597,972 kilobytes

Operating system was Ubuntu Linux 11.04, kernel 2.6.38-13-generic-pae. The Java virtual machine was the OpenJDK Runtime Environment (IcedTea6 1.10.4), Java version 1.6.0_22.

4. Caveats

The tests are very simple: what we actually measured in the previous section is the number of LLC-loads and LLC-load-misses events of the whole program, not just of the method calculateSum(). To reduce the relative contribution of the other parts of the program, calculateSum() is called 50 times.
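As a complement to the whole-program counts, the traversal loops alone can be timed from inside Java. The sketch below is not part of the original test: the class TraversalTiming, the matrix size of 2,000 and the 10 iterations are assumptions made for this example. It measures wall-clock time with System.nanoTime(), not cache events, and the figures include JIT warm-up, so they are only a rough indication.

```java
// Times only the summation loops, leaving matrix creation out of the measurement.
class TraversalTiming {

    // Receives every result so the JIT compiler cannot discard the sums.
    static volatile double sink;

    // Sums a square matrix by rows (true) or by columns (false).
    static double sum(double[][] m, boolean byRows) {
        double total = 0.0;
        int n = m.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                total += byRows ? m[i][j] : m[j][i];
            }
        }
        return total;
    }

    // Returns the elapsed nanoseconds for the given number of traversals.
    static long timeNanos(double[][] m, boolean byRows, int iterations) {
        long start = System.nanoTime();
        for (int k = 0; k < iterations; k++) {
            sink = sum(m, byRows);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 2000; // assumed size, smaller than the 7,500 used in the post
        double[][] m = new double[n][n];
        for (double[] row : m) {
            for (int j = 0; j < n; j++) {
                row[j] = Math.random();
            }
        }
        for (boolean byRows : new boolean[] {true, false}) {
            long nanos = timeNanos(m, byRows, 10);
            System.out.printf("traverseByRows = %b: %.1f ms%n", byRows, nanos / 1e6);
        }
    }
}
```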

Another issue comes from the fact that the LLC is a shared resource. Hence, if the test is run in parallel with other memory-intensive applications, the results could be inaccurate.

5. Useful links

references

[1] "2.6.31 is out": http://goo.gl/UCfWn

[2] "Perfcounters added to the mainline": http://lwn.net/Articles/339361/

notes

[note 1] The first version was named Performance Counters. In version 2.6.32 it was renamed to Perf Events.

[note 2] "We can expect elements of an array of primitive elements to be stored contiguously, but we cannot expect the objects of an array of objects to be stored contiguously. For a rectangular array of primitive elements, the elements of a row will be stored contiguously, but the rows may be scattered. A basic observation is that accessing the consecutive elements in a row will be faster than accessing consecutive elements in a column." (http://goo.gl/O8HPf)

Geir Gundersen, Trond Steihaug; 2004; "Data structures in Java for matrix computations"; Concurrency and Computation: Practice and Experience; vol. 16, issue 8; pp. 799-815

[note 3] "These events have been specifically implemented by architecture. Preliminary investigations suggest that the events appear correct but we also suggest that the events are compared against the corresponding raw counters and also against oprofile results until this tool is thoroughly investigated (this section will be updated as confirmation is made)."

Bill Buros, "Using perf on POWER7 systems" (http://goo.gl/f4vS3)
