
Elapsed Time (execution time):

The total time your target ran, calculated as follows:

Wall clock time at end of application – Wall clock time at start of application

That is, the application's total run time equals the program's end time minus its start time.
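
As a rough illustration of the formula above, the sketch below measures elapsed time from inside a Python program. This is not how the profiler collects the value; `measure_elapsed` and the sample workload are hypothetical names used only for this example.

```python
import time


def measure_elapsed(workload):
    """Return the wall-clock seconds between the start and end of workload()."""
    start = time.perf_counter()   # wall clock at start
    workload()
    end = time.perf_counter()     # wall clock at end
    return end - start


# Hypothetical workload: any callable would do.
elapsed = measure_elapsed(lambda: sum(range(100_000)))
print(f"Elapsed time: {elapsed:.6f} s")
```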

CPU Time:

Active processor Self time spent in the function. For multiple threads, CPU time is summed up. By default, the Self time is provided in seconds. The blue bar is a visual indicator of the CPU time usage. The longer the bar, the higher the value.




In the Summary window, CPU time is the overall time that all processors spent working for the application. If there are multiple cores then the times are added. For example, if core 1 spends 4 seconds working for the application and core 2 spends 7 seconds then the CPU time will be 11 seconds. The CPU time can be greater than the Elapsed time. The upper bound for CPU time is Elapsed time * number of logical cores.
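
The arithmetic in the example above can be sketched as follows. The function names and the 7-second Elapsed time are assumptions made for illustration; only the 4 s and 7 s per-core figures come from the text.

```python
def total_cpu_time(per_core_seconds):
    """Sum the time each logical core spent working for the application."""
    return sum(per_core_seconds)


def cpu_time_upper_bound(elapsed_seconds, logical_cores):
    """Upper bound for CPU time: Elapsed time * number of logical cores."""
    return elapsed_seconds * logical_cores


per_core = [4.0, 7.0]                 # core 1 and core 2, from the example
cpu_time = total_cpu_time(per_core)   # 4 + 7 = 11 seconds

elapsed = 7.0                         # assumed Elapsed time (hypothetical)
bound = cpu_time_upper_bound(elapsed, len(per_core))

print(f"CPU time: {cpu_time} s, upper bound: {bound} s")
```

With these assumed numbers the CPU time (11 s) exceeds the Elapsed time (7 s) but stays below the bound (14 s).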







Instructions Retired:

Modern processors execute many more instructions than the program flow needs. This is called "speculative execution". The instructions that are then "proven" to be needed by the flow are "retired". You can think of "retired" instructions as only the instructions needed by the program flow.




I guess "retired instructions" means the instructions that are actually executed and completed by the CPU. The CPU makes some kind of prediction about the instructions to be executed and puts them into something like a "pool", but not all of these instructions end up being retired.

CPI Rate:

The Clockticks per Instructions Retired (CPI) event ratio, also known as Cycles per Instruction, is one of the basic performance metrics for hardware event-based sampling collection. This ratio is calculated as Clockticks / Instructions Retired.
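
A minimal sketch of the ratio, assuming the two event counts have already been read from a profiling result. The function name `cpi_rate` and the sample counts are hypothetical and not taken from any real measurement.

```python
def cpi_rate(clockticks, instructions_retired):
    """Return Clockticks / Instructions Retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clockticks / instructions_retired


# Hypothetical event counts for one code region.
clockticks = 8_000_000
instructions_retired = 4_000_000

print(f"CPI: {cpi_rate(clockticks, instructions_retired):.2f}")
```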


When you want to determine where to focus your performance tuning effort, the CPI is the first metric to check. A good CPI rate indicates that the code is executing optimally.

As a general guide these numbers have been derived from experienced performance engineers:





A high value for this ratio indicates that, over the current code region, instructions are taking a high number of processor clocks to execute. This could indicate a problem if the instructions are not predominantly high-latency instructions and/or coming from microcode ROM. In this case there may be opportunities to modify your code to improve the efficiency with which instructions are executed within the processor.




Synchronization Context Switches:

Number of times a thread was switched off a processor because it made an explicit call to a thread synchronization API. For example, when threads try to wait on a synchronization object already occupied by another thread, the number of synchronization context switches characterizes the level of contention between threads.



Wait Count:

Number of times the corresponding system wait API was called. For a lock, it is the number of times the lock was contended and caused a wait.

In short: how many times the system wait API was called.


Wait Rate:

Average Wait time (in milliseconds) per synchronization context switch. A low metric value may signal increased contention between threads and inefficient use of system APIs.
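
The definition above amounts to a simple division, sketched below. The function name `wait_rate_ms`, the sample figures, and the choice to return `None` when there are no context switches are assumptions made for this example.

```python
def wait_rate_ms(total_wait_time_ms, sync_context_switches):
    """Average Wait time in milliseconds per synchronization context switch."""
    if sync_context_switches == 0:
        return None  # assumption: treat the rate as undefined with no switches
    return total_wait_time_ms / sync_context_switches


# Hypothetical figures: 500 ms of waiting spread over 1000 context switches.
rate = wait_rate_ms(500.0, 1000)
print(f"Wait rate: {rate} ms per context switch")
```

Many very short waits, as in this made-up case, are the pattern that the low-value warning refers to.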



Estimated Call Count:

Statistical estimation of call counts based on hardware events.


Wait Time:

Duration of a thread's inactivity due to contended synchronization.


Inactive Time:

Time during which a thread remained preempted from execution. Note that many threads can be inactive at any given point in time, so the sum of Wait and Inactive times of those threads can be much greater than the Total time of program execution.



Overhead Time:

Duration that starts with the release of a shared resource and ends with the receipt of that resource. Ideally, the duration of overhead time is very short because it reduces the time a thread has to wait to acquire a resource.



Spin Time (polling time):

Wait Time during which the CPU is busy. This often occurs when a synchronization API causes the CPU to poll while the software thread is waiting. Some Spin Time may be preferable to the alternative of increased thread context switches. Too much Spin Time, however, can reflect lost opportunity for productive work.
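
To make the difference between spinning and blocking concrete, here is a small sketch using only the Python standard library. It compares the CPU time consumed while waiting for the same event in two ways. The helper names (`spin_wait`, `blocking_wait`, `cpu_seconds_while_waiting`) and the 0.2-second delay are hypothetical, and the numbers it prints will vary by machine.

```python
import threading
import time


def spin_wait(flag):
    """Busy-wait: poll the flag in a loop, keeping the CPU busy."""
    while not flag.is_set():
        pass


def blocking_wait(flag):
    """Block: the thread sleeps until the flag is set."""
    flag.wait()


def cpu_seconds_while_waiting(wait_fn, delay=0.2):
    """Return the process CPU time consumed while wait_fn waits for the flag."""
    flag = threading.Event()
    timer = threading.Timer(delay, flag.set)  # sets the flag after `delay` s
    cpu_start = time.process_time()
    timer.start()
    wait_fn(flag)
    cpu_used = time.process_time() - cpu_start
    timer.join()
    return cpu_used


spin_cpu = cpu_seconds_while_waiting(spin_wait)
block_cpu = cpu_seconds_while_waiting(blocking_wait)
print(f"CPU time while spinning: {spin_cpu:.3f} s")
print(f"CPU time while blocking: {block_cpu:.3f} s")
```

The spinning version burns CPU for roughly the whole wait, while the blocking version uses almost none but pays for a context switch when it is woken up.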





Idle Time:

Duration during which a thread remained inactive (for any reason) and the system did not have any other task to execute (was idle). The Idle time is always less than either the Wait time or the Inactive time.



