Rarely executed and almost empty if statement drastically reduces performance in C++
Editor's clarification: When this was originally posted, there were two issues:
- Test performance drops by a factor of three when a seemingly inconsequential statement is added
- Time taken to complete the test appears to vary randomly
The second issue has been solved: the randomness only occurs when running under the debugger.
The remainder of this question should be understood as being about the first bullet point above, and in the context of running in VC++ 2010 Express's Release Mode with optimizations "Maximize Speed" and "favor fast code".
There are still some comments in the comment section about the second point, but they can now be disregarded.
I have a simulation where adding a simple if statement to the while loop that runs the actual simulation makes performance drop by about a factor of three (and I run a lot of calculations in the while loop: n-body gravity for the solar system, among other things), even though the if statement is almost never executed:
if (time - cb_last_orbital_update > 5000000)
    cb_last_orbital_update = time;
with time and cb_last_orbital_update both being of type double and defined at the beginning of the main function, which is also where this if statement is. Usually there are computations I want to run there too, but it makes no difference if I delete them. The if statement as shown above has the same effect on the performance.
The variable time is the simulation time. It increases in 0.001 steps in the beginning, so it takes a really long time until the if statement is executed for the first time (I also included printing a message to see if it is being executed, but it is not, or at least only when it's supposed to). Regardless, the performance drops by a factor of 3 even in the first minutes of the simulation, when it hasn't been executed once yet. If I comment out the line
cb_last_orbital_update = time;
then it runs faster again, so it's not the check for
time - cb_last_orbital_update > 5000000
either, it's definitely the simple act of writing current simulation time into this variable.
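For a sense of scale, the number of loop iterations before the condition can first become true can be estimated. This is a minimal sketch that assumes the step stays fixed at 0.001 for the whole run (the post only says it starts at that value); the function name is hypothetical and not part of the original program.

```cpp
#include <cmath>

// Hypothetical helper, not from the original program.
// Estimates how many fixed-size steps pass before `time` has advanced by
// `threshold`, i.e. roughly when (time - cb_last_orbital_update > threshold)
// can first become true. Assumes a constant step size.
long long approx_steps_until_trigger(double threshold, double time_interval)
{
    return std::llround(threshold / time_interval);
}
```

With the values from the question, approx_steps_until_trigger(5000000.0, 0.001) gives 5000000000, so under this assumption the assignment is reached about once in five billion iterations.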
Also, if I write the current time into another variable instead of cb_last_orbital_update, the performance does not drop. So this might be an issue with assigning a new value to a variable that is used to check if the "if" should be executed?
These are all shots in the dark though.
Disclaimer: I am pretty new to programming, and sorry for all that text.
I am using Visual C++ 2010 Express. Deactivating the stdafx.h precompiled header function didn't make a difference either.
EDIT: Basic structure of the program. Note that time is changed nowhere except at the end of the while loop (time += time_interval;). Also, cb_last_orbital_update has only 3 occurrences: declaration / initialization, plus the two times in the if statement that is causing the problem.
int main(void)
{
    double time = 0;
    double time_interval = 0.001;
    double cb_last_orbital_update = 0;

    F_Rocket_Preset(time, time_interval, ...);

    while (...) // loop condition not shown
    {
        Rocket[active].Stage[Rocket[active].r_stage].F_Update_Stage_Performance(time, time_interval, ...);
        Rocket[active].F_Calculate_Gravitational_Forces(cb_mu, cb_pos_d, time);
        Rocket[active].F_Update_Rotation(time, time_interval, ...);
        Rocket[active].F_Update_Position_Velocity(time_interval, time, ...);
        F_Update_Celestial_Bodies(time, time_interval, ...);

        // the rarely executed if statement in question
        if (time - cb_last_orbital_update > 5000000.0)
        {
            cb_last_orbital_update = time;
        }

        Rocket[active].F_Check_Apoapsis(time, time_interval);
        Rocket[active].F_Status_Check(time, ...);
        Rocket[active].F_Update_Mass (time_interval, time);
        Rocket[active].F_Staging_Check (time, time_interval);

        time += time_interval;

        if (time > 3.1536E8)
        {
            std::cout << "\n\nBreak main loop! Sim Time: " << time << std::endl;
        }
    }
}
Here is the difference in the assembly code. On the left is the fast code, with the line
cb_last_orbital_update = time;
commented out; on the right is the slow code, with the line included.
So, I found a workaround that seems to work just fine so far:
int cb_orbit_update_counter = 1; // before while loop
if (time - cb_orbit_update_counter * 5E6 > 0)
{
    cb_orbit_update_counter++;
}
While that workaround does work, it only works in combination with using __declspec(noinline). I just removed those from the function declarations again to see if that changes anything, and it does.
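The two forms of the check can be compared side by side. The following is a sketch with hypothetical function names and scaled-down numbers (a step of 0.5 and a threshold of 10, chosen so the floating-point arithmetic is exact). It assumes the body of the workaround's if increments the counter, and it only counts how often each condition fires; it says nothing about speed.

```cpp
// Original form: remember the time of the last update.
int count_timestamp_triggers(int steps, double time_interval, double threshold)
{
    double time = 0.0;
    double cb_last_orbital_update = 0.0;
    int triggers = 0;
    for (int i = 0; i < steps; ++i) {
        if (time - cb_last_orbital_update > threshold) {
            cb_last_orbital_update = time;
            ++triggers;
        }
        time += time_interval;
    }
    return triggers;
}

// Workaround form: count completed intervals instead of storing a time.
// The increment inside the if is an assumption about the original body.
int count_counter_triggers(int steps, double time_interval, double threshold)
{
    double time = 0.0;
    int cb_orbit_update_counter = 1;
    int triggers = 0;
    for (int i = 0; i < steps; ++i) {
        if (time - cb_orbit_update_counter * threshold > 0) {
            cb_orbit_update_counter++;
            ++triggers;
        }
        time += time_interval;
    }
    return triggers;
}
```

For 100 steps of 0.5 with a threshold of 10, both functions report 4 triggers. The firing times differ slightly: the timestamp form restarts its interval from the moment it fires, while the counter form stays on fixed multiples of the threshold.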
EDIT 6: Sorry, this is getting confusing. I tracked down the culprit for the lower performance when removing __declspec(noinline) to this function, which is executed inside the if:
__declspec(noinline) std::string F_Get_Body_Name(int r_body)
{
    switch (r_body)
    {
    case 0:
        return ("the Sun");
    case 1:
        return ("Mercury");
    case 2:
        return ("Venus");
    case 3:
        return ("Earth");
    case 4:
        return ("Mars");
    case 5:
        return ("Jupiter");
    case 6:
        return ("Saturn");
    case 7:
        return ("Uranus");
    case 8:
        return ("Neptune");
    case 9:
        return ("Pluto");
    case 10:
        return ("Ceres");
    case 11:
        return ("the Moon");
    }
    return ("unnamed body");
}
The if also now does more than just increase the counter:
if (time - cb_orbit_update_counter * 1E7 > 0)
{
    std::cout << F_Get_Body_Name(3) << " SMA: " << cb_sma[3] << "\tPos Earth: " << cb_pos_d[3][0] << " / " << cb_pos_d[3][1] << " / " << cb_pos_d[3][2] <<
        "\tAlt: " << sqrt(pow(cb_pos_d[3][0] - cb_pos_d[0][0],2) + pow(cb_pos_d[3][1] - cb_pos_d[0][1],2) + pow(cb_pos_d[3][2] - cb_pos_d[0][2],2)) << std::endl;
    std::cout << "Time: " << time << "\tcb_o_h[3]: " << cb_o_h[3] << std::endl;
}
If I remove __declspec(noinline) from the function F_Get_Body_Name alone, the code gets slower. Similarly, if I remove the call to this function or add __declspec(noinline) again, the code runs faster. All other functions still have __declspec(noinline).
EDIT 7: So I changed the switch function to
const std::string cb_names[] = {"the Sun","Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune","Pluto","Ceres","the Moon","unnamed body"}; // global definition
const int cb_number = 12; // global definition
std::string F_Get_Body_Name(int r_body)
{
    if (r_body >= 0 && r_body < cb_number)
        return (cb_names[r_body]);
    return (cb_names[cb_number]);
}
and also made another part of the code slimmer. The program now runs fast without any __declspec(noinline). As ElderBug suggested, is this then an issue with the CPU instruction cache, i.e. the code getting too big?
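A self-contained variant of the table lookup, as a sketch rather than the author's exact code: it keeps the names and the bounds check from EDIT 7, but returns a const reference so that no std::string is copied per call. The const reference and the static linkage are changes introduced here for illustration; whether they would affect the timings discussed above is not known.

```cpp
#include <string>

// Name table as in EDIT 7; the last entry is the fallback name.
static const std::string cb_names[] = {
    "the Sun", "Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn",
    "Uranus", "Neptune", "Pluto", "Ceres", "the Moon", "unnamed body"
};
static const int cb_number = 12; // number of named bodies, also the fallback index

// Returns a reference into the table instead of a fresh std::string
// (assumption: callers only read the name).
const std::string& F_Get_Body_Name(int r_body)
{
    if (r_body >= 0 && r_body < cb_number)
        return cb_names[r_body];
    return cb_names[cb_number];
}
```

Calling F_Get_Body_Name(3) yields "Earth", and any index outside 0 to 11 yields "unnamed body".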
I'd put my money on Intel's branch predictor. http://en.wikipedia.org/wiki/Branch_predictor
The processor assumes (time - cb_last_orbital_update > 5000000) to be false most of the time and loads up the execution pipeline accordingly.
Once the condition (time - cb_last_orbital_update > 5000000) becomes true, the misprediction delay hits you. You may lose 10 to 20 cycles.
if (time - cb_last_orbital_update > 5000000)
    cb_last_orbital_update = time;
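Taking the 10 to 20 cycle figure at face value, the total cost over a whole run can be estimated. This sketch assumes one misprediction each time the branch is taken and a run ending at 3.1536E8 simulated seconds (the limit shown in the question's loop); the function name is hypothetical.

```cpp
#include <cmath>

// Hypothetical estimate, not a measurement: cycles lost if every taken
// branch costs exactly one misprediction of `cycles_per_miss` cycles.
double total_misprediction_cycles(double sim_end, double threshold, double cycles_per_miss)
{
    // Approximate number of times the condition becomes true during the run.
    double times_taken = std::floor(sim_end / threshold);
    return times_taken * cycles_per_miss;
}
```

total_misprediction_cycles(3.1536E8, 5000000.0, 20.0) evaluates to 1260 cycles (63 taken branches at 20 cycles each), spread over the entire run.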