How to avoid bugs using modern C++ (Part 1)
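One of the main problems with C++ is the huge number of constructs whose behavior is undefined, or simply unexpected for the programmer. We often come across them when running a static analyzer on various projects. But, as everyone knows, the best thing is to detect errors at compile time. Let's see which techniques in modern C++ help us write code that is not only simple and clear, but also safer and more reliable.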
What is modern C++?
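The term "modern C++" became very popular after the release of C++11. What does it mean? First of all, modern C++ is a set of patterns and idioms designed to eliminate the downsides of the good old "C with classes" style that so many C++ programmers are used to, especially if they started out programming in C. C++11 code looks more concise and understandable, which is very important.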
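What do people usually think of when they talk about modern C++? Parallelism, compile-time computation, RAII, lambdas, ranges, concepts, modules, and other equally important components of the standard library (for example, an API for working with the file system). These are all very cool modernizations, and we look forward to seeing them in the next set of standards. However, I would like to draw attention to the ways in which the new standards let us write safer code. When developing a static analyzer, we see a great number of different errors, and sometimes we can't help thinking: "But in modern C++ this could have been avoided." Therefore, I suggest we examine several errors that PVS-Studio found in various open source projects. We will also see how they can be fixed.
Automatic type inference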
In C++11, the keywords auto and decltype were added. Of course, you already know how they work.
auto it = m.find(42);
// C++98: std::map<int, int>::iterator it = m.find(42);
It's very convenient for shortening long types without losing the readability of the code. However, these keywords become truly powerful in combination with templates: with auto and decltype there is no need to spell out the type of the returned value.
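To illustrate the point about templates, here is a minimal sketch (the function name add is made up for this example) of a C++11 function template whose return type is derived from its arguments rather than written by hand:

```cpp
#include <cassert>
#include <type_traits>

// C++11: the return type is whatever `a + b` yields, spelled via decltype.
// `add` is a hypothetical name used only for this illustration.
template <typename A, typename B>
auto add(A a, B b) -> decltype(a + b) {
    return a + b;
}

// int + double gives double, and we never had to write that type down.
static_assert(std::is_same<decltype(add(1, 2.5)), double>::value,
              "int + double deduces to double");
```

Since C++14 the trailing return type can be dropped entirely and plain auto used instead.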
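But let's go back to our topic. Here is an example of a 64-bit error: the result of string::find is stored in a variable n of type unsigned and then compared with string::npos in an if statement.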
In a 64-bit application, the value of string::npos is greater than UINT_MAX, the largest value that a variable of unsigned type can hold. It might seem that this is a case where auto can save us from this kind of problem: the type of the variable n isn't important to us, the main thing is that it can accommodate all possible values returned by string::find. And indeed, if we declare n with auto instead of unsigned, the error is gone.
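The two variants can be sketched as follows. This is an illustration rather than the original listing; the function names are invented, and the comments assume a typical platform where size_t is 64 bits wide and unsigned is 32 bits wide.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Buggy pattern: the result of find() is squeezed into `unsigned`.
// When size_type is wider than unsigned, npos no longer fits, so a
// failed search is reported as a successful one.
bool found_with_unsigned(const std::string& s, const std::string& what) {
    unsigned n = s.find(what);  // narrowing conversion
    return n != std::string::npos;
}

// With auto, n has exactly the type that find() returns.
bool found_with_auto(const std::string& s, const std::string& what) {
    auto n = s.find(what);
    static_assert(std::is_same<decltype(n), std::string::size_type>::value,
                  "n is deduced as size_type");
    return n != std::string::npos;
}
```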
But not everything is that simple. Using auto is not a panacea, and there are many pitfalls related to its use. For example, you can compute a buffer size of 5 GiB as a product of integer literals, store it in an auto variable, and then allocate a char buffer of that size.
auto won't save us from the integer overflow, and less than 5 GiB of memory will be allocated for the buffer.
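The reason is that auto simply takes the type of the initializer. The sketch below avoids the overflowing expression itself and only shows the deduced types; the constant name kFiveGiB is made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A product of int literals is itself an int, so `auto n = 1024 * 1024 * ...`
// would be an int no matter how large the intended value is.
static_assert(std::is_same<decltype(1024 * 1024), int>::value,
              "int * int is int");

// Making the first operand 64-bit makes the whole product 64-bit.
constexpr auto kFiveGiB = std::int64_t{1024} * 1024 * 1024 * 5;
static_assert(std::is_same<decltype(kFiveGiB), const std::int64_t>::value,
              "the product is computed in 64 bits");
static_assert(kFiveGiB == 5368709120LL, "5 GiB");
```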
auto also isn't of much help when it comes to a very common error: an incorrectly written loop. Let's look at an example: a for loop over a large container whose counter is declared as unsigned, while the bound is the container's size.
For large arrays, this loop becomes an infinite loop. It's no surprise that there are such errors in the code: they reveal themselves only in very rare cases, for which there were no tests.
Can we rewrite this fragment with auto, declaring the counter as auto i = 0?
No. Not only is the error still here, it has become even worse.
With simple types auto behaves very badly. Yes, in the simplest cases (auto x = y) it works, but as soon as there are additional constructs, the behavior becomes less predictable. What's worse, the error is harder to notice, because the types of the variables aren't obvious at first glance. Fortunately, this is not a problem for static analyzers: they don't get tired and don't lose attention. But for us mere mortals, it's better to specify simple types explicitly. We can also get rid of narrowing conversions using other methods, but we'll speak about that later.
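A short sketch of why auto makes the loop worse, and of the explicit alternative. The function sum_by_index is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// `auto i = 0` deduces int from the literal, not the container's size type,
// so the counter becomes a signed 32-bit value on common platforms.
static_assert(std::is_same<decltype(0), int>::value, "the literal 0 is an int");

// Spelling the index type explicitly keeps the counter as wide as the bound.
long long sum_by_index(const std::vector<int>& v) {
    long long total = 0;
    for (std::vector<int>::size_type i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}
```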
Dangerous countof
One of the "dangerous" types in C++ is the array. Often, when passing it to a function, programmers forget that it is passed as a pointer, and try to calculate the number of elements with sizeof.
#define RTL_NUMBER_OF_V1(A) (sizeof(A)/sizeof((A)[0]))
#define _ARRAYSIZE(A) RTL_NUMBER_OF_V1(A)
Note: This code is taken from the Source Engine SDK.
PVS-Studio warning: V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (iNeighbors)' expression. Vrad_dll disp_vrad.cpp 60
Such confusion can arise because the array size is written in the parameter declaration: this number means nothing to the compiler and is just a hint to the programmer.
The trouble is that this code compiles, and the programmer is unaware that something is wrong. The obvious solution is to use metaprogramming: a countof function template that accepts only a reference to an array and returns the number of its elements.
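A minimal sketch of such a countof follows. The array name neighbors and its size are stand-ins chosen for the example, not the SDK's code.

```cpp
#include <cstddef>

// Accepts only a reference to a real array; N is deduced by the compiler.
template <typename T, std::size_t N>
constexpr std::size_t countof(const T (&)[N]) noexcept {
    return N;
}

int neighbors[512];  // hypothetical array used for the illustration
static_assert(countof(neighbors) == 512, "size is known at compile time");

// Inside `void f(int a[512])` the parameter is really an int*, so
// `countof(a)` would be rejected by the compiler instead of silently
// returning sizeof(int*) / sizeof(int).
```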
If we pass something other than an array to this function, we get a compilation error. In C++17 you can use std::size.
In C++11, the type trait std::extent was added, but it isn't suitable as countof, because it returns 0 for inappropriate types.
std::extent<decltype(iNeighbors)>(); //=> 0
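The difference can be checked at compile time. In this sketch takes_array is a hypothetical declaration that mimics an array parameter:

```cpp
#include <type_traits>

// For a real array type, std::extent reports the first dimension.
static_assert(std::extent<int[512]>::value == 512, "array extent");

// A parameter written as `int a[512]` is adjusted to `int*` ...
void takes_array(int a[512]);  // hypothetical declaration
static_assert(std::is_same<decltype(takes_array), void(int*)>::value,
              "the array parameter is adjusted to a pointer");

// ... and for a pointer std::extent quietly yields 0 instead of an error.
static_assert(std::extent<int*>::value == 0, "no diagnostic, just 0");
```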
You can make an error not only with countof, but with sizeof as well: for example, by calling memcpy with sizeof applied to an array parameter.
Note: This code is taken from Chromium.
PVS-Studio warnings:
- V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (salt)' expression. browser visitedlink_master.cc 968
- V512 A call of the 'memcpy' function will lead to underflow of the buffer 'salt_'. browser visitedlink_master.cc 968
As you can see, built-in C++ arrays have a lot of problems. This is why you should use std::array: in modern C++ its API is similar to that of std::vector and other containers, and it's harder to make an error when using it. A std::array passed to a function still knows its own size.
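As a sketch of what this buys us, assume a salt of 16 bytes (the length, the alias Salt and both function names are assumptions made for this example):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Salt = std::array<std::uint8_t, 16>;  // 16 is an assumed length

// The size is part of the type and survives being passed to a function.
std::size_t salt_bytes(const Salt& salt) {
    return salt.size() * sizeof(salt[0]);
}

// Copying is plain assignment: no memcpy and no sizeof to get wrong.
Salt copy_salt(const Salt& salt) {
    Salt result = salt;
    return result;
}
```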
How to make a mistake in a simple for
One more source of errors is a simple for loop. You may think, "Where can you make a mistake there? Is it something connected with a complex exit condition or with saving lines of code?" No, programmers make errors in the simplest loops. Let's take a look at fragments from real projects. The first one computes its loop bound by dividing sizeof(kBaudrates) by sizeof(char*).
Note: This code is taken from the Haiku operating system.
PVS-Studio warning: V706 Suspicious division: sizeof (kBaudrates) / sizeof (char *). Size of every element in 'kBaudrates' array does not equal to divisor. SerialWindow.cpp 162
We have examined such errors in detail in the previous section: once again, the array size wasn't evaluated correctly. We can easily fix it by using std::size.
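A sketch of the fix, assuming a C++17 compiler. The array contents and the function name are invented for the illustration; only the shape of the loop follows the fragment described above.

```cpp
#include <cassert>
#include <iterator>  // std::size (C++17)

const int kRates[] = {50, 75, 110, 134, 150};  // hypothetical values

// Walks the array from the last element to the first. The bound comes from
// std::size, so it stays correct whatever the element type is.
int sum_last_to_first() {
    int sum = 0;
    for (int i = static_cast<int>(std::size(kRates)); --i >= 0;) {
        sum += kRates[i];
    }
    return sum;
}
```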
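But there is a better way. Before we get to it, let's take a look at one more fragment: a reverse loop whose unsigned counter nCharPos is checked with the condition nCharPos >= 0.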
Note: This code is taken from Shareaza.
PVS-Studio warning: V547 Expression 'nCharPos >= 0' is always true. Unsigned type value is always >= 0. BugTrap xmlreader.h 946
It's a typical error when writing a reverse loop: the programmer forgot that the counter has an unsigned type, so the check always returns true. You might think, "How come? Only novices and students make such mistakes. We, professionals, don't." Unfortunately, this is not completely true. Of course, everyone understands that (unsigned >= 0) is always true. So where do such errors come from? They often occur as a result of refactoring. Imagine this situation: the project migrates from a 32-bit platform to a 64-bit one. Previously, int/unsigned was used for indexing, and a decision was made to replace them with size_t/ptrdiff_t. But in one fragment an unsigned type was accidentally used instead of a signed one.
What shall we do to avoid this situation in our code? Some people advise using signed types, as in C# or Qt. Perhaps that could be a way out, but if we want to work with large amounts of data, there is no way to avoid size_t.
Is there a safer way to iterate through an array in C++? Of course there is. Let's start with the simplest one: non-member functions. There are standard functions such as rbegin and rend that work with collections, arrays and initializer_list; their principle should be familiar to you.
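For instance, a reverse traversal can be written once for arrays and containers alike. This sketch assumes C++14, where std::rbegin and std::rend were added; the function name reversed_text is made up.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Collects the characters of any range in reverse order. There is no index,
// so there is no unsigned counter that could wrap around.
template <typename Range>
std::string reversed_text(const Range& r) {
    std::string out;
    for (auto it = std::rbegin(r); it != std::rend(r); ++it) {
        out += *it;
    }
    return out;
}
```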
Great, now we do not need to remember the difference between a forward and a reverse loop. There is also no need to think about whether we use a built-in array or a std::array: the loop will work in any case. Using iterators is a great way to avoid headaches, but even that is not always good enough. It is best to use the range-based for loop.
Of course, the range-based for has some flaws: it doesn't allow flexible control of the loop, and if more complex work with indexes is required, it won't be of much help to us. But such situations should be examined separately. Ours is quite simple: we have to go through the items in reverse order. However, even at this stage there are difficulties, because the standard library has no helper classes for a reversed range-based for. Let's see how one could be implemented: a small reversed_wrapper struct template whose begin() and end() return the reverse iterators of the wrapped range, plus a reversed() function template that creates the wrapper.
In C++14 you can simplify the code by removing the decltype. You can see how auto helps you write template functions: reversed_wrapper will work both with an array and with std::vector.
Now we can rewrite the fragment as a range-based for loop over the result of reversed().
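One possible implementation, sketched for C++14. The names reversed_wrapper and reversed come from the text; the member name and the helper collect_reversed are assumptions made for this example.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Adapts any range so that begin()/end() walk it backwards.
template <typename T>
struct reversed_wrapper {
    const T& range;
    auto begin() const { return std::rbegin(range); }
    auto end() const { return std::rend(range); }
};

template <typename T>
reversed_wrapper<T> reversed(const T& range) {
    return reversed_wrapper<T>{range};
}

// Hypothetical helper showing the wrapper inside a range-based for.
template <typename T>
std::vector<int> collect_reversed(const T& range) {
    std::vector<int> out;
    for (int x : reversed(range)) {
        out.push_back(x);
    }
    return out;
}
```

The wrapper only stores a reference, so it should not be applied to a temporary: the reference would dangle once the temporary is destroyed.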
What's great about this code? Firstly, it is very easy to read. We immediately see that the elements of the array are traversed in reverse order. Secondly, it's harder to make an error. And thirdly, it works with any type. This is much better than what we had before.
If you use Boost, boost::adaptors::reverse(arr) is available for this purpose.
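But let's go back to the original example. There, the array is passed as a pointer-size pair. It is obvious that our idea with reversed will not work for it. What shall we do? Use classes like span/array_view. In C++17 we have string_view, and I suggest using that: a function Foo that takes a string_view can be called with a string literal, with a std::string, or with a view built from a pointer and a length.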
string_view does not own the string; in fact, it's a wrapper around a const char* and a length. That's why the view is passed by value in the code example, not by reference. A key feature of string_view is its compatibility with strings in various representations: const char*, std::string, and non-null-terminated const char*.
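As a result, the function takes a single wstring_view parameter instead of the pointer-size pair, and its body becomes a simple loop over the characters in reverse order.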
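A sketch of that shape, assuming C++17. The function put_chars_back is a hypothetical stand-in that returns the characters it visits, so that the result can be inspected; it is not the project's function.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Takes the characters as one view instead of a pointer plus a length and
// visits them from last to first.
std::wstring put_chars_back(std::wstring_view chars) {
    std::wstring out;
    for (auto it = chars.rbegin(); it != chars.rend(); ++it) {
        out.push_back(*it);
    }
    return out;
}
```

An empty view needs no special case here, and there is no counter that could drop below zero.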
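When passing arguments to the function, it's important to remember that the constructor string_view(const char*) is implicit, so a call like this compiles:
Foo(pChars);
even when this was intended:
Foo(wstring_view(pChars, nNumChars));
The first form measures the string up to its null terminator and ignores nNumChars.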
The string that a string_view points to does not need to be null-terminated; the very name string_view::data gives us a hint about this, and it is necessary to keep that in mind when using it. When passing its value to a function from cstdlib that expects a C string, you can get undefined behavior. You can easily miss it if most of the cases you test use std::string or null-terminated strings.
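The effect is easy to reproduce with a view of part of a longer string. In this C++17 sketch, c_length is a hypothetical helper that makes an owning, null-terminated copy before calling a C function:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// std::strlen needs a terminating '\0', which a view does not promise.
// Copying into std::string first guarantees one.
std::size_t c_length(std::string_view sv) {
    const std::string owned(sv);
    return std::strlen(owned.c_str());
}
```

For a view created with substr(0, 5) over "hello world", size() is 5, but data() still points into the full literal, so a C function reading from data() sees all 11 characters. With a buffer that has no terminator at all, the same call would read out of bounds.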
Enum
Let's leave C++ for a second and think about good old C. How is safety there? After all, there are no problems with implicit constructor calls and operators, or type conversion, and there are no problems with various types of strings. In practice, errors often occur in the simplest constructs: the most complicated ones are thoroughly reviewed and debugged, because they raise doubts, while programmers forget to check the simple ones. Here is an example of a dangerous construct that came to us from C: two enum declarations and a function that switches over a value of one of them.
Note: This code is taken from the Linux kernel.
PVS-Studio warning: V556 The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. libiscsi.c 3501
Pay attention to the values in the switch-case: one of the named constants is taken from a different enumeration. In the original, of course, there is much more code and many more possible values, so the error isn't that obvious. The reason is the lax typing of enum: its values implicitly convert to int, and this leaves a lot of room for errors.
In C++11 you can, and should, use enum class: such a trick won't work there, and the error will show up at the compilation stage. As a result, the same switch rewritten with enum class types does not compile, which is exactly what we need.
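The guarantee can be sketched with two small scoped enumerations. All names here are invented for the illustration and only loosely mirror the kernel's.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

enum class ConnParam { Port, Address };  // hypothetical enumerations
enum class HostParam { IpAddress };

// Scoped enumerators convert neither to int nor to each other.
static_assert(!std::is_convertible<ConnParam, int>::value, "no int conversion");
static_assert(!std::is_convertible<HostParam, ConnParam>::value,
              "no cross-enum conversion");

std::string describe(ConnParam p) {
    switch (p) {
        case ConnParam::Port:    return "port";
        case ConnParam::Address: return "address";
        // case HostParam::IpAddress:  // would be rejected by the compiler
    }
    return "unknown";
}
```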
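The following fragment is not quite connected with enum, but has similar symptoms: in a chain of errno comparisons joined with ||, one of the error constants appears on its own, without errno == in front of it.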
Note: This code is taken from ReactOS.
Yes, the values of errno are declared as macros, which is bad practice in C++ (and in C as well), but even if the programmer had used enum, it wouldn't have made life easier. The lost comparison will not reveal itself with an enum (and especially not with a macro). At the same time, enum class would not allow this, as there is no implicit conversion to bool.
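Here is a small sketch of the lost comparison, with made-up enumerations and values standing in for the errno macros. Some compilers warn about the buggy line, but it is still valid code.

```cpp
#include <cassert>
#include <type_traits>

enum OldError { OldAgain = 11, OldWouldBlock = 12 };  // hypothetical values
enum class Error { Again, WouldBlock, Other };

// Plain enum: the lone constant converts to bool, so the whole expression
// is true for every input.
bool is_retryable_buggy(OldError e) {
    return e == OldAgain || OldWouldBlock;  // the comparison was lost here
}

static_assert(!std::is_convertible<Error, bool>::value, "no bool conversion");

// Scoped enum: `e == Error::Again || Error::WouldBlock` would not compile,
// so the comparison has to be written out in full.
bool is_retryable(Error e) {
    return e == Error::Again || e == Error::WouldBlock;
}
```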
Initialization in the constructor
But back to the native C++ problems. One of them reveals itself when an object has to be initialized in the same way in several constructors. A simple situation: there is a class with two constructors, and one of them calls the other. It all looks pretty logical: the common code is put in one place, since nobody likes to duplicate code. What's the pitfall?
Note: This code is taken from LibreOffice.
PVS-Studio warning: V603 The object was created but it is not being used. If you wish to call constructor, 'this->Guess::Guess(....)' should be used. guess.cxx 56
The pitfall is in the syntax of the constructor call. Quite often it gets forgotten, and the programmer creates one more class instance, which then gets immediately destroyed. That is, the original instance is never initialized. Of course, there are 1001 ways to fix this. For example, we can explicitly call the constructor via this, or put everything into a separate function such as Init().
By the way, an explicit repeated call of the constructor, for example, via this is a dangerous game, and we need to understand what's going on. The variant with the Init() is much better and clearer. For those who want to understand the details of these "pitfalls" better, I suggest looking at chapter 19, "How to properly call one constructor from another", from this book.
But it is best to use delegating constructors here, available since C++11: one constructor explicitly calls another from its member initializer list.
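A minimal sketch of delegation on a simplified class. This Guess is a hypothetical stand-in with invented members and default values, not the LibreOffice class.

```cpp
#include <cassert>
#include <string>

class Guess {  // simplified, hypothetical version of the class
public:
    Guess() : language_("en"), country_("US") {}

    // Delegation: the default constructor runs first, on this same object,
    // and only then does the body below execute.
    explicit Guess(const std::string& encoding) : Guess() {
        encoding_ = encoding;
    }

    const std::string& language() const { return language_; }
    const std::string& country() const { return country_; }
    const std::string& encoding() const { return encoding_; }

private:
    std::string language_;
    std::string country_;
    std::string encoding_;
};
```

Writing `Guess();` as a statement inside the second constructor's body would compile too, but it would only create and destroy a temporary, which is the bug described above.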
Such constructors have several limitations. First: the constructor being delegated to takes full responsibility for the initialization of the object. That is, a delegating constructor cannot also initialize another class member in the same initializer list.
And of course, we have to make sure that the delegation doesn't create a loop, as it will be impossible to exit it. Unfortunately, compilers are not required to diagnose this, so code in which two constructors delegate to each other still compiles.