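Copying binary (ELF) files

When you copy a binary over an existing file, and the destination is an executable that some process is currently running, the copy fails with a "Text file busy" (ETXTBSY) error. That is actually the benign case. If the destination is a shared object (.so) that a process is currently using, the copy succeeds, but the process may then crash.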
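The ETXTBSY case can be checked from user space without modifying anything. The sketch below is only an illustration: the helper name `probe_write_open` is made up for this example, and it opens the file for writing without O_TRUNC so the contents stay untouched.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open `path` for writing (without O_TRUNC, so nothing is changed).
 * Returns 0 on success, or the errno value reported by open(2) on failure.
 * ETXTBSY means the kernel refused because the file is being executed. */
int probe_write_open(const char *path)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling `probe_write_open("/proc/self/exe")` from a running program would be expected to return ETXTBSY on a kernel that enforces this check, provided the caller otherwise has write permission on the binary. Probing a .so that is merely mapped by some process would be expected to return 0, which is exactly the asymmetry described above.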

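I wrote up this behavior once before, but after a while I had half forgotten it. Then I came across this LWN article, which is concise and to the point, so here is another pass at organizing the details.

Article summary

Executables

Linus's main point is that ETXTBSY is purely a courtesy feature of the kernel: a way of keeping people from doing something extremely stupid ("we'll help you avoid shooting yourself in the foot when we notice"). Writes to shared libraries, however, are not blocked in the same way, perhaps because this is not seen as the kernel's duty, and perhaps because nobody is expected to do something that "incredibly stupid".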

The kernel ETXTBUSY thing is purely a courtesy feature, and as people have noticed it only really works for the main executable because of various reasons. It's not something user space should even rely on, it's more of a "ok, you're doing something incredibly stupid, and we'll help you avoid shooting yourself in the foot when we notice".

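Shared libraries

The article also mentions that shared libraries used to be protected from modification by passing the MAP_DENYWRITE flag to mmap. As the mmap man page shows, though, that behavior opened the door to denial-of-service attacks, so the flag is now ignored. This matches the article's statement that the protection has been removed from the kernel, so the kernel will happily let you overwrite a .so file that a process is using ("When MAP_DENYWRITE went away, so did that protection; current Linux systems will happily allow a suitably privileged user to overwrite in-use, shared libraries").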

       MAP_DENYWRITE
              This flag is ignored.  (Long ago—Linux 2.0 and earlier—it
              signaled that attempts to write to the underlying file
              should fail with ETXTBSY.  But this was a source of
              denial-of-service attacks.)

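If the overwritten .so is one you built yourself (the usual case), typically only a few processes crash. But if a system library such as libc.so is overwritten, most processes on the system could hit a fault.

How it is implemented

The article explains that each inode's i_writecount field holds a write count with special handling. A negative value means the file is being executed as some process's main executable. A positive value is the number of times the file is currently open for writing (the same file can be opened for writing several times, from one process or from different processes). Because any execution and any writable open are mutually exclusive, it works to give the negative and positive ranges different meanings.

The code below shows that when write access is requested and the count is negative (a process is executing the file), ETXTBSY is returned. Likewise, executing a file fails if the file is currently open for writing. To repeat: a file can be open for writing many times at once, and it can also be running as many process instances at once.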

///@file: linux-3.12.6\include\linux\fs.h
/*
 * get_write_access() gets write permission for a file.
 * put_write_access() releases this write permission.
 * This is used for regular files.
 * We cannot support write (and maybe mmap read-write shared) accesses and
 * MAP_DENYWRITE mmappings simultaneously. The i_writecount field of an inode
 * can have the following values:
 * 0: no writers, no VM_DENYWRITE mappings
 * < 0: (-i_writecount) vm_area_structs with VM_DENYWRITE set exist
 * > 0: (i_writecount) users are writing to the file.
 *
 * Normally we operate on that counter with atomic_{inc,dec} and it's safe
 * except for the cases where we don't hold i_writecount yet. Then we need to
 * use {get,deny}_write_access() - these functions check the sign and refuse
 * to do the change if sign is wrong.
 */
static inline int get_write_access(struct inode *inode)
{
	return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
}
static inline int deny_write_access(struct file *file)
{
	struct inode *inode = file_inode(file);
	return atomic_dec_unless_positive(&inode->i_writecount) ? 0 : -ETXTBSY;
}
static inline void put_write_access(struct inode * inode)
{
	atomic_dec(&inode->i_writecount);
}
static inline void allow_write_access(struct file *file)
{
	if (file)
		atomic_inc(&file_inode(file)->i_writecount);
}
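To make the sign convention concrete, here is a small user-space model of the same counter built on C11 atomics. It is a sketch, not kernel code: every `model_*` name and the two static helpers are invented for this example, and they only mimic what `atomic_inc_unless_negative` and `atomic_dec_unless_positive` do.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is negative; report whether the increment happened. */
static bool inc_unless_negative(atomic_int *v)
{
    int cur = atomic_load(v);

    while (cur >= 0) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Decrement *v unless it is positive; report whether the decrement happened. */
static bool dec_unless_positive(atomic_int *v)
{
    int cur = atomic_load(v);

    while (cur <= 0) {
        if (atomic_compare_exchange_weak(v, &cur, cur - 1))
            return true;
    }
    return false;
}

/* Writer side: one more writable open, refused while the file is executing. */
int model_get_write_access(atomic_int *writecount)
{
    return inc_unless_negative(writecount) ? 0 : -ETXTBSY;
}

void model_put_write_access(atomic_int *writecount)
{
    atomic_fetch_sub(writecount, 1);
}

/* Exec side: one more running instance, refused while the file is open for write. */
int model_deny_write_access(atomic_int *writecount)
{
    return dec_unless_positive(writecount) ? 0 : -ETXTBSY;
}

void model_allow_write_access(atomic_int *writecount)
{
    atomic_fetch_add(writecount, 1);
}
```

Starting from a counter of 0, a `model_deny_write_access` call moves it to -1 and every later `model_get_write_access` fails with -ETXTBSY until `model_allow_write_access` brings it back to 0. The reverse holds as well: while two writers hold the counter at 2, an exec attempt is refused.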

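Modifying a file after mmap

Once a file has been mapped into a process's address space with mmap, can the process see later changes to the file's contents? The mmap man page covers this as well: it mainly depends on whether the mapping was created with MAP_SHARED or MAP_PRIVATE.

That is, with a MAP_SHARED mapping, updates made through the mapping are visible to every other MAP_SHARED mapping, and changes made to the file through the filesystem are likewise visible to all of those mappings. With a MAP_PRIVATE mapping, whether later changes to the file are visible is unspecified.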

       MAP_SHARED
              Share this mapping.  Updates to the mapping are visible to
              other processes mapping the same region, and (in the case
              of file-backed mappings) are carried through to the
              underlying file.  (To precisely control when updates are
              carried through to the underlying file requires the use of
              msync(2).)

       MAP_PRIVATE
              Create a private copy-on-write mapping.  Updates to the
              mapping are not visible to other processes mapping the
              same file, and are not carried through to the underlying
              file.  It is unspecified whether changes made to the file
              after the mmap() call are visible in the mapped region.
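The "carried through to the underlying file" part can be observed directly. The following sketch stores a byte through a mapping and then reads the file back with pread(2). The helper name and the temporary path under /tmp are assumptions made for this example. On Linux the read is expected to see a MAP_SHARED store right away, since both paths go through the same page cache page; msync(2) matters for when the data reaches storage.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a temp file holding "X", map it with `flags` (MAP_SHARED or
 * MAP_PRIVATE), store 'Y' through the mapping, then read the first byte of
 * the file back with pread(). Returns that byte, or -1 on error. */
int byte_in_file_after_mapped_write(int flags)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    char buf = 0;
    char *p;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* the file lives on until close() */
    if (pwrite(fd, "X", 1, 0) != 1)
        goto out;
    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    p[0] = 'Y';                         /* modify through the mapping */
    if (pread(fd, &buf, 1, 0) == 1)     /* observe through the file */
        result = (unsigned char)buf;
    munmap(p, 1);
out:
    close(fd);
    return result;
}
```

With MAP_SHARED the function is expected to return 'Y'; with MAP_PRIVATE the store stays in the process's private copy and the file still reads 'X'.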

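Some implementation notes

Visibility

The shared case is actually the simplest to implement: every file has exactly one inode in the kernel, and each part of the file's contents has exactly one page in that inode's page cache. In shared mode all modifications land in that page, and since the page is visible to every mapping, changes appear mutually and immediately visible.

The private case, by contrast, works like page copy-on-write: the file contents are loaded on first access, and a private page is allocated on first write. Because the timing of that load and that write is not fixed, visibility is not fixed either.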
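The copy-on-write behavior of a private mapping can be illustrated with a small experiment. This is a sketch under stated assumptions: the helper name and the /tmp path are made up here, and only the case where the page has already been copied is deterministic.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file containing "XX" with MAP_PRIVATE, then change the file's first
 * byte to 'Y' with pwrite() and return the first byte seen through the
 * mapping. If cow_first is nonzero, the mapping is written to (at offset 1)
 * before the file changes, which forces a private copy of the page.
 * Returns -1 on error. */
int private_view_after_file_change(int cow_first)
{
    char path[] = "/tmp/mmap-private-XXXXXX";
    volatile char *p;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (pwrite(fd, "XX", 2, 0) != 2)
        goto out;
    p = mmap(NULL, 2, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    if (cow_first)
        p[1] = 'Z';                     /* first write: page is copied */
    if (pwrite(fd, "Y", 1, 0) == 1)     /* change the file afterwards */
        result = p[0];
    munmap((void *)p, 2);
out:
    close(fd);
    return result;
}
```

With `cow_first` set, the mapping already owns a private copy when the file changes, so the function returns 'X'. Without it, the result is exactly the "unspecified" case from the man page; on a typical page-cache-backed Linux filesystem the mapping will usually still point at the shared page and show 'Y', but that should not be relied on.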

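truncate/punch hole

When cp copies a file, it checks whether the destination already exists. If it does, cp opens it for writing with the O_TRUNC flag, so the open system call itself discards all of the file's contents.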
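The effect of O_TRUNC at open time is easy to confirm. The sketch below (helper name and /tmp path are assumptions for this example) reopens an existing 5-byte file with O_WRONLY | O_TRUNC and reports the size before anything is written.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the size of an existing 5-byte file right after it has been
 * reopened with O_WRONLY | O_TRUNC and before anything is written to it.
 * Returns -1 on error. */
long size_after_trunc_open(void)
{
    char path[] = "/tmp/otrunc-demo-XXXXXX";
    struct stat st;
    long size = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) == 5) {
        int fd2 = open(path, O_WRONLY | O_TRUNC);

        if (fd2 >= 0) {
            if (fstat(fd2, &st) == 0)
                size = (long)st.st_size;
            close(fd2);
        }
    }
    close(fd);
    unlink(path);
    return size;
}
```

The function returns 0: the old contents are gone as soon as open() returns, which is the window in which a process still mapping the old contents is exposed.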

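The key question now is: when a file's contents are modified through filesystem operations (open/write), do the memory regions that mmap the file notice, and if so, when and how?

The truncate code shows that the change is visible immediately. The key data structures are the i_mmap red-black tree and the i_mmap_nonlinear list in each inode's address_space. In other words, when mmap runs, not only does each process learn through its vmas which files it has mapped; each file (inode) also needs a structure recording which vmas map its contents. Only with that record can the kernel find and operate on those vmas when the file's contents change.

In the truncate code, once part of a file has been truncated away, every mapping of that range is unmapped, and the next access triggers a new demand-paging fault. If the faulting address now lies beyond the end of the file, the fault cannot be satisfied and the process receives SIGBUS, which is how an overwritten .so brings a process down. The mechanism parallels per-page management: each page also needs an rmap recording which vmas map it, except that the page-level record is mainly used to unmap the page when it is swapped out to disk.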

///@file: linux-3.12.6\fs\namei.c
/*
 * Handle the last step of open()
 */
static int do_last(struct nameidata *nd, struct path *path,
		   struct file *file, const struct open_flags *op,
		   int *opened, struct filename *name)
{
///....
opened:
	error = open_check_o_direct(file);
	if (error)
		goto exit_fput;
	error = ima_file_check(file, op->acc_mode);
	if (error)
		goto exit_fput;
	if (will_truncate) {
		error = handle_truncate(file);
		if (error)
			goto exit_fput;
	}
///....
}

int ext3_setattr(struct dentry *dentry, struct iattr *attr)
{
///...
	if ((attr->ia_valid & ATTR_SIZE) &&
	    attr->ia_size != i_size_read(inode)) {
		truncate_setsize(inode, attr->ia_size);
		ext3_truncate(inode);
	}
///...
}

/**
 * truncate_setsize - update inode and pagecache for a new file size
 * @inode: inode
 * @newsize: new file size
 *
 * truncate_setsize updates i_size and performs pagecache truncation (if
 * necessary) to @newsize. It will be typically be called from the filesystem's
 * setattr function when ATTR_SIZE is passed in.
 *
 * Must be called with inode_mutex held and before all filesystem specific
 * block truncation has been performed.
 */
void truncate_setsize(struct inode *inode, loff_t newsize)
{
	i_size_write(inode, newsize);
	truncate_pagecache(inode, newsize);
}

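truncate_pagecache in turn calls unmap_mapping_range, which walks every vma recorded in the address_space as mapping this file and removes the affected range from each of them.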

/**
 * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
 * @mapping: the address space containing mmaps to be unmapped.
 * @holebegin: byte in first page to unmap, relative to the start of
 * the underlying file.  This will be rounded down to a PAGE_SIZE
 * boundary.  Note that this is different from truncate_pagecache(), which
 * must keep the partial page.  In contrast, we must get rid of
 * partial pages.
 * @holelen: size of prospective hole in bytes.  This will be rounded
 * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
 * end of the file.
 * @even_cows: 1 when truncating a file, unmap even private COWed pages;
 * but 0 when invalidating pagecache, don't throw away private data.
 */
void unmap_mapping_range(struct address_space *mapping,
		loff_t const holebegin, loff_t const holelen, int even_cows)
{
	struct zap_details details;
	pgoff_t hba = holebegin >> PAGE_SHIFT;
	pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;

	/* Check for overflow. */
	if (sizeof(holelen) > sizeof(hlen)) {
		long long holeend =
			(holebegin + holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
		if (holeend & ~(long long)ULONG_MAX)
			hlen = ULONG_MAX - hba + 1;
	}

	details.check_mapping = even_cows? NULL: mapping;
	details.nonlinear_vma = NULL;
	details.first_index = hba;
	details.last_index = hba + hlen - 1;
	if (details.last_index < details.first_index)
		details.last_index = ULONG_MAX;

	mutex_lock(&mapping->i_mmap_mutex);
	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
		unmap_mapping_range_tree(&mapping->i_mmap, &details);
	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
	mutex_unlock(&mapping->i_mmap_mutex);
}
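The byte-to-page arithmetic at the top of unmap_mapping_range is easier to follow with numbers. The following stand-alone sketch redoes the calculation in user space. It assumes a 4096-byte page and a 64-bit unsigned long, leaves out the overflow branch (which only matters when loff_t is wider than pgoff_t), and uses `demo_*` names that are invented for this example.

```c
#include <limits.h>

#define DEMO_PAGE_SHIFT 12                      /* assumed: 4 KiB pages */
#define DEMO_PAGE_SIZE  (1LL << DEMO_PAGE_SHIFT)

struct demo_range {
    unsigned long first_index;                  /* first page to unmap */
    unsigned long last_index;                   /* last page to unmap  */
};

/* Convert a byte hole (start, length) into an inclusive page index range,
 * following the same steps as the kernel function. */
struct demo_range demo_hole_to_pages(long long holebegin, long long holelen)
{
    struct demo_range r;
    unsigned long hba = (unsigned long)(holebegin >> DEMO_PAGE_SHIFT);
    unsigned long hlen =
        (unsigned long)((holelen + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT);

    r.first_index = hba;
    r.last_index = hba + hlen - 1;              /* wraps when hlen == 0 */
    if (r.last_index < r.first_index)
        r.last_index = ULONG_MAX;               /* "to the end of the file" */
    return r;
}
```

For example, a hole starting at byte 4096 with length 8192 covers pages 1 through 2, and a length of 0 (the truncate case) turns into "from the starting page up to ULONG_MAX", that is, everything to the end of the file.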

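Test code

With a small modification to the code, we can verify that for a file mapped with MAP_SHARED, changes made through the filesystem are immediately visible through the mapping.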

tsecer@harry: cat truncate.after.mmap.cpp
/* For the size of the file. */
#include <sys/stat.h>
/* This contains the mmap calls. */
#include <sys/mman.h>
/* These are for error printing. */
#include <errno.h>
#include <string.h>
#include <stdarg.h>
/* This is for open. */
#include <fcntl.h>
#include <stdio.h>
/* For exit. */
#include <stdlib.h>
/* For the final part of the example. */
#include <ctype.h>
#include <unistd.h>

/* "check" checks "test" and prints an error and exits if it is
   true. */
static void
check (int test, const char * message, ...)
{
    if (test) {
        va_list args;
        va_start (args, message);
        vfprintf (stderr, message, args);
        va_end (args);
        fprintf (stderr, "\n");
        exit (EXIT_FAILURE);
    }
}

int main (int argc, const char *argv[])
{
    /* The file descriptor. */
    int fd;
    /* Information about the file. */
    struct stat s;
    int status;
    size_t size;
    /* The file name to open, taken from the command line. */
    const char * file_name;
    /* The memory-mapped thing itself. */
    const char * mapped;

    check (argc < 2, "usage: %s FILE", argv[0]);
    file_name = argv[1];

    /* Open the file for reading. */
    fd = open (file_name, O_RDONLY);
    check (fd < 0, "open %s failed: %s", file_name, strerror (errno));

    /* Get the size of the file. */
    status = fstat (fd, & s);
    check (status < 0, "stat %s failed: %s", file_name, strerror (errno));
    size = s.st_size;

    /* Memory-map the file. */
    mapped = (const char *)mmap (nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    check (mapped == MAP_FAILED, "mmap %s failed: %s",
           file_name, strerror (errno));

    /* Print the first mapped byte once per second, forever. */
    while(true)
    {
        printf("mapped %c\n", mapped[0]);
        sleep(1);
    }
    return 0;
}

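tsecer@harry: g++ truncate.after.mmap.cpp
tsecer@harry: ./a.out ./X &
[1] 8931
tsecer@harry: mapped X
ecmapped X
ho mapped X
mapped X
echomapped X
 mapped X
Ymapped X
 mapped X
>mapped X
Xmapped X
tsecer@harry: mapped Y
mapped Y
mapped Y
mapped Y
mapped Y

The stray characters mixed into the output are the command echo Y >X being typed at the same terminal while the background process keeps printing. As soon as that command runs, the mapped byte changes from X to Y.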
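The same experiment can be condensed into one process, which avoids the interleaved terminal output. This sketch is an assumption-laden stand-in for the shell session: the helper name and the /tmp path are made up, and the rewrite step imitates what a shell redirection such as `echo Y > file` does (open with O_WRONLY | O_TRUNC, then write).

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file containing "X\n" read-only with MAP_SHARED, rewrite the file
 * through a second descriptor opened with O_WRONLY | O_TRUNC, and return the
 * first byte now seen through the original mapping. Returns -1 on error. */
int mapped_byte_after_rewrite(void)
{
    char path[] = "/tmp/mmap-rewrite-XXXXXX";
    volatile char *mapped;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "X\n", 2) != 2)
        goto out;
    mapped = mmap(NULL, 2, PROT_READ, MAP_SHARED, fd, 0);
    if (mapped == MAP_FAILED)
        goto out;
    if (mapped[0] == 'X') {             /* page is faulted in, shows X */
        int wfd = open(path, O_WRONLY | O_TRUNC);

        if (wfd >= 0) {
            /* The mapping must not be touched between the truncating
             * open and this write: the file is empty in that window. */
            if (write(wfd, "Y\n", 2) == 2)
                result = mapped[0];     /* refault picks up the new page */
            close(wfd);
        }
    }
    munmap((void *)mapped, 2);
out:
    close(fd);
    unlink(path);
    return result;
}
```

The function is expected to return 'Y', matching the switch from "mapped X" to "mapped Y" in the session above.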
