Linux thread process and kernel mode and user mode page table

The cost of thread and process switches in Linux:

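At the operating-system level, Linux represents both processes and threads with the same descriptor, task_struct. Among its members, task_struct refers to the task's kernel-mode stack. On 32-bit x86 with the usual 3G/1G split, these structures live in the kernel part of the virtual address space, the 3-4 GB range. The kernel stack is used to save register values such as CS, DS and SS. Process and thread switches at the OS level are always performed in kernel mode. Each process (more precisely each task, whether process or thread) has its own unique kernel stack, because each one has its own task_struct and the kernel stack is reached through a member of that struct.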

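Process context switch (user mode to kernel mode): the kernel stack saves the user-mode register values so that, on the next return to user mode, the saved registers locate the instruction to resume and the memory it was using. User mode enters kernel mode through an interrupt. With the legacy int $0x80 system-call mechanism, the CPU looks up the interrupt handler.

This involves: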

The int instruction is a complex multi-step instruction. Here is an explanation of what it does:

1.) It extracts the descriptor from the IDT (the IDT address is stored in a special register) and checks that CPL <= DPL. CPL is the current privilege level, which can be read from the CS register. DPL is stored in the IDT descriptor. As a consequence, you cannot generate some exceptions (e.g. a page fault) from user space directly with the int instruction. If you try, you get a general protection exception.

2.) The processor switches to the stack defined in the TSS. The TSS was initialized earlier and already contains the values of ESP and SS that hold the kernel stack address. So now ESP points to the kernel stack.

3.) The processor pushes the user-space registers onto the newly switched kernel stack: ss, esp, eflags, cs, eip. We need to return after the syscall is served, right?

4.) Next, the processor sets CS and EIP from the IDT descriptor. This address defines the exception vector entry point.

5.) Here we are in the syscall exception vector in the kernel.
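The five steps above can be sketched as a small Python model. This is only an illustration of the sequence described in the text, not how the hardware is implemented: the class names, the dictionary-based IDT, and all selector and address values are hypothetical, chosen to make the example readable.

```python
from dataclasses import dataclass, field


class GeneralProtectionFault(Exception):
    """Raised when the CPL <= DPL check on the gate fails."""


@dataclass
class Gate:
    dpl: int   # least privileged level allowed to invoke this vector with `int`
    cs: int    # code-segment selector of the handler
    eip: int   # entry point of the handler


@dataclass
class Cpu:
    cs: int
    eip: int
    ss: int
    esp: int
    eflags: int
    kernel_stack: list = field(default_factory=list)  # models pushed values


def software_interrupt(cpu, idt, vector, tss_ss0, tss_esp0):
    """Model `int vector` issued from a lower privilege level (assumed)."""
    gate = idt[vector]                       # step 1: fetch the descriptor
    cpl = cpu.cs & 0x3                       # CPL is the low two bits of CS
    if cpl > gate.dpl:                       # the check CPL <= DPL
        raise GeneralProtectionFault(
            f"vector {vector:#x}: CPL {cpl} > DPL {gate.dpl}")
    saved = (cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip)
    cpu.ss, cpu.esp = tss_ss0, tss_esp0      # step 2: stack from the TSS
    for value in saved:                      # step 3: push ss, esp, eflags, cs, eip
        cpu.esp -= 4
        cpu.kernel_stack.append(value)
    cpu.cs, cpu.eip = gate.cs, gate.eip      # step 4: jump to the handler
    return cpu                               # step 5: now running in the kernel


if __name__ == "__main__":
    # Hypothetical IDT: vector 0x80 is callable from ring 3, vector 14 is not.
    idt = {0x80: Gate(dpl=3, cs=0x60, eip=0xC0100000),
           14: Gate(dpl=0, cs=0x60, eip=0xC0200000)}
    cpu = Cpu(cs=0x73, eip=0x08048000, ss=0x7B, esp=0xBFFFF000, eflags=0x202)
    software_interrupt(cpu, idt, 0x80, tss_ss0=0x68, tss_esp0=0xC1000000)
    print(hex(cpu.cs), hex(cpu.eip), [hex(v) for v in cpu.kernel_stack])
```

Calling software_interrupt with vector 14 from the same user-mode state raises GeneralProtectionFault, which mirrors the remark in step 1 that a page fault cannot be raised directly from user space.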

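The above covers the user-to-kernel transition. What about a thread or process switch? For example, the sched_yield system call goes on to pick another thread to run: the scheduler pops the new thread's saved state from its kernel stack into the registers, execution continues in the new thread's kernel mode, and then returns to user mode. That completes the switch.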

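What is the difference? A process switch also switches the virtual address space. In essence it is a CR3 switch (the address-space switch, done in the switch_mm function) plus a register switch (EIP, ESP and so on, done in switch_to). Threads of the same process share one address space, so switching between them needs only the register switch. The kernel-mode part of the page tables is identical for every thread and is shared; only the user-mode page tables differ. That is the main difference: the page tables, and the TLB invalidation that follows a change of page tables, which is where the performance cost comes from. The TLB caches recently used page-table entries. The page tables themselves live in physical memory, so a TLB hit avoids the extra memory accesses needed to walk them.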
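The difference between the two kinds of switch can be illustrated with a toy model. It is a sketch under simplifying assumptions, not kernel code: Task, ToyCpu and the integer mm identifiers are hypothetical, and the model ignores optimizations such as global pages and tagged TLBs that real kernels and CPUs use to reduce flushing.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    mm: int                                   # stands in for the page-table root
    regs: dict = field(default_factory=dict)  # saved EIP, ESP, ...


class ToyCpu:
    def __init__(self, task):
        self.current = task
        self.cr3 = task.mm
        self.regs = dict(task.regs)
        self.tlb_flushes = 0

    def context_switch(self, nxt):
        prev = self.current
        # Analogue of switch_mm: reload CR3 only if the address space changes.
        if nxt.mm != prev.mm:
            self.cr3 = nxt.mm
            self.tlb_flushes += 1             # user-space TLB entries are lost
        # Analogue of switch_to: save the old registers, load the new ones.
        prev.regs = dict(self.regs)
        self.regs = dict(nxt.regs)
        self.current = nxt


if __name__ == "__main__":
    t1 = Task("proc A / thread 1", mm=1, regs={"eip": 0x1000, "esp": 0x9000})
    t2 = Task("proc A / thread 2", mm=1, regs={"eip": 0x2000, "esp": 0x8000})
    p2 = Task("proc B", mm=2, regs={"eip": 0x3000, "esp": 0x7000})
    cpu = ToyCpu(t1)
    cpu.context_switch(t2)   # same mm: registers only
    cpu.context_switch(p2)   # different mm: CR3 reload and TLB flush
    print(cpu.tlb_flushes)
```

In this model the thread-to-thread switch leaves cr3 untouched, while the switch to another process counts one flush, which is the cost the paragraph above attributes to process switches.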

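Why are user-level thread stacks limited to 8 MB by default? Many languages support multithreading, for example C++ with pthread, and every thread's stack lives in the process's address space. Stacks of different threads must not overlap; otherwise the threads would corrupt each other's stacks and crash. If the address and size of each stack were not fixed in advance and stacks could grow without bound, they would eventually overlap. Making each stack too large, on the other hand, reduces the number of threads that can be created. A user-mode thread switch is essentially just a register switch, which makes it very lightweight.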
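A rough calculation shows the trade-off between stack size and thread count. The sketch below assumes a 32-bit process with 3 GiB of user address space (the split mentioned earlier) and ignores everything else that occupies that space, so the result is an upper bound, not a prediction. The helper name max_thread_stacks is made up for the example; resource.getrlimit and threading.stack_size are standard-library calls that report the limits on the current machine.

```python
import resource
import threading

GIB = 1 << 30
MIB = 1 << 20


def max_thread_stacks(user_space_bytes, stack_bytes):
    """Upper bound on non-overlapping fixed-size stacks in an address space."""
    return user_space_bytes // stack_bytes


if __name__ == "__main__":
    print(max_thread_stacks(3 * GIB, 8 * MIB))   # 384
    print(max_thread_stacks(3 * GIB, 1 * MIB))   # 3072

    # Current soft and hard stack limits in bytes (RLIM_INFINITY if unlimited).
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print(soft, hard)

    # 0 means the platform default is used for threads created by Python.
    print(threading.stack_size())
```

Shrinking the per-thread stack from 8 MiB to 1 MiB raises the bound eightfold, which is why programs that create many threads often request a smaller stack.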

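CPU privilege levels run from ring 0 to ring 3. The CS segment selector is simply the value of the CS register; it contains an index and the CPL. The index gives the offset of a segment-descriptor entry in the segment descriptor table. The segment descriptor contains the segment base address and the DPL; in other words, it maps the segment to a linear address and states the privilege level required for that linear address range. Note that under segmentation CS, DS and SS are treated as different segments. Modern operating systems have effectively abandoned segmentation in favor of a flat model, and Intel keeps it only for compatibility. The kernel-mode CS, SS and DS segments all have their DPL set to 0, meaning user-mode instructions cannot use them. This is protected mode. So why is RPL needed?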

RPL – Requested Privilege Level

These are the lowest two bits of the selectors held in the DS, ES, SS, FS and GS registers. The RPL field is used to harden the CPL when higher-privileged code is servicing requests from lower-privileged processes.
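The selector layout described above (a descriptor index plus two privilege bits) can be decoded with a few bit operations. This is a minimal sketch: the Selector tuple and decode_selector are names invented for the example, and the sample values are illustrative. Bit 2, the table indicator that chooses between the GDT and an LDT, is part of the x86 selector format although the text does not mention it.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # entry number in the descriptor table
    table: str   # "GDT" or "LDT", from the table-indicator bit
    rpl: int     # requested privilege level (the CPL when read from CS)


def decode_selector(value):
    """Split a 16-bit segment selector into its three fields."""
    return Selector(index=(value & 0xFFFF) >> 3,
                    table="LDT" if value & 0x4 else "GDT",
                    rpl=value & 0x3)


if __name__ == "__main__":
    print(decode_selector(0x73))   # index 14, GDT, privilege level 3
    print(decode_selector(0x60))   # index 12, GDT, privilege level 0
```

Reading the low two bits of CS this way is how the current privilege level is obtained.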

Assume a higher-privileged device driver that supports a mechanism for copying data from disk directly into the data segments of lower-privileged processes. A lower-privileged process must pass its data-segment details (selector, address and size of the data to copy) to the device driver so that the driver can copy the data to the appropriate location.

Since the device driver is higher-privileged, a lower-privileged process can trick the driver into copying data into a high-privileged data segment simply by passing a wrong selector value. This kind of exploit is called privilege escalation.

How does RPL help to solve the privilege escalation problem?

Continuing the above example, whenever the device driver loads the destination segment, it sets the RPL of the destination selector to match the requesting (lower-privileged) process. Since the protection rules for data segments check both CPL <= DPL and RPL <= DPL, the higher-privileged code gets a protection fault on the RPL <= DPL check.

The point to note is that higher-privileged code, when it provides services to lower-privileged processes, should temporarily reduce its privilege to the requestor's privilege level.
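The two conditions above, CPL <= DPL and RPL <= DPL, can be written as a single check. The sketch below models only the rule for ordinary data segments as stated in the text. The function names and selector values are hypothetical, and adjust_rpl imitates what the driver is described as doing, without claiming to match any particular kernel's code.

```python
def data_access_allowed(cpl, rpl, dpl):
    """Data-segment rule from the text: both CPL <= DPL and RPL <= DPL."""
    return max(cpl, rpl) <= dpl


def adjust_rpl(selector, requestor_cpl):
    """Weaken a selector so its RPL is no more privileged than the requestor."""
    rpl = max(selector & 0x3, requestor_cpl)
    return (selector & ~0x3) | rpl


if __name__ == "__main__":
    DRIVER_CPL = 0
    USER_CPL = 3
    KERNEL_DATA_DPL = 0

    forged = 0x10   # hypothetical kernel data selector supplied by a user process

    # Without the adjustment the driver's own privilege is all that counts.
    print(data_access_allowed(DRIVER_CPL, forged & 0x3, KERNEL_DATA_DPL))

    # With the adjustment the RPL carries the requestor's privilege level.
    safe = adjust_rpl(forged, USER_CPL)
    print(hex(safe), data_access_allowed(DRIVER_CPL, safe & 0x3, KERNEL_DATA_DPL))
```

The first check passes, which is the privilege escalation. After adjust_rpl the same access is refused, while an access to a segment with DPL 3 would still be allowed.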

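The CPU's privilege levels protect memory: if user-mode code accesses a protected memory address, the CPU raises a fault, which the process sees as a segmentation fault.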

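The basic purpose of a two-level page table is to avoid needing one large contiguous page table; without it, every 32-bit process would need a 4 MB page table (assuming 4 KB pages). Physical page frames are 4 KB, so how does a virtual (linear) address find its physical address? With a direct, single-level mapping, one page-table entry maps one page frame, and 4 GB / 4 KB = 1M, so 1M page-table entries are needed. How many bytes does each entry need? 1M is 2^20, so a frame number needs at least 20 bits, which rounds up to 3 bytes; in practice 4 bytes are used. So without a page directory, each process's page table would take 4 MB of physical memory. A 4 KB page frame covers 2^12 physical addresses, which means that in a 32-bit address the low 12 bits are the offset within the page and need no translation; only the upper 20 bits have to be looked up.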
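The arithmetic above can be checked in a few lines. The sketch assumes the classic non-PAE 32-bit x86 layout, where the upper 20 bits are split into a 10-bit page-directory index and a 10-bit page-table index. The function names are invented for the example, and two_level_bytes is a simplified estimate that counts one page directory plus one page table per 4 MiB region actually in use.

```python
PAGE_SIZE = 4096            # 4 KiB page frames
PTE_SIZE = 4                # bytes per page-table entry
ADDRESS_SPACE = 1 << 32     # 4 GiB of virtual addresses


def flat_table_bytes():
    """Size of a single-level table that maps the whole address space."""
    entries = ADDRESS_SPACE // PAGE_SIZE      # 2**20 entries
    return entries * PTE_SIZE


def split_address(vaddr):
    """Return (directory index, table index, page offset) for a 32-bit address."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def two_level_bytes(regions_in_use):
    """One 4 KiB page directory plus one 4 KiB page table per 4 MiB region."""
    return PAGE_SIZE + regions_in_use * PAGE_SIZE


if __name__ == "__main__":
    print(flat_table_bytes())            # 4194304, i.e. 4 MiB
    print(split_address(0xC0100ABC))     # (768, 256, 2748)
    print(two_level_bytes(3))            # 16384
```

A process that touches only three 4 MiB regions needs 16 KiB of tables in this estimate instead of 4 MiB, and the tables no longer have to be contiguous.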

https://blog.csdn.net/displayMessage/article/details/80905810
