ContextSwitch: Learning and Usage


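Overview

  1. GitHub hosts contextswitch, a small tool for benchmarking system calls and context switches.
  2. After downloading it, just run make and you can start a simple test.
  3. Note that some ARM environments do not support the -mno-avx compiler flag, so it has to be removed there.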
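Removing the flag can be scripted. The following is a minimal sketch, assuming the flag sits in the project's Makefile as a separate, space-delimited token. The helper name strip_mno_avx is hypothetical and is not part of contextswitch.

```shell
# Remove the -mno-avx flag from a makefile, in place.
# Usage: strip_mno_avx [path-to-makefile]   (defaults to ./Makefile)
# Assumption: the flag is a separate token preceded by a space.
strip_mno_avx() {
    makefile="${1:-Makefile}"
    tmp="$(mktemp)" || return 1
    # First expression: flag in the middle of a line.
    # Second expression: flag at the end of a line.
    sed -e 's/ -mno-avx / /g' -e 's/ -mno-avx$//' "$makefile" > "$tmp" \
        && cat "$tmp" > "$makefile"
    rm -f "$tmp"
}
```

On an affected ARM machine one could then run `strip_mno_avx Makefile && make` from the source directory.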

Official documentation and notes

  Little micro-benchmarks to assess the performance overhead of context switching.
  timesyscall: Benchmarks the overhead of a system call.
  timectxsw: Benchmarks the overhead of context switching between 2 processes.
  timetctxsw: Benchmarks the overhead of context switching between 2 threads.
  timectxswws: Benchmarks the overhead of context switching between 2 processes using a working set of the size specified in argument.
  timetctxsw2: Benchmarks the overhead of context switching between 2 threads, by using a sched_yield() method.
  If you do taskset -a 1, all threads should be scheduled on the same processor, so you are really doing thread context switch.
  Then to be sure that you are really doing it, just do:
  strace -ff -tt -v taskset -a 1 ./timetctxsw2
  Now why is sched_yield() enough for testing? Because it places the current thread at the end of the ready queue, so the next ready thread will be scheduled.
  I also added sched_setscheduler(SCHED_FIFO) to get the best performance.
  From: https://github.com/tsuna/contextswitch
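In `taskset -a 1`, the `1` is a hexadecimal CPU bitmask rather than a CPU index: bit 0 being set selects CPU 0. The sketch below shows the conversion from a CPU index to such a mask. The helper name cpu_mask is hypothetical.

```shell
# Print the taskset hex mask that selects exactly one CPU, given its index.
# Assumes the index is small enough to fit the shell's integer width.
cpu_mask() {
    printf '0x%x\n' "$((1 << $1))"
}
```

For example, `taskset "$(cpu_mask 0)" ./timetctxsw2` pins the benchmark to CPU 0, which the list form `taskset -c 0 ./timetctxsw2` also does.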

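Script notes

  # $* is an optional command prefix, for example a taskset invocation
  runbench() {
    $* ./timesyscall
    $* ./timectxsw
    $* ./timetctxsw
    $* ./timetctxsw2
  }

  Each group runs the following tests:
  1. System call time.
  2. Context switch time between two processes.
  3. Switch time between two threads in the same process.
  4. Switch time using the sched_yield() method (I am not very familiar with this one).

  There are three groups:
  1. No affinity is set.
  2. Bound to CPUs, but spread over two cores.
  3. Bound to a single CPU core.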
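The three groups can be produced by passing a different command prefix to runbench. The sketch below is an assumption about how such a driver might look, not the repository's actual script: the BENCH_DIR variable, the run_groups name, and the CPU lists 0,1 and 0 are all hypothetical.

```shell
# Directory containing the built benchmark binaries (hypothetical variable).
BENCH_DIR="${BENCH_DIR:-.}"

# Run the four benchmarks, optionally prefixed by a command such as taskset.
runbench() {
    "$@" "$BENCH_DIR/timesyscall"
    "$@" "$BENCH_DIR/timectxsw"
    "$@" "$BENCH_DIR/timetctxsw"
    "$@" "$BENCH_DIR/timetctxsw2"
}

# Run the three groups; the CPU numbers are illustrative only.
run_groups() {
    echo '-- No CPU affinity --'
    runbench
    echo '-- With CPU affinity --'
    runbench taskset -c 0,1   # allowed to run on two cores
    echo '-- With CPU affinity to CPU 0 --'
    runbench taskset -c 0     # everything on one core
}
```

Calling `run_groups` from the build directory would print the three labelled groups one after another. Using "$@" instead of $* keeps prefix arguments intact even if they contain spaces.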

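Notes on the results

  Across all of my test environments:
  1. AMD 9T34 is clearly first, with the lowest system call cost (55.3 ns) and the lowest process context switch cost without affinity (982.0 ns).
  2. The same hardware behaves quite differently under different operating systems, so comparisons must be made on the same operating system.
  3. Among the domestic (Chinese) CPUs, the ranking matches the SPECjvm and SPEC CPU results exactly: Phytium < Hygon < Kunpeng < Alibaba Yitian. Alibaba Yitian is the clear winner of that group.
  4. CPUs from ten years ago really are slower than current ones. Upgrading brings better performance and higher speed.
  5. CPU pinning is very useful and is worth tuning.
  6. Coroutines and lightweight threads are the future. That is the only way to get good performance.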

Result chart 1


Result chart 2


E5-2620 2.0GHz

  2 physical CPUs, 6 cores/CPU, 2 hardware threads/core = 24 hw threads total
  -- No CPU affinity --
  10000000 system calls in 11841646290ns (1184.2ns/syscall)
  2000000 process context switches in 6039748545ns (3019.9ns/ctxsw)
  2000000 thread context switches in 6745297188ns (3372.6ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 755823488ns (377.9ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 14343751134ns (1434.4ns/syscall)
  2000000 process context switches in 16353343542ns (8176.7ns/ctxsw)
  2000000 thread context switches in 13617487377ns (6808.7ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 2363107269ns (1181.6ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 11929472188ns (1192.9ns/syscall)
  2000000 process context switches in 6915983386ns (3458.0ns/ctxsw)
  2000000 thread context switches in 6837489882ns (3418.7ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 795652256ns (397.8ns/ctxsw)
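Each result line gives an iteration count, the total elapsed time in nanoseconds, and the average in parentheses, which is simply the total divided by the count. The sketch below recomputes that average from raw result lines. The helper name per_op_ns is hypothetical.

```shell
# Read benchmark result lines on stdin and print total_ns / count for each,
# rounded to one decimal place. Lines without an "in <N>ns" part are skipped.
per_op_ns() {
    awk '{
        count = ""; total = ""
        for (i = 1; i <= NF; i++) {
            # The first purely numeric field is the iteration count.
            if (count == "" && $i ~ /^[0-9]+$/) count = $i
            # The field after "in" is the total, e.g. 11841646290ns.
            if ($i == "in" && i < NF) { total = $(i + 1); sub(/ns$/, "", total) }
        }
        if (count + 0 > 0 && total ~ /^[0-9]+$/) printf "%.1f\n", total / count
    }'
}
```

For instance, `echo '10000000 system calls in 11841646290ns (1184.2ns/syscall)' | per_op_ns` prints 1184.2, matching the value reported by the tool.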

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, InCloud OS (云海OS) virtual machine

  1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
  -- No CPU affinity --
  10000000 system calls in 2841917410ns (284.2ns/syscall)
  2000000 process context switches in 7404178178ns (3702.1ns/ctxsw)
  2000000 thread context switches in 7502081647ns (3751.0ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 222130514ns (111.1ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 2835862084ns (283.6ns/syscall)
  2000000 process context switches in 4990890087ns (2495.4ns/ctxsw)
  2000000 thread context switches in 4311646652ns (2155.8ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 870608240ns (435.3ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 2844931708ns (284.5ns/syscall)
  2000000 process context switches in 7601947691ns (3801.0ns/ctxsw)
  2000000 thread context switches in 7914561498ns (3957.3ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 247057805ns (123.5ns/ctxsw)

Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, InCloud OS (云海OS) physical machine

  2 physical CPUs, 12 cores/CPU, 2 hardware threads/core = 48 hw threads total
  -- No CPU affinity --
  10000000 system calls in 5769760409ns (577.0ns/syscall)
  2000000 process context switches in 7245677219ns (3622.8ns/ctxsw)
  2000000 thread context switches in 7069213271ns (3534.6ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 475086926ns (237.5ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 5762431985ns (576.2ns/syscall)
  2000000 process context switches in 8692364627ns (4346.2ns/ctxsw)
  2000000 thread context switches in 6572286258ns (3286.1ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 1304249661ns (652.1ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 5774310295ns (577.4ns/syscall)
  2000000 process context switches in 6869635514ns (3434.8ns/ctxsw)
  2000000 thread context switches in 6927117249ns (3463.6ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 473255745ns (236.6ns/ctxsw)

Phytium S2500 - physical machine - NFSV3

  2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
  -- No CPU affinity --
  10000000 system calls in 3838470070ns (383.8ns/syscall)
  2000000 process context switches in 10913991269ns (5457.0ns/ctxsw)
  2000000 thread context switches in 10987973614ns (5494.0ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 354962539ns (177.5ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 3851009222ns (385.1ns/syscall)
  2000000 process context switches in 10500204985ns (5250.1ns/ctxsw)
  2000000 thread context switches in 8605107251ns (4302.6ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 1694906366ns (847.5ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 3871134715ns (387.1ns/syscall)
  2000000 process context switches in 8211223439ns (4105.6ns/ctxsw)
  2000000 thread context switches in 8915611368ns (4457.8ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 362941497ns (181.5ns/ctxsw)

Phytium S2500 - physical machine - Kylin V10

  model name : HUAWEI,Kunpeng 920
  2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
  -- No CPU affinity --
  10000000 system calls in 1104251960ns (110.4ns/syscall)
  2000000 process context switches in 5502095280ns (2751.0ns/ctxsw)
  2000000 thread context switches in 5057680610ns (2528.8ns/ctxsw)
  2000000 thread context switches in 159336010ns (79.7ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 1104213220ns (110.4ns/syscall)
  2000000 process context switches in 3157105260ns (1578.6ns/ctxsw)
  2000000 thread context switches in 2749304460ns (1374.7ns/ctxsw)
  2000000 thread context switches in 520588690ns (260.3ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 1104361790ns (110.4ns/syscall)
  2000000 process context switches in 2554260900ns (1277.1ns/ctxsw)
  2000000 thread context switches in 2501093900ns (1250.5ns/ctxsw)
  2000000 thread context switches in 159835540ns (79.9ns/ctxsw)

Phytium S2500 - KVM virtual machine

  10000000 system calls in 2016128780ns (201.6ns/syscall)
  2000000 process context switches in 20813179318ns (10406.6ns/ctxsw)
  2000000 thread context switches in 21270077053ns (10635.0ns/ctxsw)
  2000000 thread context switches in 283497350ns (141.7ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 2003773606ns (200.4ns/syscall)
  2000000 process context switches in 7149973534ns (3575.0ns/ctxsw)
  2000000 thread context switches in 6041671015ns (3020.8ns/ctxsw)
  2000000 thread context switches in 1184706267ns (592.4ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 1996452026ns (199.6ns/syscall)
  2000000 process context switches in 20093433102ns (10046.7ns/ctxsw)
  2000000 thread context switches in 20838253803ns (10419.1ns/ctxsw)
  2000000 thread context switches in 284723964ns (142.4ns/ctxsw)

Hygon machine

  model name : Hygon C86 7285 32-core Processor
  pgrep: cannot allocate 4611686018427387903 bytes
  2 physical CPUs, 32 cores/CPU, 2 hardware threads/core = 128 hw threads total
  -- No CPU affinity --
  10000000 system calls in 1188373575ns (118.8ns/syscall)
  2000000 process context switches in 7182741168ns (3591.4ns/ctxsw)
  2000000 thread context switches in 5057264353ns (2528.6ns/ctxsw)
  2000000 thread context switches in 218741918ns (109.4ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 1199538092ns (120.0ns/syscall)
  2000000 process context switches in 4926579090ns (2463.3ns/ctxsw)
  2000000 thread context switches in 4116607893ns (2058.3ns/ctxsw)
  2000000 thread context switches in 877003690ns (438.5ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 1207213049ns (120.7ns/syscall)
  2000000 process context switches in 4803238321ns (2401.6ns/ctxsw)
  2000000 thread context switches in 5033478360ns (2516.7ns/ctxsw)
  2000000 thread context switches in 218102516ns (109.1ns/ctxsw)

Kunpeng machine

  2 physical CPUs, 128 cores/CPU, 1 hardware threads/core = 256 hw threads total
  -- No CPU affinity --
  10000000 system calls in 1628256836ns (162.8ns/syscall)
  2000000 process context switches in 3567828849ns (1783.9ns/ctxsw)
  2000000 thread context switches in 3366796751ns (1683.4ns/ctxsw)
  2000000 thread context switches in 208056729ns (104.0ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 3957162873ns (395.7ns/syscall)
  2000000 process context switches in 66176473553ns (33088.2ns/ctxsw)
  2000000 thread context switches in 64858764678ns (32429.4ns/ctxsw)
  2000000 thread context switches in 9224336984ns (4612.2ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 1658580824ns (165.9ns/syscall)
  2000000 process context switches in 4162672768ns (2081.3ns/ctxsw)
  2000000 thread context switches in 3930988507ns (1965.5ns/ctxsw)
  2000000 thread context switches in 206905930ns (103.5ns/ctxsw)

Intel 8369HB 3.3GHz

  10000000 system calls in 2039800553ns (204.0ns/syscall)
  2000000 process context switches in 3484116193ns (1742.1ns/ctxsw)
  2000000 thread context switches in 3504345370ns (1752.2ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 163336302ns (81.7ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 2042749498ns (204.3ns/syscall)
  2000000 process context switches in 3512477901ns (1756.2ns/ctxsw)
  2000000 thread context switches in 3037479215ns (1518.7ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 589604636ns (294.8ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 2037861063ns (203.8ns/syscall)
  2000000 process context switches in 3543912186ns (1772.0ns/ctxsw)
  2000000 thread context switches in 3575216872ns (1787.6ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 164079529ns (82.0ns/ctxsw)

Alibaba Yitian 710

  1 physical CPUs, 8 cores/CPU, 1 hardware threads/core = 8 hw threads total
  -- No CPU affinity --
  10000000 system calls in 672626352ns (67.3ns/syscall)
  2000000 process context switches in 3586487130ns (1793.2ns/ctxsw)
  2000000 thread context switches in 3228362627ns (1614.2ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 102817391ns (51.4ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 672290182ns (67.2ns/syscall)
  2000000 process context switches in 1990312435ns (995.2ns/ctxsw)
  2000000 thread context switches in 1682598464ns (841.3ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 328222163ns (164.1ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 672409838ns (67.2ns/syscall)
  2000000 process context switches in 3347526340ns (1673.8ns/ctxsw)
  2000000 thread context switches in 3100110717ns (1550.1ns/ctxsw)
  sched_setscheduler(): Operation not permitted
  2000000 thread context switches in 102631615ns (51.3ns/ctxsw)

AMD 9T34

  model name : AMD EPYC 9T34 64-Core Processor
  1 physical CPUs, 8 cores/CPU, 2 hardware threads/core = 16 hw threads total
  -- No CPU affinity --
  10000000 system calls in 553414290ns (55.3ns/syscall)
  2000000 process context switches in 1963917388ns (982.0ns/ctxsw)
  2000000 thread context switches in 2131473467ns (1065.7ns/ctxsw)
  2000000 thread context switches in 115396178ns (57.7ns/ctxsw)
  -- With CPU affinity --
  10000000 system calls in 554322086ns (55.4ns/syscall)
  2000000 process context switches in 2730693871ns (1365.3ns/ctxsw)
  2000000 thread context switches in 2559121196ns (1279.6ns/ctxsw)
  2000000 thread context switches in 550724648ns (275.4ns/ctxsw)
  -- With CPU affinity to CPU 0 --
  10000000 system calls in 553295602ns (55.3ns/syscall)
  2000000 process context switches in 2011838005ns (1005.9ns/ctxsw)
  2000000 thread context switches in 2027328701ns (1013.7ns/ctxsw)
  2000000 thread context switches in 114914625ns (57.5ns/ctxsw)
