[Repost] KVM Internals, code and more
KVM
Kernel-based Virtual Machine
Internals, code and more
http://slides.com/braoru/kvm#/
KVM in 5 seconds
- Introduced to make VT-x/AMD-V available to user space
- Exposes virtualization features securely through a single interface
- /dev/kvm
- Available since Linux 2.6.20 (2006)
- Clean and efficient development: 3 months from first LKML posting to merge
- 100% orthogonal to the core kernel
KVM is not KVM
First of all there is QEMU, then KVM, then libvirt, then the whole ecosystem.
In the beginning, QEMU
Running a guest involves:
- Executing guest code
- Handling timers
- Processing I/O
- Responding to monitor commands
All of this has to happen at once, without pausing guest execution.
Deal with events
There are two popular architectures for programs that need to respond to events from multiple sources
Parallel architecture
Splits work into processes or threads that can execute simultaneously.
Event-driven architecture
Event-driven architecture reacts to events by running a main loop that dispatches to event handlers.
This is commonly implemented using the select(2) or poll(2) family of system calls to wait on multiple file descriptors.
Threading and event-driven model of QEMU
QEMU uses a hybrid architecture: an event loop plus dedicated threads.
QEMU: the event loop
- The event-driven architecture is centered around the event loop, which dispatches events to handler functions. QEMU's main event loop is main_loop_wait().
- It waits for file descriptors to become readable or writable. File descriptors are critical because files, sockets, pipes, and various other resources are all file descriptors.
- It runs expired timers.
- It runs bottom halves (BHs), which are used to avoid reentrancy and overflowing the call stack.
When a file descriptor becomes ready, a timer expires, or a BH is scheduled, the event loop invokes a callback:
- No other core code is executing at the same time, so synchronization is not necessary
- Callbacks execute sequentially and atomically
- Only 1 thread of control is needed at any given time
- No blocking system calls or long-running computations should be performed
- Avoid spending an unbounded amount of time in a callback
- If you do not follow this advice, the guest is forced to pause and the monitor becomes unresponsive.
QEMU threads
To help the event_loop
Offload what needs to be offloaded
- There are system calls which have no non-blocking equivalent.
- Sometimes long-running computations hog the CPU and can't easily be broken up into callbacks.
- In these cases, dedicated worker threads can be used to carefully move these tasks out of core QEMU.
- One example of worker threads is vnc-jobs.c.
- When a worker thread needs to notify core QEMU, a pipe or a qemu_eventfd() file descriptor is added to the event loop.
executing guest code
There are two mechanisms for executing guest code: the Tiny Code Generator (TCG) and KVM.
Executing guest code in QEMU is very simple: it uses threads.
Exactly 1 thread per vCPU.
summary of qemu processing
- 1 process per guest
- 1 thread for the main event loop
- 1 thread per vCPU
- As many threads as (reasonably) needed for offloaded tasks
All the existing Linux strengths at our disposal
Memory (huge pages, KSM), I/O, scheduler, power management, device hotplug, networking, security, the whole Linux software world, ...
qemu guest memory
Guest RAM is allocated at QEMU startup.
This mapped memory is "really" allocated by the QEMU process itself (with malloc()).
Tip: use -mem-path to back guest RAM with a file in the given path (e.g. on hugetlbfs); this can be very useful in #infosec.
reminder
why x86 virt is a pain...
- No hardware provisions
- Instructions behave differently depending on privilege context
- Architecture not built for trap and emulate
- CISC is ... CISC
A complete theoretical virtualisation course: CS 686: Special Topic: Intel EM64T and VT Extensions (Spring 2007)
reminder
how Intel VT-x helps
Guest SW <-> VMM Transitions
Virtual-machine control structure
KVM virtualisation
KVM is a virtualization feature in the Linux kernel that lets you safely execute guest code directly on the host CPU
- open /dev/kvm
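- use ioctl KVM_RUN (see the KVM ioctl documentation)
As simple as:
kvm  = open("/dev/kvm", O_RDWR | O_CLOEXEC);
vm   = ioctl(kvm, KVM_CREATE_VM, 0);
vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
/* guest memory and register setup omitted here */
/* the exit reason is read from struct kvm_run, which is mmap()ed from the vCPU fd */
run  = mmap(NULL, ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0), PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0);
for (;;) {
    ioctl(vcpu, KVM_RUN, 0);
    switch (run->exit_reason) {
    case KVM_EXIT_IO:  /* emulate the port access, then re-enter the guest */ break;
    case KVM_EXIT_HLT: /* the guest executed HLT */ return 0;
    }
}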
kvm virtualisation
It's DEMO time
What you need:
- A bit of C ...
- A touch of ASM
- Makefile
- gcc
QEMU / KVM / CPU / TIME interactions
Light vs heavy exits
causes of VM Exits
VM Entry :
- Transition from VMM to Guest
- Enters VMX non-root operation
- Loads Guest state and Exit criteria from VMCS
- VMLAUNCH instruction used on initial entry
- VMRESUME instruction used on subsequent entries
VM Exit :
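- Transition from Guest to VMM, triggered by exit conditions configured in the VMCS (certain instructions, exceptions, external interrupts, ...); there is no "VMEXIT" instruction on Intel VT-x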
- Enters VMX root operation
- Saves Guest state in VMCS
- Loads VMM state from VMCS
start a kvm vm in reality
A bit more complicated than before:
- KVM_CREATE_VM: the new VM has no virtual CPUs and no memory
- KVM_SET_USER_MEMORY_REGION: map userspace memory into the VM
- KVM_CREATE_IRQCHIP / ...PIT, KVM_CREATE_VCPU: create hardware components and map them to VT-x functionality
- KVM_SET_REGS / ...SREGS / ...FPU / ..., KVM_SET_CPUID / ...MSRS / ...VCPU_EVENTS / ..., KVM_SET_LAPIC: hardware configuration
- KVM_RUN: start the VM
start a vm in qemu-kvm
/usr/bin/qemu-kvm -S -M pc-0.13 -enable-kvm -m 512 -smp 2,sockets=2,cores=1,threads=1
-name test -uuid e9b4c7be-d60a-c16e-92c3-166421b4daca -nodefconfig -nodefaults
-chardev socket,id=monitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot c
-drive file=/var/lib/libvirt/images/test.img,if=none,id=drive-virtio-disk0,boot=on,format=raw
-device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:cc:1c:10,bus=pci.0,addr=0x3
-net tap,fd=59,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0
-usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
Now you really know why tools are great
KVM processing
What about passthrough, paravirt and virtio?
Reduce VM exits or make them lightweight
Improve I/O throughput & latency (less emulation)
Compensate for virtualization effects
Enable direct host-guest interaction
VIRTIO device
- Network
- Block
- Serial I/O (console, host-guest channel, ...)
- Memory balloon
- File system (9P)
- SCSI
Based on generic RX/TX buffers
Logic is distributed between the guest driver (aka virtual device) and the QEMU backend (and a kernel backend in some cases)
Virtio device
vhost example
- High throughput
- Low latency guest networking
Normally the QEMU userspace process emulates I/O accesses from the guest.
Vhost puts virtio emulation code into the kernel
- Don't forget vhost-blk and vhost-scsi
- vhost-net driver creates a /dev/vhost-net character device on the host
- QEMU is launched with -netdev tap,vhost=on and opens /dev/vhost-net
- The vhost driver creates a kernel thread called vhost-$pid
- where $pid is the PID of the QEMU process
- The worker thread's job is to handle I/O events and perform the device emulation.
- The vhost architecture is not directly linked to KVM
- It uses ioeventfd and irqfd
Virtio device
vhost
Kernel code:
- drivers/vhost/vhost.c - common vhost driver code
- drivers/vhost/net.c - vhost-net driver
- virt/kvm/eventfd.c - ioeventfd and irqfd
The QEMU userspace code shows how to initialize the vhost instance:
- hw/vhost.c - common vhost initialization code
- hw/vhost_net.c - vhost-net initialization
LIBVIRT
very small introduction
- Virtualization library: manage guests on one or many nodes
- Share the application stack between hypervisors
- Long term stability and compatibility of API and ABI
- Provide security and remote access “out of the box”
- Expand to management APIs (Node, Storage, Network)