Virtio: An I/O virtualization framework for Linux
The Linux kernel supports a variety of virtualization schemes, and that's likely to grow as virtualization advances and new schemes are discovered (for example, lguest). But with all these virtualization schemes running on top of Linux, how do they exploit the underlying kernel for I/O virtualization? The answer is virtio, which provides an efficient abstraction for hypervisors and a common set of I/O virtualization drivers. Discover virtio, and learn why Linux will soon be the hypervisor of choice.
In a nutshell, virtio is an abstraction layer over devices in a paravirtualized hypervisor. virtio was developed by Rusty Russell in support of his own virtualization solution called lguest. This article begins with an introduction to paravirtualization and emulated devices, and then explores the details of virtio. The focus is on the virtio framework from the 2.6.30 kernel release.
Linux is the hypervisor playground. As my article on Linux as a hypervisor showed, Linux offers a variety of hypervisor solutions with different attributes and advantages. Examples include the Kernel-based Virtual Machine (KVM), lguest, and User-mode Linux. Having these different hypervisor solutions on Linux can tax the operating system based on their independent needs. One of those taxes is virtualization of devices. Rather than have a variety of device emulation mechanisms (for network, block, and other drivers), virtio provides a common front end for these device emulations to standardize the interface and increase the reuse of code across the platforms.
Full virtualization vs. paravirtualization
Let's start with a quick discussion of two distinct types of virtualization schemes: full virtualization and paravirtualization. In full virtualization, the guest operating system runs on top of a hypervisor that sits on the bare metal. The guest is unaware that it is being virtualized and requires no changes to work in this configuration. Conversely, in paravirtualization, the guest operating system is not only aware that it is running on a hypervisor but includes code to make guest-to-hypervisor transitions more efficient (see Figure 1).
In the full virtualization scheme, the hypervisor must emulate device hardware, which means emulating at the lowest level of the conversation (for example, to a network driver). Although the emulation is clean at this abstraction, it's also the most inefficient and highly complicated. In the paravirtualization scheme, the guest and the hypervisor can work cooperatively to make this emulation efficient. The downside to the paravirtualization approach is that the operating system is aware that it's being virtualized and requires modifications to work.
Figure 1. Device emulation in full virtualization and paravirtualization environments
Hardware continues to change with virtualization. New processors incorporate advanced instructions to make transitions between guest operating systems and the hypervisor more efficient. And hardware continues to change for input/output (I/O) virtualization, as well (see Resources to learn about Peripheral Controller Interconnect [PCI] passthrough and single- and multi-root I/O virtualization).
Virtio alternatives
virtio is not entirely alone in this space. Xen provides paravirtualized device drivers, and VMware provides what are called Guest Tools.
But in traditional full virtualization environments, the hypervisor must trap these requests, and then emulate the behaviors of real hardware. Although doing so provides the greatest flexibility (namely, running an unmodified operating system), it does introduce inefficiency (see the left side of Figure 1). The right side of Figure 1 shows the paravirtualization case. Here, the guest operating system is aware that it's running on a hypervisor and includes drivers that act as the front end. The hypervisor implements the back-end drivers for the particular device emulation. These front-end and back-end drivers are where virtio comes in, providing a standardized interface for the development of emulated device access, promoting code reuse and increasing efficiency.
An abstraction for Linux guests
From the previous section, you can see that virtio is an abstraction for a set of common emulated devices in a paravirtualized hypervisor. This design allows the hypervisor to export a common set of emulated devices and make them available through a common application programming interface (API). Figure 2 illustrates why this is important. With paravirtualized hypervisors, the guests implement a common set of interfaces, with the particular device emulation behind a set of back-end drivers. The back-end drivers need not be common as long as they implement the required behaviors of the front end.
Figure 2. Driver abstractions with virtio
Note that in reality (though not required), the device emulation occurs in user space using QEMU, so the back-end drivers communicate into the user space of the hypervisor to facilitate I/O through QEMU. QEMU is a system emulator that, in addition to providing a guest operating system virtualization platform, provides emulation of an entire system (PCI host controller, disk, network, video hardware, USB controller, and other hardware elements).
The virtio API relies on a simple buffer abstraction to encapsulate the command and data needs of the guest. Let's look at the internals of the virtio API and its components.
Virtio architecture
In addition to the front-end drivers (implemented in the guest operating system) and the back-end drivers (implemented in the hypervisor), virtio defines two layers to support guest-to-hypervisor communication. At the top level (called virtio) is the virtual queue interface that conceptually attaches front-end drivers to back-end drivers. Drivers can use zero or more queues, depending on their need. For example, the virtio network driver uses two virtual queues (one for receive and one for transmit), whereas the virtio block driver uses only one. Virtual queues, being virtual, are actually implemented as rings to traverse the guest-to-hypervisor transition. But the transport could be implemented in any way, as long as both the guest and the hypervisor implement it in the same way.
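To make the ring transport a bit more concrete, each queue is backed by a descriptor table, an "available" ring that the guest writes, and a "used" ring that the hypervisor writes (this transport is called vring in the kernel sources). The following is a simplified sketch of those structures, modeled on include/linux/virtio_ring.h from kernels of this era (newer kernels use explicit little-endian types); it is shown only for orientation:

struct vring_desc {
	__u64 addr;    /* guest-physical address of the buffer */
	__u32 len;     /* length of the buffer */
	__u16 flags;   /* chaining and direction flags (for example, write-only) */
	__u16 next;    /* index of the next descriptor in a chain */
};

struct vring_avail {   /* written by the guest, read by the hypervisor */
	__u16 flags;
	__u16 idx;
	__u16 ring[];  /* heads of descriptor chains ready for the hypervisor */
};

struct vring_used_elem {
	__u32 id;      /* head of the descriptor chain that was consumed */
	__u32 len;     /* bytes the hypervisor wrote into the chain's "in" buffers */
};

struct vring_used {    /* written by the hypervisor, read by the guest */
	__u16 flags;
	__u16 idx;
	struct vring_used_elem ring[];
};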
Figure 3. High-level architecture of the virtio framework
As shown in Figure 3, five front-end drivers are listed for block devices (such as disks), network devices, PCI emulation, a balloon driver (for dynamically managing guest memory usage), and a console driver. Each front-end driver has a corresponding back-end driver in the hypervisor.
Concept hierarchy
From the perspective of the guest, an object hierarchy is defined as shown in Figure 4. At the top is the virtio_driver, which represents the front-end driver in the guest. Devices that match this driver are encapsulated by the virtio_device (a representation of the device in the guest). The virtio_device in turn refers to the virtio_config_ops structure (which defines the operations for configuring the virtio device). The virtio_device is referred to by the virtqueue (which includes a reference to the virtio_device it serves). Finally, each virtqueue object references the virtqueue_ops object, which defines the underlying queue operations for dealing with the hypervisor driver. Although the queue operations are the core of the virtio API, I provide a brief discussion of discovery, and then explore the virtqueue_ops operations in more detail.
Figure 4. Object hierarchy of the virtio front end
The process begins with the creation of a virtio_driver and subsequent registration via register_virtio_driver. The virtio_driver structure defines the upper-level device driver, the list of device IDs that the driver supports, a features table (dependent upon the device type), and a list of callback functions. When the hypervisor identifies the presence of a new device that matches a device ID in the device list, the probe function (provided in the virtio_driver object) is called to pass up the virtio_device object. This object is cached along with the management data for the device (in a driver-dependent way). Depending on the driver type, the virtio_config_ops functions may be invoked to get or set options specific to the device (for example, getting the Read/Write status of the disk for a virtio_blk device or setting the block size of the block device).
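As a sketch (not taken from any particular driver), registration of a hypothetical front-end driver might look like the following. The names my_probe, my_remove, and "my_virtio_driver" are invented, and the header that provides VIRTIO_ID_BLOCK varies by kernel version:

#include <linux/module.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>  /* VIRTIO_DEV_ANY_ID */
#include <linux/virtio_blk.h>     /* VIRTIO_ID_BLOCK on 2.6.30-era kernels */

static int my_probe(struct virtio_device *vdev)
{
	/* Discover virtqueues, allocate per-device state, and register with the
	 * appropriate subsystem (block, net, and so on). */
	return 0;
}

static void my_remove(struct virtio_device *vdev)
{
	/* Tear down queues and free per-device state. */
}

static struct virtio_device_id id_table[] = {
	{ VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID },  /* device types this driver claims */
	{ 0 },
};

static struct virtio_driver my_virtio_driver = {
	.driver.name  = "my_virtio_driver",      /* hypothetical name */
	.driver.owner = THIS_MODULE,
	.id_table     = id_table,
	.probe        = my_probe,                /* called when a matching device is found */
	.remove       = my_remove,
};

static int __init my_init(void)
{
	return register_virtio_driver(&my_virtio_driver);
}

static void __exit my_exit(void)
{
	unregister_virtio_driver(&my_virtio_driver);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");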
Note that the virtio_device includes no reference to its virtqueues (but each virtqueue does reference its virtio_device). To identify the virtqueues associated with a virtio_device, you use the find_vq function from the virtio_config_ops object. Each call returns one of the virtual queues for that virtio_device instance. The find_vq function also permits the specification of a callback function for the virtqueue (see the virtqueue structure in Figure 4), which is used to notify the guest of response buffers from the hypervisor.
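Continuing the hypothetical probe function from the earlier sketch, a driver might locate its single queue like this (a sketch against the 2.6.30-era find_vq call, which later kernels replaced with find_vqs; my_recv_done is invented):

#include <linux/err.h>

static void my_recv_done(struct virtqueue *vq);  /* callback: hypervisor consumed buffers */

static int my_probe(struct virtio_device *vdev)
{
	struct virtqueue *vq;

	/* Ask the transport for virtual queue 0 and register a callback that
	 * fires when the hypervisor has used buffers on that queue. */
	vq = vdev->config->find_vq(vdev, 0, my_recv_done);
	if (IS_ERR(vq))
		return PTR_ERR(vq);

	/* ... stash vq in per-device state and continue initialization ... */
	return 0;
}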
The virtqueue itself is a simple structure that identifies an optional callback function (which is called when the hypervisor consumes the buffers), a reference to the virtio_device, a reference to the virtqueue operations, and a special priv reference that refers to the underlying implementation to use. Although the callback is optional, it's possible to enable or disable callbacks dynamically.
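For reference, the virtqueue structure and its operations look roughly like the following (abbreviated from the 2.6.30-era include/linux/virtio.h; exact members differ in later kernels):

struct virtqueue {
	void (*callback)(struct virtqueue *vq);  /* optional: hypervisor consumed buffers */
	struct virtio_device *vdev;              /* the device this queue serves */
	struct virtqueue_ops *vq_ops;            /* the five queue operations */
	void *priv;                              /* transport-private data */
};

struct virtqueue_ops {
	int (*add_buf)(struct virtqueue *vq, struct scatterlist sg[],
		       unsigned int out_num, unsigned int in_num, void *data);
	void (*kick)(struct virtqueue *vq);
	void *(*get_buf)(struct virtqueue *vq, unsigned int *len);
	void (*disable_cb)(struct virtqueue *vq);
	bool (*enable_cb)(struct virtqueue *vq);
};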
But the core of this hierarchy is the virtqueue_ops, which defines how commands and data are moved between the guest and the hypervisor. Let's first explore the object that is added to or removed from the virtqueue.
Virtio buffers
Guest (front-end) drivers communicate with hypervisor (back-end) drivers through buffers. For an I/O, the guest provides one or more buffers representing the request. For example, you could provide three buffers, with the first representing a Read request and the subsequent two buffers representing the response data. Internally, this configuration is represented as a scatter-gather list (with each entry in the list representing an address and a length).
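For example, a hypothetical read request with one request header and two response buffers could be described this way (req_hdr, data_buf, data_len, and status stand in for driver-specific objects):

#include <linux/scatterlist.h>

struct scatterlist sg[3];

sg_init_table(sg, 3);
sg_set_buf(&sg[0], &req_hdr, sizeof(req_hdr));  /* out: request header read by the hypervisor */
sg_set_buf(&sg[1], data_buf, data_len);         /* in: hypervisor writes the response payload here */
sg_set_buf(&sg[2], &status, sizeof(status));    /* in: hypervisor writes the completion status here */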
Core API
Linking the guest driver and hypervisor driver occurs through the virtio_device and, most commonly, through virtqueues. The virtqueue supports its own API consisting of five functions. You use the first function, add_buf, to provide a request to the hypervisor. This request is in the form of the scatter-gather list discussed previously. To add_buf, the guest provides the virtqueue to which the request is to be enqueued, the scatter-gather list (an array of addresses and lengths), the number of buffers that serve as out entries (destined for the underlying hypervisor), and the number of in entries (for which the hypervisor will store data and return to the guest). When a request has been made to the hypervisor through add_buf, the guest can notify the hypervisor of the new request using the kick function. For best performance, the guest should load as many buffers as possible onto the virtqueue before notifying through kick.
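Putting the pieces together, queuing the scatter-gather list sketched earlier and notifying the hypervisor might look like this (vq and token are hypothetical, and add_buf's exact return convention varies slightly across kernel versions):

/* One out entry (the header) and two in entries (payload and status);
 * token is any driver pointer that get_buf hands back on completion. */
if (vq->vq_ops->add_buf(vq, sg, 1, 2, token) < 0) {
	/* Queue is full: back off and retry later. */
} else {
	/* Ideally, batch several add_buf calls before a single kick. */
	vq->vq_ops->kick(vq);
}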
Responses from the hypervisor occur through the get_buf function. The guest can poll simply by calling this function or wait for notification through the provided virtqueue callback function. When the guest learns that buffers are available, the call to get_buf returns the completed buffers.
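A typical completion path, whether invoked by polling or by the callback registered through find_vq, drains the queue along these lines (complete_request is a hypothetical driver helper):

static void my_recv_done(struct virtqueue *vq)
{
	unsigned int len;
	void *token;

	/* get_buf returns the token passed to add_buf, along with the number of
	 * bytes the hypervisor wrote into the request's "in" buffers. */
	while ((token = vq->vq_ops->get_buf(vq, &len)) != NULL)
		complete_request(token, len);
}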
The final two functions in the virtqueue API are enable_cb and disable_cb. You can use these functions to enable and disable the callback process (via the callback function initialized in the virtqueue through the find_vq function). Note that the callback function and the hypervisor are in separate address spaces, so the call occurs through an indirect hypervisor call (such as kvm_hypercall).
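The virtio network driver uses these two calls in a NAPI-style pattern: suppress callbacks while polling, then re-enable and re-check to close the race with the hypervisor. A rough sketch of that idea (complete_request is again a hypothetical helper):

static void my_poll(struct virtqueue *vq)
{
	unsigned int len;
	void *token;

	vq->vq_ops->disable_cb(vq);              /* suppress callbacks while polling */
again:
	while ((token = vq->vq_ops->get_buf(vq, &len)) != NULL)
		complete_request(token, len);

	/* enable_cb returns false if the hypervisor consumed more buffers while
	 * callbacks were off; poll again so that no completion is missed. */
	if (!vq->vq_ops->enable_cb(vq)) {
		vq->vq_ops->disable_cb(vq);
		goto again;
	}
}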
The format, order, and contents of the buffers are meaningful only to the front-end and back-end drivers. The internal transport (rings in the current implementation) moves only buffers and has no knowledge of their internal representation.
Example virtio drivers
You can find the source to the various front-end drivers within the ./drivers subdirectory of the Linux kernel. The virtio network driver can be found in ./drivers/net/virtio_net.c, and the virtio block driver can be found in ./drivers/block/virtio_blk.c. The subdirectory ./drivers/virtio provides the implementation of the virtio interfaces (virtio device, driver, virtqueue, and ring). virtio has also been used in High-Performance Computing (HPC) research to develop inter-virtual machine (VM) communications through shared memory passing. Specifically, this was implemented through a virtualized PCI interface using the virtio PCI driver. You can learn more about this work in the Resources section.
You can exercise this paravirtualization infrastructure today in the Linux kernel. All you need is a kernel to act as the hypervisor, a guest kernel, and QEMU for device emulation. You can use either KVM (a module that exists in the host kernel) or Rusty Russell's lguest (a modified Linux guest kernel). Both of these virtualization solutions support virtio (along with QEMU for system emulation and libvirt for virtualization management).
The result of Rusty's work is a simpler code base for paravirtualized drivers and faster emulation of virtual devices. But even more important, virtio has been found to provide better performance (2-3 times for network I/O) than current commercial solutions. This performance boost comes at the cost of a guest that is aware it's being virtualized, but it's well worth it if Linux is your hypervisor and guest.
Going further
Although you may never develop front-end or back-end drivers for virtio, it implements an interesting architecture and is worth understanding in more detail. virtio opens up new opportunities for efficiency in paravirtualized I/O environments while building from previous work in Xen. Linux continues to prove itself as a production hypervisor and a research platform for new virtualization technologies. virtio is yet another example of the strengths and openness of Linux as a hypervisor.
Resources
Learn
- One of the best resources for deep technical details of virtio is Rusty Russell's "Virtio: towards a de facto standard for virtual I/O devices." This paper provides a very thorough treatment of virtio and its internals.
- This article touched on two virtualization mechanisms: full virtualization and paravirtualization. To learn more about the variety of virtualization mechanisms in Linux, check out Tim's article "Virtual Linux" (developerWorks, December 2006).
- The key behind virtio is exploiting paravirtualization to improve overall I/O performance. To learn more about the role of Linux as a hypervisor and for device emulation, check out Tim's articles "Anatomy of a Linux hypervisor" (developerWorks, May 2009) and "Linux virtualization and PCI passthrough" (developerWorks, October 2009).
- This article touched on device emulation, and one of the most important applications that provides this functionality is QEMU (a system emulator). You can read more about QEMU in Tim's article "System emulation with QEMU" (developerWorks, September 2007).
- Xen also includes the concept of paravirtualized drivers. Paravirtual Windows Drivers discusses both paravirtualization and hardware-assisted virtualization (HVM) in particular.
- One of the most important benefits of virtio is performance in paravirtualized environments. This blog post from btm.geek shows the performance advantage of virtio using KVM.
- This article touched on the intersection of libvirt (an open virtualization API) and the virtio framework. The libvirt wiki shows how to specify virtio devices in libvirt.
- This article discussed two hypervisor solutions that take advantage of the virtio framework: lguest is an x86 hypervisor, also developed by Rusty Russell, and KVM is a Linux-based hypervisor built into the mainline Linux kernel.
- One interesting use of virtio was the development of shared-memory message passing to allow VMs to communicate with one another through the hypervisor, as described in this paper from SpringerLink.
- In the developerWorks Linux zone, find more resources for Linux developers, and scan our most popular articles and tutorials.