http://superuser.openstack.org/articles/openstack-for-nfv-applications-enabling-single-root-i-o-virtualization-and-pci-passthrough

NFV

Without entering into the details of the NFV specifications, the goal in OpenStack is to optimize network, memory and CPU performance on the running instances.

Network Function Virtualization (NFV) initiatives in the telecommunication industry require specific OpenStack functionalities enabled.

In this article we’ll see Single Root I/O virtualization (SR-IOV) and PCI-Passthrough, which are commonly required by some Virtual Network Functions (VNF) running as instances on top of OpenStack.

In addition to SR-IOV and PCI-Passthrough there are other techniques such as DPDK, CPU pinning and the use of NUMA nodes which also are usually required by VNFs. A future post will cover some of them.

SR-IOV

SR-IOV allows a PCIe network interface, offering Physical Functions (PF) to expose multiple network interfaces, appearing as Virtual Functions (VF). For example, the network interfacep5p1 configured with 5 VFs looks like this from the operating system:

# ip link show p5p1
8: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
link/ether a0:36:9f:8f:3f:b8 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto

The VFs can be used by the OS or exposed to VMs. They look exactly as regular NIC:

# ip link show p5p1_1
18: p5p1_1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether 72:1c:ef:b0:a8:d0 brd ff:ff:ff:ff:ff:ff

Only certain NICs support SR-IOV. In this example I’m using Intel’s X540-AT2 NICs which uses the driver ixgbe.

Linux configuration for SR-IOV

To use SR-IOV in OpenStack, firstly we need to make sure the operating system is configured to support it. There are 2 kernel parameters to be set:

intel_iommu=on
ixgbe.max_vfs=5

Note that ixgbe is specific for the Intel X540-AT2 NIC and you might be using another one. You can also use a different number of VFs.

To enable the parameters in RHEL based systems it works as follows:

  1. Add the parameters to /etc/default/grub in GRUB_CMDLINE_LINUX
  2. Regenerate the config file with: grub2-mkconfig -o /boot/grub2/grub.cfg
  3. Rebuild the initramfs file with: dracut -f -v

We also need to make sure that the admin state of the interface is UP:

# ip link show p5p1
# ip link set p5p1 up

And by setting the appropriate network interface configuration file in /etc/sysconfig/network-scripts/ifcfg-p5p1 as:

BOOTPROTO=none
DEVICE=p5p1
ONBOOT=yes

OpenStack configuration for SR-IOV

1. Neutron

SR-IOV works with the VLAN type driver in Neutron. We enable it in /etc/neutron/plugin.ini:

[ml2]
type_drivers=vxlan,vlan
tenant_network_types=vxlan,vlan

The mechanism driver is sriovnicswitch, which is configured in the same [ml2] section as follows:

mechanism_drivers=openvswitch,sriovnicswitch

Every time we create a new SR-IOV network in Neutron, it will configure it on a VLAN from a range that we need specify. It needs a name too. In this example the range is 1010 to 1020 and the physical network for Neutron will be called physnet_sriov :

[ml2_type_vlan]
network_vlan_ranges=physnet_sriov:1010:1020

Now, we configure SR-IOV settings in /etc/neutron/plugins/ml2/ml2_conf_sriov.ini. In the section[ml2_sriov] we need to tell the driver which NIC we will use:

[ml2_sriov]
supported_pci_vendor_devs=8086:1515

The numbers represent the vendor ID (8086) and the product ID (1515).  To get them we can use lspci -nn:

# lspci -nn|grep X540-AT2
06:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
06:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)

By default the neutron-server service is not loading the configuration in the file ml2_conf_sriov.ini so we need to add it to its systemd service in /usr/lib/systemd/system/neutron-server.service:

[Service]
Type=notify
User=neutron
ExecStart=/usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini  --log-file /var/log/neutron/server.log

And after that restart the service:

# systemctl restart neutron-server

2. Nova scheduler

We need to tell the Nova scheduler about the SR-IOV so that it can schedule instances to compute nodes with SR-IOV support.

In the [DEFAULT] section of /etc/nova/nova.conf adding the PciPassthroughFilter. Also ensure scheduler_available_filters is set as follows:

[DEFAULT]
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter,PciPassthroughFilter

And restart Nova scheduler:

# systemctl restart openstack-nova-scheduler

3. Nova compute

Nova compute needs to know which PFs can be used for SR-IOV so that VFs are exposed – actually via PCI-passthrough – to the instances. Also, it needs to know that when we create a network with Neutron specifying the physical network physnet_sriov  – configured before in Neutron with network_vlan_ranges – it will use the SR-IOV NIC.

That’s done by the config flag pci_passthrough_whitelist in /etc/nova/nova.conf:

pci_passthrough_whitelist = {"devname": "p5p1", "physical_network": "physnet_sriov"}

And simply restart Nova compute:

# systemctl restart openstack-nova-compute

4. SR-IOV NIC agent

We can optionally configure the SR-IOV NIC agent to manage the admin state of the NICs. When a VF NIC is used by an instance and then released, sometimes the NIC goes into DOWN state and the admin manually has to bring it back to UP state. There’s an article that describes how to do this in the official Red Hat documentation:

Enable the OpenStack Networking SR-IOV agent

Not all the drivers work with the agent and that was the case for the Intel X540-AT2 NIC.

Creating OpenStack instances with a SR-IOV port

1. Create the network

We configured the physnet_sriov network in Neutron to use the SR-IOV interface p5p1. Let’s create the network and its subnet in Neutron now:

$ neutron net-create nfv_sriov --shared --provider:network_type vlan --provider:physical_network physnet_sriov
$ neutron subnet-create --name nfv_subnet_sriov --disable-dhcp --allocation-pool start=10.0.0.2,end=10.0.0.100 nfv_sriov 10.0.0.0/24

Remember we configured a VLAN range, so Neutron will choose a VLAN from it, but if we wanted to specify one we can by using –provider:segmentation_id=1010 when creating the network.

2. Create the port

We’ll pass a port to the instance instead of the nfv_sriov network. To create it we do this:

$ neutron port-create nfv_sriov --name sriov-port --binding:vnic-type direct

Save the ID of the port as we’ll need it for creating the instance.

3. Create the instance

We will now create an instance that uses two NICs, one created the standard way – in a private network which already existed in Neutron – and a another one with the port created before. Assuming  SRIOV_PORT_ID is the ID of the port and PRIVATE_NETWORK_ID is the ID of the pre-existing private network, this is how we create it:

$ openstack server create --flavor m1.small --nic port-id=$SRIOV_PORT_ID --nic net-id=$PRIVATE_NETWORK_ID --image centos7 sr-iov-instance1

If you have key-pairs or other options  you use, pass them too in the openstack server create command.

Log in the instance as usual and you’ll notice two interfaces, eth0 and probably ens5, which is the SR-IOV NIC ready to be used.

Note as well that one of the VFs has now the same MAC address than the Neutron port we created above:

$ ip link show p5p1
8: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether a0:36:9f:8b:cd:80 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 4 MAC fa:16:3e:e0:3f:be, spoof checking on, link-state auto

PCI-Passthrough

If our VNF (or any virtualized application for that matter) required direct access to a PCI interface in the hypervisor, the PCI-Passthrough functionality in Libvirt/KVM and OpenStack allows us doing it. This is also common in High Performance Computing (HPC), not only with NIC interfaces but, for example, sharing GPUs with the instances.

In this example we’ll use another NIC interface to pass it to the instance: p5p2 in the hypervisor.

Linux configuration for PCI-Passthrough

First, just like before, make sure the admin state if the interface is UP so let’s do the same:

# ip link show p5p2
# ip link set p5p2 up

And in /etc/sysconfig/network-scripts/ifcfg-p5p2:

BOOTPROTO=none
DEVICE=p5p2
ONBOOT=yes

The kernel options are the same ones we used above so nothing else is required at this point.

OpenStack configuration for PCI-Passthrough

Nova scheduler is already configured for PCI-Passthrough so only Nova compute needs to be made aware of the device we want to pass through.

1. Nova compute

We need a second entry in /etc/nova/nova.conf with pci_passthrough_whitelist. This will tell Nova compute that the interface p5p2 can be taken from the Linux OS and passed into an instance:

pci_passthrough_whitelist={ "devname": "p5p2" }

Now, we need to tag this interface with a name that will be used by Nova during the creation of the instance. For example we can call it my_PF. This is also done in the /etc/nova/nova.conf file:

pci_alias={ "vendor_id": "8086", "product_id": "1528", "name": "my_PF"}

Note that the vendor and product IDs are the same ones than before as both NICs are the same. Again, you can get your PCI device IDs with lspci -nn.

2. Nova flavor

The way OpenStack has been designed to allow passing PCI devices to instances is via flavors. The tag we used before (my_PF) needs to be associated with a new flavor in this way:

$ openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
$ openstack flavor set --property "pci_passthrough:alias"="my_PF:1" m1.medium.pci_passthrough

3. Create the instance

Now all we need to do is launching an instance using this new flavor and it will automatically be configured by Nova compute – and then by Libvirt – with the PCI device in it.

$ openstack server create --flavor m1.medium.pci_passthrough --nic net-id=$PRIVATE_NETWORK_ID --image centos7 pci-passthrough-instance1

Again, if you have more options you need such as key-pairs or adding later a floating IP to access the instance you can do it too.

After that, the instance will show again an interface ens5 which is the p5p2 interface. In addition the interface p5p2 will disappear from the operating system while the instance exists.

This post originally ran on Ramon Acedo's blog Tricky Cloud.Acedo is a cloud architect who started in the open source world in the age of the 33.6K modems. He currently works at Red Hat helping businesses in their journey to an enterprise-class OpenStack experience. You shouldfollow him on Twitter.

OpenStack for NFV applications: enabling Single Root I/O virtualization and PCI-Passthrough的更多相关文章

  1. Carrier-Grade Mirantis OpenStack (the Mirantis NFV Initiative), Part 1: Single Root I/O Virtualization (SR-IOV)

    The Mirantis NFV initiative aims to create an NFV ecosystem for OpenStack, with validated  hardware ...

  2. 转载:Why using Single Root I/O Virtualization (SR-IOV) can help improve I/O performance and Reduce Costs

    Introduction While server virtualization is being widely deployed in an effort to reduce costs and o ...

  3. NFV实验平台

    NFV架构如下图所示. NFVI对应于数据平面,数据平面转发数据并提供用于运行网络服务的资源. MANO对应于控制平面,该控制平面负责构建各种VNF之间的连接以及编排NFVI中的资源. VNF层对应于 ...

  4. openstack系列文章(一)

    学习openstack的系列文章-虚拟化 虚拟化 KVM CPU 虚拟化 KVM 内存虚拟化 全虚拟化 I/O 设备 半虚拟化 I/O 设备 I/O PCI PCIe 设备直接分配 SR-IOV 在 ...

  5. Neutron中的网络I/O虚拟化

    为了提升网络I/O性能.虚拟化的网络I/O模型也在不断的演化: 1,全虚拟化网卡(emulation).如VMware中的E1000用来仿真intel 82545千兆网卡,它的功能更完备,如相比一些半 ...

  6. Dynamic device virtualization

    A system and method for providing dynamic device virtualization is herein disclosed. According to on ...

  7. DPDK support for vhost-user

    转载:http://blog.csdn.net/quqi99/article/details/47321023 X86体系早期没有在硬件设计上对虚拟化提供支持,因此虚拟化完全通过软件实现.一个典型的做 ...

  8. CNA, FCoE, TOE, RDMA, iWARP, iSCSI等概念及 Chelsio T5 产品介绍 转载

    CNA, FCoE, TOE, RDMA, iWARP, iSCSI等概念及 Chelsio T5 产品介绍 2016年09月01日 13:56:30 疯子19911109 阅读数:4823 标签:  ...

  9. Documentation/PCI/pci-iov-howto.txt

    Chinese translated version of Documentation/PCI/pci-iov-howto.txt If you have any comment or update ...

随机推荐

  1. EasyNVR现场部署搭配EasyNVS云端集中控制应用于幼儿园直播场景的最佳方案!

    在之前的介绍中,我们已经介绍了很多EasyNVR成功应用于幼儿园类教育直播的场景,例如<EasyDarwin幼教云视频平台在幼教平台领域大放异彩!>.<基于EasyDarwin云视频 ...

  2. 深入理解javascript原型和闭包(17)——补充:上下文环境和作用域的关系

    摘自:http://www.cnblogs.com/wangfupeng1988/p/4000798.html:作者:王福朋: 本系列用了大量的篇幅讲解了上下文环境和作用域,有些人反映这两个是一回儿事 ...

  3. <2013 12 17> 雅思写作、口语相关

    这一个多月,参加了两次雅思考试,成绩分别为: Overall:6.5     L:7.0     R:7.5     W:6.0     S:5.5 Overall:7.0     L:7.0     ...

  4. JLable设置复制粘贴

    final JLabel keyLable = new JLabel(key); keyLable.addMouseListener(new MouseAdapter() { @Override pu ...

  5. python——re模块(正则表达)

    python——re模块(正则表达) 两个比较不错的正则帖子: http://blog.csdn.net/riba2534/article/details/54288552 http://blog.c ...

  6. python四个带 key 参数的函数(max、min、map、filter)

    四个带 key 参数的函数: max()点击查看详细 min()点击查看详细 map()点击查看详细 filter()点击查看详细 1)max(iterable, key) key:相当于对可迭代对象 ...

  7. Latex技巧:LaTex插图命令includegraphics参数详解

    Latex插图的命令是\includegraphics[选项]{文件} 这里的选项在表 7.1, 7.2, 7.3 中列出. 因为 \includegraphics 不会结束 当前段落,所以它能够在文 ...

  8. Maven学习笔记—仓库

    Maven仓库 1 什么是Maven仓库 在Maven中,任何一个依赖.插件或者项目构建的输出,都可以成为构件,而Maven通常在某个位置统一的存储所有Maven项目共享的构件,这个统一的位置就是Ma ...

  9. split命令

    语法:split [OPTION]... [INPUT [PREFIX]]常用参数说明: -a, --suffix-length=N            generate suffixes of l ...

  10. hadoop学习第四天-Writable和WritableComparable序列化接口的使用&&MapReduce中传递javaBean的简单例子

    一. 为什么javaBean要继承Writable和WritableComparable接口? 1. 如果一个javaBean想要作为MapReduce的key或者value,就一定要实现序列化,因为 ...