
Allowing QEMU to set transfer limits for all SCSI pass-through devices

Introduction

Cloud providers must support a great variety of hardware to meet customer needs. This includes hardware that might not behave as the back-end virtualization technology expects, sometimes leading to problems in the created virtual machine (VM), or guest.

QEMU allows guests to use any Small Computer System Interface (SCSI) storage of the host through two distinct paravirtualization back ends that provide what is called SCSI pass-through. The first one is virtio-blk. In this mode, available in all hypervisors, the guest uses the real storage device as a block device just to read and write data, while QEMU emulates the SCSI device internally. When running on Linux hypervisors, QEMU offers a second back end called virtio-scsi. Using this back end, QEMU proxies the SCSI communication back and forth between the guest device and the physical device, allowing the guest device to use all the advanced features that the real device might implement. The gains of using virtio-scsi come at a cost: if the physical SCSI device misbehaves or QEMU does not handle it properly, the guest is directly affected.

There is an instance in which virtio-scsi does not handle the physical SCSI device gracefully, impacting the guest. During the device setup, QEMU uses an optional SCSI feature called vital product data (VPD) Block Limits to inform the guest of the transfer limits of the physical device. Because this feature is optional, the device might not implement it. In this case, QEMU does not provide an alternative way to pass the device transfer limits to the guest, which ends up setting the transfer limit to a default value.

If this default value conflicts with the actual transfer limits of the real device, the guest is unable to use it. It will send read/write commands that are larger than what the physical device can handle, causing SCSI errors in the hypervisor that propagate back to the guest.

In this article, we’ll talk more about this QEMU virtio-scsi behavior and how the author was able to solve it. The following section covers the basics of the SCSI standard that are necessary to understand the problem domain. The “QEMU virtio-blk/virtio-scsi with SCSI pass-through” section elaborates on how QEMU and virtio-scsi tap into the SCSI communication between the guest kernel and the physical device to allow the guest to properly configure it. “The Block Limits problem with virtio-scsi” section explains the problem and the possible workarounds for it. In the “A solution using emulation” section, we detail how the author solved the problem by using the existing emulation of the QEMU virtio-blk back end.

SCSI basics

SCSI is a series of standards that defines commands, protocols, and interfaces to connect and transfer data between computers and peripherals. For the purpose of this article, we’re going to detail the Inquiry command only.

We’ll also demonstrate the Inquiry command in practice with examples using the sg3_utils toolkit. This is a common package in most Linux distributions that allows the user to send and receive SCSI messages from user space. In Fedora Linux, it can be installed with:
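For example, a typical invocation would be:

    $ sudo dnf install sg3_utils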

SCSI Inquiry command

The Inquiry command is used to request information from the SCSI device about its capabilities. On Linux, the kernel sends an Inquiry command to each detected SCSI device to query its attributes and set them up during the boot process.

Figure 1. Inquiry request format

The enable vital product data (EVPD) bit determines whether this is an Inquiry for a specific VPD page or a standard Inquiry.

If EVPD is set to zero, standard Inquiry data is expected. The standard Inquiry data contains information such as vendor, peripheral device type, product identification, serial number, and so on.

Let’s use sg_inq, a command from sg3_utils, to issue a standard Inquiry to a real device. Let us consider a system with the following SCSI devices:

We have a single Serial Advanced Technology Attachment (SATA) drive at /dev/sda. You can use the sg_inq command to send a standard Inquiry to it:
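The original listing is not reproduced here, but the command would be along the lines of:

    $ sudo sg_inq /dev/sda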

As we can see, general information about the device and its capabilities are retrieved in the response.

If the EVPD bit of the Inquiry command is set to one, then the PAGE_CODE byte contains the requested VPD page. A common use is to first request the supported VPD page (page code 00h) to determine what VPD pages the device supports. Then, issue an Inquiry request for each one of them.

On the same device we used before, we can retrieve the supported VPD pages by using the sg_inq command with extra parameters:
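For example, something like the following, where page 00h is the supported VPD pages page:

    $ sudo sg_inq --vpd --page=0x00 /dev/sda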

The --vpd parameter sets the EVPD bit to one, the --page parameter allows us to specify the VPD page we want.

Instead of using sg_inq and setting the EVPD bit manually, we can use another sg3_utils utility called sg_vpd that sets it automatically for us. It also has the advantage of being able to decode all VPD pages with one command.

Using sg_vpd to retrieve the ATA information page:
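For instance, assuming sg_vpd's usual page abbreviation ai for the ATA information page:

    $ sudo sg_vpd --page=ai /dev/sda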

There are several VPD pages defined in the SCSI spec, but two of them are mandatory: page 00h (supported VPD pages) and page 83h (device identification). Any device that complies with the spec must implement at least these two VPD pages. Most devices implement more than these two pages, especially the unit serial number (80h).

QEMU virtio-blk/virtio-scsi with SCSI pass-through

To provide a virtual SCSI device, QEMU will either emulate the SCSI target and just read/write the physical block device (virtio-blk back end), or it will pass through the SCSI messages from the virtual machine kernel directly to the real device (virtio-scsi back end). The usability problem that we want to discuss happened with virtio-scsi, so let’s elaborate on its basic functioning. But to get started, we’ll briefly explain how the virtio-blk back end works, for two reasons: it makes it easier to understand virtio-scsi, and the mechanics of virtio-blk were part of the solution we’ll discuss in the “A solution using emulation” section.

virtio-blk

Simply put, this is a fully emulated back end as far as SCSI communication is concerned. The figure below illustrates how it operates.

Figure 2: SCSI communication with virtio-blk

All SCSI command responses are emulated in QEMU. For the read/write SCSI commands, QEMU will read/write the contents of the device block file in the hypervisor, emulating the SCSI reply back to the guest kernel.

We’ll now use the environment in which the virtio-scsi bug was found and fixed, which is an IBM POWER9 processor-based server with a 1.8 terabyte (TB) LSI MegaRAID 9361-8i storage. The QEMU command line to use a SCSI device with virtio-blk is:
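The original command line is not reproduced here; a rough sketch of the relevant part, with illustrative controller and drive IDs, would be:

    -device virtio-scsi-pci,id=scsi0 \
    -drive file=/dev/disk/by-id/scsi-<id>,if=none,id=drive0,format=raw \
    -device scsi-hd,bus=scsi0.0,drive=drive0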

When using Libvirt, this is the element that must be added in the guest XML:
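A sketch of such an element (the target name and source path are illustrative) might be:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-id/scsi-<id>'/>
      <target dev='sdb' bus='scsi'/>
    </disk>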

The /dev/disk/by-id/scsi-<id>(…) path points to a SCSI storage device in the host.

Refer to the following output for its SCSI Inquiry details on the host.

However, inside the guest operational system, the device identifies itself as QEMU HARDDISK.

This shows that QEMU emulates the SCSI layer for this device, presenting it as a QEMU hard disk. While emulating, QEMU takes into account, among other things, the current configuration of the SCSI device in the host. This ensures that the configuration of the emulated device that the guest will use is compatible with the configuration of the real device. One of those configurations is related to the max_sectors_kb Linux kernel parameter that QEMU sets in the Block Limits VPD response.

max_sectors_kb and Block Limits VPD

max_sectors_kb is described in the kernel documentation [3] as:

“This is the maximum number of kilobytes that the block layer will allow for a filesystem request. Must be smaller than or equal to the maximum size allowed by the hardware.”

The value can be retrieved by reading sysfs. In the case of the MegaRAID device in the host, the value is:
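The original output is not shown, but given the 256 KB value mentioned below, it would read roughly:

    $ cat /sys/block/sdb/queue/max_sectors_kb
    256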

This means that any process running in the host can’t send any read or write request that exceeds 256 kilobytes to the /dev/sdb device.

Because QEMU is a process in the host, this limitation also applies to the virtual machine that uses /dev/sdb with SCSI pass-through. If the virtual machine attempts to use a greater value, QEMU won’t be able to read/write the host block file, that is, the guest SCSI disk won’t be usable.

The max_sectors_kb value of the host is retrieved by QEMU using an input/output control (ioctl) called BLKSECTGET. This ioctl receives a valid file descriptor and a pointer to an integer in which the result will be stored. For example:
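The following standalone sketch illustrates the call (this is not QEMU's actual code; the /dev/sdb path is illustrative, and for block devices the kernel reports the limit as a count of 512-byte sectors):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>      /* BLKSECTGET */

    int main(void)
    {
        int fd = open("/dev/sdb", O_RDONLY);   /* illustrative device path */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* For a block device, the kernel returns the limit as an
         * unsigned short count of 512-byte sectors. */
        unsigned short max_sectors = 0;
        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
            /* Two 512-byte sectors per kilobyte. */
            printf("max_sectors_kb: %d\n", max_sectors / 2);
        } else {
            perror("ioctl(BLKSECTGET)");
        }

        close(fd);
        return 0;
    }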

This command fetches the max_sectors_kb value of the block device that the file descriptor (fd) uses and stores it in the max_sectors variable.

This value is then added to the SCSI Block Limits VPD response. Block Limits is an optional VPD page that provides operating parameters such as Maximum/Optimal Transfer Length, Prefetch Length, and others. If the SCSI device supports it, the kernel requests the Block Limits page to set up the device parameters. The max_sectors_kb parameter is related to the Maximum Transfer Length value of the Block Limits response.

Inside the guest, let’s use sg_vpd to see the supported VPD pages for the /dev/sdb emulated pass-through device:
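For instance (with no page argument, sg_vpd decodes the supported VPD pages page):

    $ sudo sg_vpd /dev/sdb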

And retrieve the reply for the Block Limits page:
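For example, assuming sg_vpd's bl abbreviation for the Block Limits page:

    $ sudo sg_vpd --page=bl /dev/sdb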

Maximum transfer length value is set to 512 blocks. This is no accident – QEMU read the host max_sectors_kb and found it to be 256 kilobytes. One block is 512 bytes, so 512 blocks equals 256 kilobytes. This means that, from the guest point of view, the SCSI device is reporting a maximum capability that matches the max_sectors_kb setting it has on the host.

And this allows the guest to set up the max_sectors_kb value of the pass-through device:

virtio-scsi

The virtio-scsi back end allows the guest to directly send SCSI requests back to the real device. Its functioning is shown in the following figure.

Figure 3: SCSI communication with virtio-scsi

All SCSI command responses are sent by the real device, passing through QEMU. This mechanism allows the guest device to use all the features that the real device implements. Read and write requests from the guest are also sent directly to the real device.

Using the same environment from the virtio-blk example, the QEMU command line to use virtio-scsi is similar, but scsi-hd is changed to scsi-block:
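Continuing the earlier sketch, only the last device line changes (IDs remain illustrative):

    -device scsi-block,bus=scsi0.0,drive=drive0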

Using Libvirt, comparing with the virtio-blk example, change device=’disk’ to device=’lun’ to use virtio-scsi:
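As a sketch, the disk element from the previous example becomes:

    <disk type='block' device='lun'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-id/scsi-<id>'/>
      <target dev='sdb' bus='scsi'/>
    </disk>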

Inside the guest, issuing an Inquiry request to /dev/sdb using sg_inq gives us the information about the real device:

The available VPD pages of the virtual device matches the pages that the real hardware supports:

Aside from these differences, QEMU follows the same setup strategy with the max_sectors_kb parameter described in the “max_sectors_kb and Block Limits VPD” section when using virtio-scsi: QEMU intercepts the Block Limits VPD response from the real device that is addressed to the guest, changes the Maximum Transfer Length field, and then forwards it to the guest. Note that, in this case, this mechanism is bound to the support of the Block Limits VPD page by the SCSI device in the host (which brings us to the problem that we want to discuss).

The Block Limits problem with virtio-scsi

The Block Limits VPD page is optional: although many SCSI devices choose to support it, a device that does not implement it still complies with the SCSI specification.

This has a direct impact on a QEMU guest that uses SCSI pass-through with virtio-scsi. If the SCSI device does not support the page, there will be no Block Limits VPD message between the guest and the SCSI device. Without this message, there is no way to let the guest know about the max_sectors_kb setting of the host. This means that the guest will take a default value for the parameter, which can be incompatible with the host-side parameter, causing the guest device to malfunction.

Using the setup from the “virtio-scsi” section, we can see that there is no Block Limits support for the virtio-scsi device in the guest:

The output above shows that there is no Block Limits support, meaning that the guest configured max_sectors_kb with a default value, as given below:

In this case, it is a value greater than the one in the host side:

What happens is that the guest SCSI device sends requests bigger than the host can handle; that is, the device can't be used to read or write.

Performing a read test with dd:
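A read test of this kind could look like the following (the device path, block size, and count are illustrative):

    $ sudo dd if=/dev/sdb of=/dev/null bs=1M count=100 iflag=direct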

Performing a write test with dd:
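And a corresponding write test, which is destructive to the data on the disk, could look like:

    $ sudo dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct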

If the guest was able to boot up to the prompt, there is a way to work around this issue.

Workaround

To work around the SCSI sense error, set the max_sectors_kb parameter in the guest operating system to match the value that the device has on the host operating system. You can perform one of the following:

Run the echo command: set the value in the /sys/block/ directory. If the max_sectors_kb parameter in the host operating system is 256, set it to the same value in the guest operating system.

This process can be automated to persist across guest restarts. An alternative method is to add the echo command to the /etc/rc.local file in the guest operating system. In this example, this is done for a /dev/sda SCSI device in the guest operating system.
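As an illustration, using the /dev/sda device and the 256 value from this example, the line added to /etc/rc.local would be:

    echo 256 > /sys/block/sda/queue/max_sectors_kb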

You can also use udev rules to achieve the same result.
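A hypothetical udev rule for the same device and value might look like:

    ACTION=="add|change", KERNEL=="sda", ATTR{queue/max_sectors_kb}="256"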

Set the value in libvirt: you can set the value of the max_sectors_kb parameter directly in the libvirt XML file, forcing the whole SCSI bus to not surpass the value you want:
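Libvirt exposes this through the max_sectors attribute of the SCSI controller's driver element; a sketch (512 sectors of 512 bytes corresponds to the 256 KB limit of this example) would be:

    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver max_sectors='512'/>
    </controller>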

This impacts all the SCSI devices that use this controller. You can use this approach when you want to install a new operating system that uses a SCSI pass-through disk that is affected by this issue.

To echo the right value to the /sys/block/ directory during a guest install operation, you must access a system terminal during the installation process and change the value of the max_sectors_kb parameter before the installation starts to write to the disk, which is not always practical. Hence, setting the value in libvirt is convenient in that scenario. If the guest operating system is already installed, the approach described in the first method is less restrictive because it does not affect other devices.

Note that there will be times (for example, a guest installation that uses the virtio-scsi device when there is no way to set the parameter beforehand) when even these workarounds won't suffice. In this case, the user would need to either remove the virtio-scsi disks or use virtio-blk instead during the install process.

A solution using emulation

The workarounds for this virtio-scsi problem have the following drawbacks that can’t be easily ignored:

  • It can be troublesome if the guest has many pass-through devices that need this tuning.
  • If a change in max_sectors_kb is made on the host side, a manual change in the guests will also be required.
  • During an OS installation, it is difficult (and sometimes not possible) to go to a terminal and change the max_sectors_kb parameter prior to the installation.

A better way would be to fix this situation from inside QEMU. The author proposed a fix that relies on the already available emulation from virtio-blk and adjustments in the virtio-scsi back end, allowing the guest to query for the Block Limits page even when the SCSI hardware doesn't support it. We'll go through the concepts of the developed solution now.

Guest must always ask for the Block Limits VPD page

To fix the max_sectors_kb issue with virtio-scsi using the existing mechanism described in the “max_sectors_kb and Block Limits VPD” section, the guest must always query for the Block Limits page, regardless of whether the hardware supports it. There is no way to make the guest aware of the proper setting otherwise.

However, all SCSI messages are proxied to the real SCSI hardware, which isn't aware of what QEMU wants to accomplish. In a reply to an Inquiry message fetching the available pages, the real device will only advertise the Block Limits page if it supports it.

As seen in “SCSI Inquiry command” section, to query all available pages, an Inquiry message with the EVPD bit set is sent to the device. The format of the reply to this Inquiry request is shown in Figure 4.

Figure 4: Supported VPD format

This is a variable size message where byte 3 is the length of the supported page list, which starts at byte 4. Considering our last example:

Refer to the following hexadecimal format of the response:
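The original dump is not reproduced, but based on the description below, the first bytes of the response would look something like:

    00 00 00 03 00 80 83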

Byte 3 indicates that the length of the page list is 3 bytes. Bytes 4 through 6 contain the list, which is 00 (supported VPD pages), 80 (unit serial number), and 83 (device identification).

This response passes through QEMU untouched and the guest will never ask for the Block Limits page. But we want the guest to ask for the Block Limits page even if the hardware doesn’t support it.

But, because we know how the guest will interpret it, we are able to change the response before it is delivered, adding Block Limits to the page list. For each response to an Inquiry with EVPD set that QEMU receives, check if page b0h (Block Limits) is in the returned page list. If it is present, there is nothing to be done.* Otherwise, we'll add b0h at the end of the page list and increment the page length information (byte 3).

* In this case the problem doesn't occur: the hardware has Block Limits support and everything works as described in the “max_sectors_kb and Block Limits VPD” section.

Refer to the following C code snippet that represents the idea, considering that buf is a byte buffer with a SCSI message response:
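(The listing below is a reconstruction of the idea rather than the author's exact patch; buf_len, the size of the caller's buffer, is an added assumption.)

    #include <stdint.h>
    #include <stddef.h>

    #define VPD_BLOCK_LIMITS 0xb0

    /* buf holds the response to an Inquiry with EVPD=1 and PAGE_CODE=00h
     * (supported VPD pages): byte 3 is the page-list length and the list
     * itself starts at byte 4 (see Figure 4). */
    static void advertise_block_limits(uint8_t *buf, size_t buf_len)
    {
        uint8_t page_len = buf[3];

        /* Nothing to do if the device already advertises Block Limits. */
        for (size_t i = 0; i < page_len; i++) {
            if (buf[4 + i] == VPD_BLOCK_LIMITS) {
                return;
            }
        }

        /* Otherwise, append b0h to the page list and bump the length byte,
         * assuming the caller left room for one extra byte. */
        if (4 + (size_t)page_len < buf_len) {
            buf[4 + page_len] = VPD_BLOCK_LIMITS;
            buf[3] = page_len + 1;
        }
    }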

Figure 5: Adding Block Limits support to the Inquiry for supported VPD pages reply

With this change, the guest will be aware of Block Limits support, but the max_sectors_kb value is still wrong if the SCSI device does not support the page. The guest will send Block Limits requests to the device and will get an error. We can see this behavior by fetching the available VPD pages and trying to get the Block Limits information inside the guest.

This is expected and will be handled by emulating the Block Limits VPD response.

Emulate the Block Limits response if necessary

In the “virtio-blk” section, we saw that the virtio-blk back end emulates all the SCSI replies that are sent back to the guest kernel. We also verified that QEMU implements the Block Limits VPD page in this case. This means that we already have code inside QEMU that can be used to solve the problem in virtio-scsi. If the guest sends a Block Limits request and an error is returned from the hardware, we can deliver an emulated Block Limits reply from the virtio-blk code, which already takes the max_sectors_kb parameter into account, and the guest can properly set up the device.

Figure 6: Emulating Block Limits page if the hardware does not support it

Note that this will only happen if the guest knows about the Block Limits support, meaning that we’ll need to make sure that we advertise it all the time using the code we discussed earlier.

The author took a step further in the final version of the fix that was accepted in QEMU. Instead of checking every Inquiry EVPD message, QEMU fires an Inquiry supported VPD pages request to the device right after the virtual machine starts. If the SCSI device does not advertise Block Limits support, an internal scsi-block flag called needs_vpd_emulation is set. This flag is then checked every time an Inquiry reply or a SCSI error comes from the hardware to QEMU, to see if this is a case of either changing the Inquiry reply or emulating the Block Limits page. A QEMU guest may have several scsi-block devices at the same time, and this flag allows a single verification at machine start for each scsi-block device, instead of doing it for every Inquiry response or SCSI error. (Refer to the VPD Block Limits emulation implementation for more details.) Figure 7 illustrates all messages and events related to the fix that is available publicly in QEMU.

Figure 7: Design of the max_sectors_kb fix for virtio-scsi in QEMU

Conclusion

The max_sectors_kb issue found and fixed in QEMU is an example of how flexible and robust virtualization technologies must be to support a great array of hardware, aiming to provide the best service available to customers.

The idea of using the Maximum Transfer Length field to insert the max_sectors_kb value of the Linux host, allowing the guest to properly set up the SCSI device, is ingenious. But it couldn't fix all cases because it was reliant on hardware support for an optional VPD page, something that we can't take for granted.

The work reported in this article, covering this corner case, makes the QEMU virtio-scsi layer more robust and convenient for users of SCSI pass-through devices.

Before I start:
suggestions welcome. I expect my testing procedures to be flawed and I am looking for better benchmarks.
-----------------------------
The goal of this exercise was to learn to what extent the use of paravirtualized drivers had on I/O performance.
background:
Modern versions of Qemu/KVM are able to use the Virtio driver infrastructure. This provides a way for different forms of virtualization (although I think that KVM/Qemu is the only user) to do paravirtualized (PV) drivers. PV is a general technique where you are willing to modify the guest somewhat in order to gain better performance. In the case of PV drivers, they are able to provide optimized access to various VM features by loading VM-aware system drivers into the OS.
Virtio is able to provide optimized PCI busses, different memory management features (such as the ability to 'recover' memory from guests to the host), and other things like that. The most interesting to me is the PV drivers that provide optimized I/O access. The cost of having fully emulated hardware is especially noticeable when doing networking and accessing hard drives.
Now the impact on networking is just very obvious. Timing simple file transfers over ethernet easily shows the difference in performance between an emulated Intel gigabit ethernet card and the Virtio network driver.
However, block devices are not so obvious. So I wanted to do some simple benchmarks using bonnie++ to see how much difference there really is.
Most modern Linux versions should have the full suite of virtio drivers built in. They have been part of the standard Linux kernel tree for a while now. Windows has virtio-net for 2000+ and viostor for XP+ (for accessing a virtio drive), and they work surprisingly well, actually.
-----------------------------
Here is the setup:
Relevant Hardware:
Dell Optiplex 720
Intel Core 2 Duo E8500 @ 3.16GHz
Intel 82801JD/DO (ICH10 Family) SATA AHCI
Western Digital 500GB drive, model: WDC WD5000AAKS-00V1A0
Host Software Configuration:
Debian Unstable
2.6.32-trunk-amd64 kernel booted with 'mem=768M'*
Qemu-kvm debian package version: 0.11.1+dfsg-1
Virt-manager + Libvirt was used to create and manage the guest VM.
*that was meant to simulate a system where most of the RAM was allocated to other VMs, which would be typical.
Relevant Storage:
The 500GB drive is partitioned into a single partition (sda1). sda1 is then used as the sole PV in a VG using LVM2. From that, 4 LVs are created: three 5GB LVs and one 15GB LV.
The three 5GB LVs are then used as drives for the guest VM: an IDE drive, a SCSI drive, and a Virtio drive.
Then the 15GB LV is formatted as ext3 and mounted into the host directory tree. On that mount, 3 Qcow2 drive images (Qemu's native drive format) were created: one for IDE, another for SCSI, and the last one for a Virtio drive. Besides hosting the 3 qcow2 image drives, this was the volume where the 'Native' benchmark took place.
Guest Configuration:
KVM was used for the VM.
Standard configuration for 32bit Debian was used.
Guest OS was Debian Unstable with 2.6.32-trunk-686
Allocated memory was 512MB
Benchmark:
Very simple benchmark. Each of the 6 drives was mounted simultaneously and a script to run the benchmark was run. That ran overnight. Then the VM was shut down and the benchmark was run on the host system. Once that was done, the system was rebooted with the Virtio LV-backed drive configured with 'cache=none' and bonnie++ was run on that.
Bonnie ran 4 tests each time it was run on each of the volumes, and then I imported the results into OO.org Calc and averaged them. I used those results to generate the graphs below.
The bonnie++ command line was (basically):
bonnie++ -s 3500 -m test-name -x 4 -u user > bonnie.log
As you see I did not get fancy. I would like to find a better benchmark tool or configuration. Any suggestions?
-------------------------------
Results:
Sorry the colors don't match up for all the graphs. OO.org is irritating enough as it is, and this is meant to be fairly informal. All I wanted to do was provide the relevant information people can use to know what is going on, and I hope somebody finds it informative.
