Installing Gentoo inside Gentoo, Securing your KVM, and PCI Passthrough

Anyone who knows me knows that my daily driver is Gentoo Linux. The freedom afforded by baking your own kernel and compiling packages from source has become indispensable for me, especially as an electrical engineering student who needed to plug various microcontroller programmers, debuggers, and more dongles than I cared for into my computer.

Recently I reappropriated an old computer of mine and, just for fun, decided to investigate the hype around this “hypervisor” thing I’d heard about; I’d used virtual machines before, but never outside of kernel debugging. Rather than use a type-1 hypervisor (e.g. Hyper-V, VMware ESXi) I decided on a type-2, hosted hypervisor built around the Linux KVM module, chiefly because I wanted more control over my network interfaces and HDD than a bare-metal hypervisor affords.

The specs of the computer I used are below:

After planning my route of attack, here’s the hypervisor scheme I developed:

[Figure: QEMU hypervisor plan]

From the host (Gentoo) I would partition the remaining 480 GiB for the virtual machines. Because I wanted to take full advantage of the available RAM as well as pass the GPU through via PCI, I decided on a policy of “only one VM on at a time,” which let me sidestep many of the shared-resource problems plaguing typical VM setups (such as those acting as web servers) while maximizing RAM and CPU utilization.
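That policy is easy to enforce mechanically with a lock in the launch wrapper. A minimal sketch of the idea (the lock path is arbitrary and this is not my actual script):

#!/bin/bash
# Take an exclusive lock before starting QEMU; a second invocation bails out immediately
exec 9>/var/lock/one-vm-at-a-time.lock
if ! flock -n 9; then
    echo "another VM is already running" >&2
    exit 1
fi

# The lock is held for QEMU's lifetime and released automatically when it exits
qemu-system-x86_64 "$@"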

Once the host OS was installed, I rebuilt the kernel with virtualization support and made notes of which IOMMU groups to pass through. With basic passthrough you must pass through an entire IOMMU group at a time; there is no way to pass through only one device in a group (without a set of ACS override patches, which seem sketchy to get working at best).
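In broad strokes, the moving parts are: KVM and VFIO support in the kernel (CONFIG_KVM, CONFIG_KVM_INTEL, CONFIG_VFIO, CONFIG_VFIO_PCI), the IOMMU enabled on the kernel command line, and the passthrough devices claimed by vfio-pci before their normal drivers grab them. A rough sketch of what that typically looks like on an Intel board (the vendor:device IDs below are placeholders, not values to copy):

# Kernel command line: turn the IOMMU on and use passthrough mode
intel_iommu=on iommu=pt

# Have vfio-pci claim the passthrough devices at boot
# (10de:13c2 and 10de:0fbb are placeholder IDs for a GPU and its HDMI audio function)
echo "options vfio-pci ids=10de:13c2,10de:0fbb" > /etc/modprobe.d/vfio.conf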

Each VM gets its own slice of the host SSD. There are also two “common” drives (one for *nix, one for Windows) intended for storing games, documents, and downloads, thereby increasing disk utilization by decreasing data redundancy. These disks are encrypted using LUKS, unlocked at startup using my PARANOiA cryptosystem, and mounted as volume groups, then passed through as “raw disks” to the VM; even though the VM thinks it is writing to a raw disk, it is really writing into this encrypted volume group.
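As a rough illustration of the host-side plumbing (the device path and names below are placeholders, and my actual unlock step goes through PARANOiA rather than a manual passphrase):

# Unlock the encrypted partition (device path and mapper name are placeholders)
cryptsetup open /dev/sdb1 vms

# Activate the volume group living inside the encrypted container
vgchange -ay vg_vms

# Logical volumes under /dev/mapper/ are then handed to QEMU as "raw" virtio disks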

VirtIO Everything

There are a few options for providing disk and network access to VMs, but the fastest and most effective way (by far) is to expose them as VirtIO devices. This means that, instead of emulating a “physical disk” or “physical ethernet port”, the VM communicates with the host through a paravirtualized driver, which is much faster than going through that extra layer of emulation.

Essentially, use VirtIO for everything unless you have a good reason not to. You’ll notice a significant speed boost, as things will be writing at just below bare-metal speed (e.g. my SSD feels just as fast in a VM as on the host with the VirtIO drivers enabled). For disks, the QEMU line will look something like:

-drive file=/dev/mapper/gentoo,format=raw,if=virtio,aio=native,cache.direct=on

… if you’re using a scheme similar to mine. Notice that the drive is actually a device-mapper node created by LUKS; the volume groups are encrypted on the host before they are handed to the VM as disks. If you’re looking to use VirtIO for your network devices, the lines look like:

-device virtio-net-pci,netdev=n0
-netdev tap,ifname=tap0,script=no,downscript=no,id=n0

… for a TAP device. Change the -netdev line if you’re not using TAP (of course).
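Since the -netdev line above uses script=no, the tap0 interface has to exist on the host before QEMU starts. A rough sketch of creating it and attaching it to a bridge (“youruser” and br0 are placeholders for whatever your host networking actually looks like):

# Create a TAP device owned by the user that runs QEMU
ip tuntap add dev tap0 mode tap user youruser

# Attach it to a host bridge and bring it up (br0 is assumed to already exist)
ip link set tap0 master br0
ip link set tap0 up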

Remember that you will need VirtIO drivers installed in the guest in order to communicate with these devices. Windows in particular needs some extra effort to get VirtIO working with disks, so make sure to download the virtio-win drivers from Fedora.
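One way to get those drivers into a Windows guest is to attach the downloaded ISO as a CD-ROM alongside the install media, with something like (the path is a placeholder for wherever you saved it):

-drive file=/path/to/virtio-win.iso,media=cdrom,format=raw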

IOMMU / PCI Passthrough

PCI devices on your motherboard are enumerated into various IOMMU groups; each device, bus, and card belongs to an IOMMU group that you cannot change, because the grouping follows from the board’s PCIe topology. Passing my GPU and its companions through to the VM looks like this in the QEMU invocation:

-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1
-device piix4-ide,bus=pcie.0
-device vfio-pci,host=01:00.0,x-vga=on,bus=root.1,addr=00.0,multifunction=on
-device vfio-pci,host=01:00.1,bus=pcie.0
-device vfio-pci,host=00:1b.0,bus=pcie.0

The above lines pass through all three devices in IOMMU group 1 on my motherboard. You can list your IOMMU groups with the following:

# Print the IOMMU group for every PCI device on the system
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}
    n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"
done
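Before launching, it’s also worth double-checking that the passthrough devices are actually bound to vfio-pci rather than their usual host drivers, e.g. for the GPU at 01:00.0 above:

# "Kernel driver in use" should read vfio-pci for every device you intend to pass through
lspci -nnk -s 01:00.0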

One of the cool things you can do with GPU passthrough is to drive your KVM host’s display from the onboard video chip’s VGA output into a monitor, then run DVI / HDMI cables from your GPU into the same monitor to provide video to your VM. If everything goes well, you should be able to switch input modes on your monitor to flip between the VM and your host; I’ve rigged my Gentoo KVM to kill the monitor signal once I boot into the VM so I don’t need to toggle inputs manually.
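One way to do the “kill the monitor signal” part, assuming the host is running X, is plain DPMS (this is a sketch of the idea, not necessarily how my script does it):

# Blank the host's output so the monitor falls over to the input fed by the passed-through GPU
DISPLAY=:0 xset dpms force off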

QEMU GUI

QEMU doesn’t have a great GUI, so I whipped one together using dialog and Bash:

[Screenshot: the dialog-based QEMU launcher]

Not really that pretty but it gets the job done.
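For the curious, a stripped-down sketch of what a dialog-driven launcher like this looks like (the menu entries and echoed commands are placeholders, not my actual script):

#!/bin/bash
# Minimal dialog-based VM picker; real entries would wrap full qemu-system-x86_64 invocations
choice=$(dialog --stdout --menu "Select a VM to boot" 12 40 4 \
    1 "Gentoo" \
    2 "Windows") || exit 0

case "$choice" in
    1) echo "would launch the Gentoo VM here" ;;
    2) echo "would launch the Windows VM here" ;;
esac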

Current Issues

Sometimes my keyboard gets “stuck” in the VM and does not pass back to the host when I shut down the VM. This is a minor inconvenience that happens only rarely and can be remedied by unplugging the keyboard momentarily, but I’d like to avoid it entirely and am looking for a solution (this likely involves talking nicely with QEMU and consulting dmesg).

Otherwise it’s a very comfy and snappy set-up.