Home

Projects I love: XEPA

Intro

XEPA is an idea: use a Linux live environment with Kernel-based Virtual Machine (KVM) + PCI passthrough as a sort of exoskeleton around a BMC-less bare metal chassis to provide an install environment, where an "install environment" basically means installing from an ISO via Keyboard+Video+Mouse.

XEPA is a combination of small magic tricks put together in order to pull off a big magic trick. While each of the small magic tricks is special in its own way, the gratification of being able to install any OS to a cloud-hosted bare metal server half a world away is very neat.

Credit up top: XEPA is the work of Enkel Prifti, and all authorship and credit goes to him. I just think it's so cool I wanna talk about it too.

Problem Scenario Setup

If the reader has ever managed bare metal hardware, this workflow should be familiar: installing an OS to a server by logging into the chassis's (e.g. Dell R6515) Baseboard Management Controller (BMC) (e.g. iDRAC), using the BMC to "mount" and boot into an ISO through a virtual CD-ROM, and using a browser- or Java-based virtual Keyboard+Video+Mouse to navigate through the installer.

At the end of some finagling and clicking, the box reboots and gets configured to be alive on the network, at which point the primary management function moves to something like RDP, SSH, or automation.

If this sounds familiar to you, you've probably done it a bunch. This is such a common workflow that it is often “assumed” to be the workflow for installing to Bare Metal.

Problem Scenario: Equinix Metal

Equinix Metal is a platform that provides single-tenant bare metal servers in a cloudy, elastic, on-demand way. Think of a Droplet or an EC2 instance, except that instead of being delivered some kind of virtual compute host, you get a real server chassis; that is Equinix Metal.

One of the things that naturally attracts people to Metal is that at the end of the day, it's just servers: the same kinds of servers we've been building things with for decades, they just happen to be delivered in a magically cloudy way.

If you have a reason why your workload is better aligned with real hardware compute than hyperscale virtual compute, Metal can be a "best of many worlds" solution for lots of painful challenges.

Where the problem first starts is that many people who want just servers also want to install their own OS. If you care enough to want real hardware, there is a good chance you care about your OS for that hardware.

While Equinix Metal provides real hardware as cloudy instances, Metal removes customer access to the chassis BMC. Put another way, customers do not have network or local access to the BMC of a provisioned Metal instance. This is for a variety of reasons, most obviously security.

What this means is that when customers go to install their OS, they very often expect to be able to install it via BMC+Keyboard+Video+Mouse, only to find that path is unavailable.

How then, is a customer supposed to install their own OS to a Metal instance?

The answer already exists: iPXE or "Custom iPXE"

iPXE is a neat little toolchain for booting and lifecycling servers over a network. It's a simple project with a healthy ecosystem, and it's broadly adopted across the "people who boot computers" space: for example, it's featured heavily in OpenStack and OpenShift, and you can even burn ROMs of it to USB drives or NICs themselves.

Most prominently, Equinix Metal uses iPXE as part of its "Custom iPXE" feature, which allows customers to control and boot into an iPXE context, which in turn lets them manage and own their own OS installation over the network.
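As a sketch of what that looks like, a minimal custom iPXE script might be something like the following; the URLs and filenames here are placeholders, not anything specific to Equinix Metal:

```
#!ipxe
# Bring up networking on the first NIC via DHCP
dhcp

# Fetch a kernel and initrd over HTTP (placeholder URLs), then boot them
kernel http://boot.example.com/vmlinuz
initrd http://boot.example.com/initramfs
boot
```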

So, problem solved, right? Customers want to be able to install their own OS to Metal, and "Custom iPXE" provides for that, right?

Unfortunately not always.

Problems with Custom iPXE

Two key things are missing from the Equinix Metal iPXE context that are often needed by customer operators:

  1. "Keyboard + Video + Mouse" I/O

While Equinix Metal has the incredibly cool Serial Over SSH (SOS) feature, the way it works is by taking the server's BMC "virtual serial console" output and making it available via a nifty SSH endpoint.

Because it is based on a BMC-emulated virtual serial console, the OS context inside the server must be configured to send and receive I/O over the second "local serial port".

For appliance operating systems, or operating systems with graphical installers or interfaces, this can be a deal breaker, as lots of installers (and even some OSes) have terrible or non-existent primitives for sending user I/O to serial.
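For Linux systems that can be pointed at serial, the configuration is usually a kernel command-line and bootloader change. A hedged sketch of a GRUB fragment, assuming the SOS-visible port is the second serial port (ttyS1) and a common baud rate:

```
# /etc/default/grub (illustrative; exact port and speed may differ)
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS1,115200n8"
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --unit=1 --speed=115200"
```

The pain described above is exactly that many graphical installers have no equivalent of this knob at all.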

  2. Virtual CD-ROM

One of the niftier functions of a modern BMC is the ability to "mount" an ISO and expose it as a virtual CD-ROM to the chassis itself. The problem is that this feature is so nifty and its usage so common that lots of ISO-packaged installers still expect a CD to be "present" to the chassis as a fully emulated CD-ROM.

One of the truly annoying behaviors of many lazy installers is to "self mount" a CD-ROM from inside the installer's LiveOS context. Put another way, after the server loads the CD-ROM and boots into the installer environment on it, the installer environment will often loopback mount itself inside the LiveOS environment. When that happens, the installer will go looking for its CD-ROM, and if it doesn't find one, bomb out.
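The pattern looks roughly like this sketch; the device paths and messages are illustrative, not from any particular installer:

```shell
#!/bin/sh
# Sketch of the "self mount" pattern: the installer's LiveOS probes for a
# CD-ROM block device and refuses to continue without one.

# Probe the given device paths; a real installer would pass something
# like /dev/sr0 /dev/cdrom and then loopback mount whatever it finds.
find_install_media() {
    for dev in "$@"; do
        if [ -b "$dev" ]; then   # is it a block device?
            echo "$dev"
            return 0
        fi
    done
    return 1                     # no CD-ROM device present at all
}

if media=$(find_install_media /dev/sr0 /dev/cdrom); then
    echo "would mount $media and continue installing"
else
    echo "no CD-ROM present; a lazy installer bombs out here"
fi
```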

In an iPXE context, that CD-ROM doesn't exist. In a regular desktop computer, the CD-ROM drive would be physically present with the disc spinning inside it. In a regular server / iDRAC environment, the BMC's virtual CD-ROM presents the ISO as if it were the spinning disc inside the drive.

In an iPXE context, there is nothing available to present a disc inside a CD-ROM drive. iPXE reboots completely trash the context; there is no way to preserve state or emulation between iPXE contexts.


So even though iPXE can "boot" an ISO installer, the installer itself may not be able to mount itself inside its environment the way it wants, as if it were a CD-ROM, and we also likely won't have the Keyboard / Video / Mouse I/O required to navigate many installers.

So what's there to do?

XEPA to the rescue

XEPA takes advantage of an underappreciated Equinix Metal feature called Rescue Mode, which reboots a Metal instance into an Alpine LiveOS environment (fun fact: it does this via more clever internal use of iPXE).

The Rescue Mode environment comes with things like networking and SSH-keys, and because it's a LiveOS, it can be a useful swiss-army knife kind of tool for working with Metal instances.

What XEPA does inside that live environment is:

  1. Fetch the installer ISO into the LiveOS.
  2. Start a KVM virtual machine, attaching the ISO as an emulated CD-ROM.
  3. Pass the server's physical NIC through to the VM via PCI passthrough, and hand the VM the server's real physical disk.
  4. Expose the VM's Keyboard + Video + Mouse via VNC.

This way, when the VM is started, the installer sees a "real" CD-ROM with the ISO loaded as the "spinning disk", the installer sees the real physical hardware (NIC), the VM provides Keyboard + Video + Mouse (accessed via VNC), and the installer writes to the real physical disk of the server.
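To make that concrete, here is a hedged sketch of the kind of QEMU invocation the LiveOS might run; every path, PCI address, and size below is a placeholder, not XEPA's actual command:

```shell
#!/bin/sh
# Sketch: a KVM guest that boots the ISO as a CD-ROM, gets the physical
# NIC via VFIO PCI passthrough, writes to the real disk, and serves
# Keyboard+Video+Mouse over VNC. All values are illustrative.

ISO=/tmp/installer.iso       # ISO fetched into the LiveOS
DISK=/dev/sda                # the server's real boot disk
NIC_PCI=0000:01:00.0         # PCI address of the physical NIC

QEMU_CMD="qemu-system-x86_64 \
 -enable-kvm -m 4096 -smp 4 \
 -cdrom $ISO \
 -drive file=$DISK,format=raw,if=virtio \
 -device vfio-pci,host=$NIC_PCI \
 -vnc :0"

echo "$QEMU_CMD"
```

In practice you would also need to unbind the NIC from its host driver and bind it to vfio-pci first, and then connect a VNC client (often over an SSH tunnel to the rescue environment) to drive the installer.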

The installer will functionally believe it is seeing and interacting with the hardware of the Bare Metal server, despite the fact that it is really running on a VM on a Hypervisor hosted by that Bare Metal server.

Pretty cool right?

The installer does its thing, writes its data to the real disk, and then exits, reboots, or powers off the VM. The VM can then be hard powered off, and the physical server rebooted.

When the server reboots, it will look at its local disk, find the Operating System that was installed via the previous self-hosted VM, boot into that context and keep right on trucking.

Ok but what does this really do?

It provides a path to customers being able to install just about whatever OS they want to the disk of a bare metal server without any privileged access to the rest of the platform or the BMC.

Super cool.