Revisiting Xen’s x86 Emulation: Xen XSA 123

In my last blog post, I gave an overview of recent vulnerabilities discovered in the x86 emulation layer of Xen. While both of the discussed vulnerabilities only allow for guest privilege escalation, the complexity of the involved code suggested that even more interesting bugs could be found. So I spent some time searching for memory corruption issues and discovered a very interesting bug that resulted in XSA 123. This post gives an overview of the root cause of the bug and a short description of the exploitation challenges. A follow-up post will describe possible exploitation strategies in more detail.

A core piece of Xen’s emulation code is the operand structure shown below:

/* Type, address-of, and value of an instruction's operand. */
struct operand {
    enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
    unsigned int bytes;
    /* Up to 128-byte operand value, addressable as ulong or uint32_t[]. */
    union {
        unsigned long val;
        uint32_t bigval[4];
    };
    /* Up to 128-byte operand value, addressable as ulong or uint32_t[]. */
    union {
        unsigned long orig_val;
        uint32_t orig_bigval[4];
    };
    union {
        /* OP_REG: Pointer to register field. */
        unsigned long *reg;
        /* OP_MEM: Segment and offset. */
        struct {
            enum x86_segment seg;
            unsigned long    off;
        } mem;
    };
};

The two interesting operand types are register and memory operands. For register operands, the type field equals OP_REG and reg points to a field in the cpu_user_regs structure, which is the host-based representation of all registers of a guest system. For memory operands, type equals OP_MEM and the mem structure is filled with an x86 segment as well as the memory offset. Of course, this offset is only valid in the context of the guest system, so reads and writes cannot operate directly on this address. When a normal instruction is emulated, the original value of an operand is stored in the val field (orig_val is only used in some special cases). All calculations involving the operand are performed with this value, and if the operand is a destination, the end result is written back to the cpu_user_regs field or the guest memory space.

Because the reg and mem fields are stored in a union, special care has to be taken that the type of the operand is handled correctly. Most often this is done by simply checking the type before performing the relevant operation, as can be seen below:

if ( dst.type == OP_REG )
    dst.val = *dst.reg;
else if ( (rc = read_ulong(dst.mem.seg, dst.mem.off,
                           &dst.val, 2, ctxt, ops)) )
    goto done;

In other cases, the check is performed implicitly, either through opcode-specific operand conversions or by checking the value of the ModRM byte directly:

            fail_if(modrm >= 0xc0);
            ea.bytes = 4;
            src = ea;
            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
                                 src.bytes, ctxt)) != 0 )
                goto done;
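
The modrm >= 0xc0 test works because the top two bits of the ModRM byte form the mod field, and mod == 3 selects a register operand instead of a memory operand. A minimal sketch of that decoding follows; the helper names are hypothetical and not part of Xen:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not Xen code): split a ModRM byte into its fields. */
uint8_t modrm_mod(uint8_t modrm) { return modrm >> 6; }        /* bits 7..6 */
uint8_t modrm_reg(uint8_t modrm) { return (modrm >> 3) & 7; }  /* bits 5..3 */
uint8_t modrm_rm(uint8_t modrm)  { return modrm & 7; }         /* bits 2..0 */

/* mod == 3 is equivalent to modrm >= 0xc0: the r/m operand is a register. */
bool modrm_is_register(uint8_t modrm)
{
    return modrm_mod(modrm) == 3;
}

/* Usage: 0xd8 is binary 11 011 000, so mod = 3, reg = 3, rm = 0. */
void modrm_demo(void)
{
    assert(modrm_is_register(0xd8));
    assert(modrm_reg(0xd8) == 3);
    assert(modrm_rm(0xd8) == 0);
    assert(!modrm_is_register(0x18)); /* mod = 0: memory operand */
}
```

A handler that calls fail_if(modrm >= 0xc0) therefore rejects every register encoding before it touches the mem member of the union.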

This seemed to be quite error prone, so I walked through all accesses to one of the union fields and tried to verify the type invariant. While most occurrences are correct, the handler code for segment overrides does not perform the needed check:

    if ( override_seg != -1 )
        ea.mem.seg = override_seg;

ea is an instance of the operand structure discussed above, created by parsing the ModRM byte of the instruction. override_seg is the value of a segment override, which specifies the segment to use for memory accesses. A segment override is set using special instruction prefixes (0x2e, 0x3e, 0x26, 0x64, 0x65, 0x36), and the Xen code responsible for handling these prefixes is:

    /* Prefix bytes. */
    for ( ; ; )
    {
        switch ( b = insn_fetch_type(uint8_t) )
        {
        ..
        case 0x2e: /* CS override */
            override_seg = x86_seg_cs;
            break;
        case 0x3e: /* DS override */
            override_seg = x86_seg_ds;
            break;
        case 0x26: /* ES override */
            override_seg = x86_seg_es;
            break;
        case 0x64: /* FS override */
            override_seg = x86_seg_fs;
            break;
        case 0x65: /* GS override */
            override_seg = x86_seg_gs;
            break;
        case 0x36: /* SS override */
            override_seg = x86_seg_ss;
            break;
        ....
        default:
            goto done_prefixes;
        }

So setting override_seg is as easy as adding an override prefix to an arbitrary instruction. Because no check is performed to ensure that ea actually corresponds to a memory access and not to a CPU register, we can corrupt parts of the ea.reg pointer. The bug can be easily triggered by forcing the emulation of an instruction like “2e 01 d8 0a” (cs: add eax, ebx), using the techniques discussed in my last post.
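
One way to restore the type invariant is to apply the override only when the operand really is a memory operand. The following is a minimal sketch on a simplified copy of the structure. The names demo_operand, demo_segment and apply_override are hypothetical, the enum order is chosen for illustration only, and the code is not presented as the actual Xen fix:

```c
#include <assert.h>

/* Simplified stand-ins for Xen's types; names and enum order are illustrative. */
enum demo_segment { SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };
enum demo_type { DEMO_REG, DEMO_MEM };

struct demo_operand {
    enum demo_type type;
    union {
        unsigned long *reg;            /* valid only when type == DEMO_REG */
        struct {
            enum demo_segment seg;
            unsigned long off;
        } mem;                         /* valid only when type == DEMO_MEM */
    };
};

/* override_seg == -1 means that no segment override prefix was seen. */
void apply_override(struct demo_operand *ea, int override_seg)
{
    /* The type check keeps the write away from the aliased reg pointer. */
    if (override_seg != -1 && ea->type == DEMO_MEM)
        ea->mem.seg = (enum demo_segment)override_seg;
}

void apply_override_demo(void)
{
    unsigned long eax = 0;
    struct demo_operand r = { .type = DEMO_REG, .reg = &eax };
    struct demo_operand m = { .type = DEMO_MEM, .mem = { SEG_DS, 0x1000 } };

    apply_override(&r, SEG_CS);
    assert(r.reg == &eax);             /* register pointer left untouched */

    apply_override(&m, SEG_CS);
    assert(m.mem.seg == SEG_CS && m.mem.off == 0x1000);
}
```

With such a guard, a prefix on a register-only instruction is simply ignored, which matches how the hardware treats it.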

While this unchecked write to the ea.mem.seg field is definitely a bug, assessing the security impact is more complex.

There are two main questions we have to answer:

What is the corrupted value of ea.reg and can we control it?
Is ea.reg used for any controllable memory reads or writes?

The second question is easy: because ea.reg normally points to the memory holding the virtual register value, we can read values by emulating an instruction that uses the operand as a source, and we can write values by emulating an instruction that uses it as a destination. This means that we can read and write arbitrary values at the memory address the corrupted ea.reg field points to.

Regarding the first question about the ea.reg value: due to the layout of C unions in memory, ea.mem.seg shares its address with the lowest 32 bits of ea.reg. Because we can only assign enum values between 0 and 5 to override_seg, the range of accessible addresses is quite limited: the upper 32 bits of the pointer are preserved, while the lower 32 bits are replaced by the segment value. For example, if ea.reg originally points to an address like 0xffff84YYXXXXXXXX, the corrupted pointer lies in the range 0xffff84YY00000000-0xffff84YY00000005. This means that the impact of this bug strongly depends on the Xen memory layout, as well as on the original value of ea.reg, which in turn depends on the location of the used cpu_user_regs structure in memory.
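
The effect on the pointer can be modeled with plain integer arithmetic. This is a sketch that assumes a 64-bit little-endian target, where a 32-bit store at the start of the union replaces the low half of the pointer; the function name and the sample pointer value are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit store of `seg` over the low half of a 64-bit
 * little-endian pointer value; the upper 32 bits are preserved. */
uint64_t corrupt_low32(uint64_t reg, uint32_t seg)
{
    return (reg & 0xffffffff00000000ULL) | seg;
}

void corrupt_low32_demo(void)
{
    /* Hypothetical original pointer value, chosen only for illustration. */
    uint64_t reg = 0xffff8412deadbee8ULL;

    assert(corrupt_low32(reg, 0) == 0xffff841200000000ULL);
    assert(corrupt_low32(reg, 5) == 0xffff841200000005ULL);
}
```

Whatever the original low 32 bits were, only six distinct pointer values can result for a given upper half.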

In practice, the cpu_user_regs structure used in the x86 emulator is stored at the top of the current CPU stack. The exact address depends on the number of cores, the amount of physical memory in the system and which core actually executes the emulation code. The most interesting case involves multi-core systems with at least 4GB of physical memory:

cpu_user_regs for CPU cores other than CPU0 will be stored somewhere in the address range 0xffff830YXXXXXXXX, which means we are able to corrupt a few bytes starting at 0xffff830Y00000000. This address range is part of the 1:1 mapping of all physical memory kept by the Xen hypervisor, so we can manipulate the content of certain physical memory addresses. Of course, these addresses can contain data of all running VMs as well as of the hypervisor itself, allowing for a potential guest breakout.
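
To make the mapping concrete, the sketch below translates a corrupted pointer into a physical address. It assumes that the 1:1 mapping starts at 0xffff830000000000; that base address is an assumption that should be checked against the memory layout of the Xen version in question, and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the 1:1 direct map; verify against your Xen version. */
#define DEMO_DIRECTMAP_START 0xffff830000000000ULL

/* Translate a direct-map virtual address to the physical address it maps. */
uint64_t directmap_to_phys(uint64_t va)
{
    return va - DEMO_DIRECTMAP_START;
}

void directmap_demo(void)
{
    /* Y = 1 with segment values 0 and 5: bytes right above the 4GB mark. */
    assert(directmap_to_phys(0xffff830100000000ULL) == 0x100000000ULL);
    assert(directmap_to_phys(0xffff830100000005ULL) == 0x100000005ULL);
}
```

Under this assumption the reachable physical addresses sit directly above a multiple of 4GB, which fits the observation that systems with at least 4GB of memory are the interesting case.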

While this behavior is hard to exploit in general, successful exploitation, including the compromise of another VM, is possible under certain circumstances. I will discuss possible exploitation strategies at our “Exploiting Hypervisor” workshop at Troopers and Syscan and plan to release a detailed blog post about this in the following weeks.

See you in Heidelberg or Singapore 🙂

– Felix
