TAG | Virtualization
In the course of a current virtualization research project, I was reviewing a lot of documentation on hypervisor security. While “hypervisor security” is a very wide field, hypervisor breakouts are usually one of the most (intensely) discussed topics. I don’t want to go down the road of rating the risk of hypervisor breakouts and giving appropriate recommendations (even though we do this on a regular base which, surprisingly often, leads to almost religious debates. I know I say this way too often:I’ll cover this topic in a future post ), but share a few observations of analyzing well-known examples of vulnerabilities that led to guest-to-host-escape scenarios. The following table provides an overview of the vulnerabilities in question:
Just a quick update here: Ivan (who gave the magnificent Virtual Firewalls talk at Troopers recently) blogged about this and some guy added some feedback from an environment with Cisco FEX and “one of the server guys start[ing] a Citrix Netscaler” . See the second comment to his post.
This shows, once more, that the dependencies of various technologies (and what they are used for) must be well understood in cloud/virtualized environments. Complexity … but who do we tell. Y’ all know that, right?
In our last series of posts regarding the VMDK file inclusion attack, we focused on read access and prerequisites for the attack, but avoided stating too much about potential write access. But as we promised to cover write access in the course of our future research, the following post will describe our latest research results.
First of all, the same prerequisites (which will be refined a little bit more later on) as for read access must be fulfilled and the same steps have to be performed in order to carry out the attack successfully. If that is the case, there are several POIs (Partitions Of Interest) on a ESXi hypervisor that are interesting to include:
- /bootbank → contains several archives which build the hypervisors filesystem once they are unpacked
- /altbootbank → backup of the prior version of bootbank, e.g. copied before a firmware update
- /scratch → mainly log files and core dumps stored here
- /vmfs/volumes/datastoreX → storage for virtual machine files
The root filesystem is stored on a ramdisk which is populated at boot time using the archives stored in the bootbank partition. As this means that there is no actual root partition (since it is generated dynamically at boot time and there is no such thing like a device descriptor for the ramdisk), this excludes the root file system from our attack tree, at least at first sight.
While trying to write to different partitions, we noticed that the writing sometimes fails. Evaluating the reason for the failure, we also noticed that this is only the case for certain partitions, such as /scratch. After monitoring the specific write process, we noticed the following errors:
attx kernel: [93.238762] sd 0:0:0:0: [sda] Unhandled sense code attx kernel: [93.238767] sd 0:0:0:0: [sda] Result: hostbyte=invalid driverbyte=DRIVER_SENSE attx kernel: [93.238771] sd 0:0:0:0: [sda] Sense Key : Data Protect [current] attx kernel: [93.238776] sd 0:0:0:0: [sda] Add. Sense: No additional sense information attx kernel: [93.238780] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 02 04 bd 20 00 00 08 00 attx kernel: [93.238790] end_request: critical target error, dev sda, sector 33864992 attx kernel: [93.239029] Buffer I/O error on device sda, logical block 4233124 attx kernel: [93.239181] lost page write due to I/O error on sda
cpu0:2157)WARNING: NMP: nmpDeviceTaskMgmt:2210:Attempt to issue lun reset on device naa.600508b1001ca97740cc02561658c136. This will clear any SCSI-2 reservations on the device. cpu0:2157)<4>hpsa 0000:05:00.0: resetting device 6:0:0:1 cpu0:2157)<4>hpsa 0000:05:00.0: device is ready
Assuming that the hypervisor hard drive is not broken for exactly the cases we try to write from within a guest machine, we performed further tests (using various bash scripts and endless writing loops) and found out that this error occurs when the hypervisor and a guest machine are trying to write at the same time to the same device. Due to the fact that the hypervisor continuously writes log files to the /scratch file system and generates all kind of I/O to the /vmfs data stores (due to the running virtual machines stored on that partition), it was not possible to write to those devices in a reliable way.
Fortunately (at least from an attacker’s point of view ) the remaining partitions, /bootbank and /altbootbank, are only accessed at boot time and it is hence possible to write to those partitions in a reliable way. At that point, the initially mentioned fact about the lacking root partition gets important again: As the root partition would be the first and most promising target of write access, it would most probably also be locked by the hypervisor as there might also be different files that would be written on periodically. So when we were searching for a way to write to the dynamically generated root partition, we came up with the following steps:
- Include the device holding the /bootbank partition.
- Write to the /bootbank partition.
- Wait for the hypervisor to reboot (or perform the potential DoS attack we identified, which will be described in a future post).
- Enjoy the files from the /bootbank partition being populated to the dynamically created root partition.
The last step is of particular relevance. /bootbank holds several files that contain archives of system-critical files and directories of the ESXi hypervisor. For example the /bootbank/s.v00 contains an archive of the directories and files, including parts of /etc. The hypervisor restores the particular directories (such as /etc) at startup from the files stored in /bootbank. As we are able to write files to /bootbank, it is possible to replace contents of /bootbank/s.v00 and thus contents of /etc of the hypervisor ramdisk. In order to make sure that certain files in /etc are replaced, we can access the file /bootbank/boot.cfg which holds a list of archives which get extracted at boot time. As we have all necessary information to write to the root partition of the hypervisor, these are the steps to be performed:
- Obtain /bootbank archive, in this example /bootbank/s.v00, using the well-known attack vector.
- Convert/extract archive: The archives in /bootbank are packed with a special version of tar which is incompatible with the GNU tar. However this vmtar version can be ported to a GNU/Linux by copying the vmtar binary and libvmlibs.so from any ESXi installation.
- Modify or add files.
- Repack the archive.
- Deploy the modified archive to /bootbank using the write access.
Following this generic process, we were able to install a backdoor on our ESXi5 hypervisor. In a first step, we opened a port in the ESXi firewall (which has a drop-all policy) as we wanted to deploy a bind shell (even though we could have used a connect-back shell instead, but we also want to demonstrate that is possible to modify system-critical settings). The firewall is configured by xml files stored in /etc/vmware/firewall. These xml files are built as follows:
<ConfigRoot> <service id='0000'> <id>sshServer</id> <rule id='0000'> <direction>inbound</direction> <protocol>tcp</protocol> <porttype>dst</porttype> <port>22</port> </rule> <enabled>false</enabled> <required>false</required> </service> [...] </ConfigRoot>
The xml format is kind of self-explanatory. Every service has a unique identifier id, and can have inbound and outbound rules. To enable the rule on system boot the enabled field has to be set to true.
Based on this schema it is easy to deploy a new firewall rule. Simply place a new xml file to /etc/vmware/firewall in the archive which will be written to the bootbank later on.
For example to open port 42000:
<ConfigRoot> <service id="0000"> <id>remote Bind Shell</id> <rule id="0000"> <direction>inbound</direction> <protocol>tcp</protocol> <porttype>dst</porttype> <port>4444</port> </rule> <enabled>true</enabled> <required>false</required> </service> </ConfigRoot>
This ruleset will be applied the next time the hypervisor reboots, after overwriting one of the archives in bootbank with our altered one.
The next step is to bind a shell to the opened port. Unfortunately the netcat installed per default on the hypervisor is not capable of the “-e” option, which executes a command after an established connection. The most basic netcat bind shell just listens on a port and forwards all input to the binary specified by the -e switch:
netcat -l -p 4444 -e /bin/sh.
Luckily the netcat version of 32bit BackTrack distribution is compatible with the ESXi platform and supports the -e switch. After copying this binary to the hypervisor, we just need to make sure that the backdoor is started during the boot process by adding the following line to /etc/rc.local:
/etc/netcat –l –p 4444 –e /bin/sh &
Next time the hypervisor starts, a remote shell will listen on port 4444 with root privileges. The following steps summarize the process of fully compromising the ESXi hypervisor:
- Include the /bootbank partition using our well-known attack path
- Unpack /bootbank/s.v00
- Add our bind shell port to the firewall by adding the described file to /path/to/extracted/etc/vmware/firewall
- Add the netcat binary to /path/to/extracted/etc/nc
- Add the above line to /path/to/extracted/etc/rc.local
- Wait for the next reboot of the hypervisor (or our post on the potential DoS )
At the end of the day, this means that once attacker is able to upload VMDK files to an ESXi environment (in a way that fulfills the stated requirements), it is possible to alter the configuration of the underlying hypervisor and even to install a backdoor which grants command line access.
Pascal & Matthias
As we are receiving a lot of questions about our VMDK has left the building post, we’re compiling this FAQ post — which will be updated as our research goes on.
How does the attack essentially work?
By bringing a specially crafted VMDK file into a VMware ESXi based virtualization environment. The specific attack path is described here.
What is a VMDK file?
A combination of two different types of VMDK files, the plain-text descriptor file containing meta data and the actual binary disk file, describes a VMware virtual hard disk. A detailed description can be found here.
Are the other similar file formats used in virtualization environments?
Yes, for example the following ones:
- VDI (used by e.g. Xen, VirtualBox)
- VHD (used by e.g. HyperV, VirtualBox)
- QCOW (used by e.g. KVM)
Are those vulnerable too?
We don’t know yet and are working on it.
Which part of VMDKs files is responsible for the attack/exposure?
The so-called descriptor file, describing attributes and structure of the virtual disk (See here for a detailed description).
How is this to be modified for a successful attack?
The descriptor file contains paths to filenames which, combined, resemble the actual disk. This path must be modified so that a file on the hypervisor is included (See here for a detailed description).
How would you call this type of attack?
In reference to web hacking vulnerabilities, we would call it a local file inclusion attack.
What is, in your opinion, the root cause for this vulnerability?
Insufficient input validation at both cloud providers and the ESXi hypervisor, and a, from our point of view, misunderstanding of trust boundaries, such as that one should “not import virtual machines from untrusted sources”.
Does this type of attack work in all VMware ESX/vSphere environments?
Basically, the ESXi5 and ESXi4 hypervisor are vulnerable to the described attack as of June 2012. Still, the actual exploitability depends on several additional factors described here.
Can this type of attack be performed if there’s no VMDK upload capability?
Which are typical methods of uploading VMDK files in (public) cloud environments?
E.g. Web-Interface, FTP, API, …
Which are typical methods of uploading VMDK files in corporate environments?
In addition to the mentioned ones, direct deployment to storage, vCloudDirector, …
Will sanitizing the VMDK (descriptor file) mitigate the vulnerability?
From our perspective this should not be too difficult to implement. There are basically two steps:
- Striping leading directory paths/relative paths from the path to be included
- Restricting included files to customer-owned directories
However a certain knowledge about the specific storage/deployment architecture is necessary in order to sanitize the VMDK descriptor file and not break functionality.
Will VMware patch this vulnerability?
Probably yes. They might do so “silently” though (that is without explicitly mentioning it in an associated VMSA) as they have done in the past for other severe vulnerabilities (e.g. for this one).
Could you please describe the full attack path?
All steps are described here.
More details can be found in a whitepaper to be published soon. Furthermore we will provide a demo with a simplified cloud provider like lab (including, amongst others, an FTP interface to upload files and a web interface to start machines) at upcoming conferences.
Do you need system/root access to the hypervisor in order to successfully carry out the attack?
No. All necessary information can be gathered during the attack.
What is the potential impact of a successful attack?
Read access to the physical hard drives of the hypervisor and thus access to all data/virtual machines on the hypervisor. We’re still researching on the write access.
Which platforms are vulnerable?
As of our current state of research, we can perform the complete attack path exclusively against the ESXi5 and ESXi4 hypervisors.
In case vCloud Director is used for customer access, are these platforms still vulnerable?
To our current knowledge, no. But our research on that is still in progress.
Are OVF uploads/other virtual disk formats vulnerable?
Our research on OVF is still in progress. At the moment, we cannot make a substantiated statement about that.
Is AWS/$MAJOR_CLOUD_PROVIDER vulnerable?
Since we did not perform any in the wild testing, we don’t know this yet. However, we have been contacted by cloud providers in order to discuss the described attack.
Given AWS does not run VMware anyways they will most probably not be vulnerable.
Is it necessary to start the virtual machine in a special way/using a special/uncommon API?
Which VMware products are affected?
At the moment, we can only confirm the vulnerability for the ESXi5 and ESXi4 hypervisors. Still, our research is going on
Matthias and Pascal
VMDK Has Left the Building — Some Nasty Attacks Against VMware vSphere 5 Based Cloud Infrastructures
14 Comments | Posted by Matthias Luft
Update #1: Slides are available for download here.
In the course of our ongoing cloud security research, we’re continuously thinking about potential attack vectors against public cloud infrastructures. Approaching this enumeration from an external customer’s (speak: attacker’s ) perspective, there are the following possibilities to communicate with and thus send malicious input to typical cloud infrastructures:
- Management interfaces
- Guest/hypervisor interaction
- Network communication
- File uploads
As there are already several successful exploits against management interfaces (e.g. here and here) and guest/hypervisor interaction (see for example this one; yes, this is the funny one with that ridiculous recommendation “Do not allow untrusted users access to your virtual machines.” ), we’re focusing on the upload of files to cloud infrastructures in this post. According to our experience with major Infrastructure-as-a-Service (IaaS) cloud providers, the most relevant file upload possibility is the deployment of already existing virtual machines to the provided cloud infrastructure. However, since a quick additional research shows that most of those allow the upload of VMware-based virtual machines and, to the best of our knowledge, the VMware virtualization file format was not analyzed as for potential vulnerabilities yet, we want to provide an analysis of the relevant file types and present resulting attack vectors.
As there are a lot of VMware related file types, a typical virtual machine upload functionality comprises at least two file types:
The VMX file is the configuration file for the characteristics of the virtual machine, such as included devices, names, or network interfaces. VMDK files specify the hard disk of a virtual machine and mainly contain two types of files: The descriptor file, which describes the specific setup of the actual disk file, and several disk files containing the actual file system for the virtual machine. The following listing shows a sample VMDK descriptor file:
# Disk DescriptorFile
# Extent description
RW 33554432 VMFS "machine-flat01.vmdk"
RW 33554432 VMFS "machine-flat02.vmdk"
For this post, it is of particular importance that the inclusion of the actual disk file containing the raw device data allows the inclusion of multiple files or devices (in the listing, the so-called “Extent description”). The deployment of these files into a (public) cloud/virtualized environment can be broken down into several steps:
- Upload to the cloud environment: e.g. by using FTP, web interfaces, $WEB_SERVICE_API (such as the Amazon SOAP API, which admittedly does not allow the upload of virtual machines at the moment).
- Move to the data store: The uploaded virtual machine must be moved to the data store, which is typically some kind of back end storage system/SAN where shares can be attached to hypervisors and guests.
- Deployment on the hypervisor (“starting the virtual machine”): This can include an additional step of “cloning” the virtual machine from the back end storage system to local hypervisor hard drives.
To analyze this process more thoroughly, we built a small lab based on VMware vSphere 5 including
- an ESXi5 hypervisor,
- NFS-based storage, and
everything fully patched as of 2012/05/24. The deployment process we used was based on common practices we know from different customer projects: The virtual machine was copied to the storage, which is accessible from the hypervisor, and was deployed on the ESXi5 using the vmware-cmd utility utilizing the VMware API. Thinking about actual attacks in this environment, two main approaches come to mind:
- Fuzzing attacks: Given ERNW’s long tradition in the area of fuzzing, this seems to be a viable option. Still this is not in scope of this post, but we’ll lay out some things tomorrow in our workshop at #HITB2012AMS.
- File Inclusion Attacks.
Focusing on the latter, the descriptor file (see above) contains several fields which are worth a closer look. Even though the specification of the VMDK descriptor file will not be discussed here in detail, the most important field for this post is the so-called Extent Description. The extent descriptions basically contain paths to the actual raw disk files containing the file system of the virtual machine and were included in the listing above.
The most obvious idea is to change the path to the actual disk file to another path, somewhere in the ESX file system, like the good ol’ /etc/passwd:
# Disk DescriptorFile
# Extent description
RW 33554432 VMFS "machine-flat01.vmdk"
RW 0 VMFSRAW "/etc/passwd"
Unfortunately, this does not seem to work and results in an error message as the next screenshot shows:
As we are highly convinced that a healthy dose of perseverance (not to say stubbornness ) is part of any hackers/pentesters attributes, we gave it several other tries. As the file to be included was a raw disk file, we focused on files in binary formats. After some enumeration, we were actually able to include gzip-compressed log files. Since we are now able to access files included in the VMDK files inside the guest virtual machine, this must be clearly stated: We have/can get access to the log files of the ESX hypervisor by deploying a guest virtual machine – a very nice first step! Including further compressed log files, we also included the /bootbank/state.tgz file. This file contains a complete backup of the /etc directory of the hypervisor, including e.g. /etc/shadow – once again, this inclusion was possible from a GUEST machine! As the following screenshot visualizes, the necessary steps to include files from the ESXi5 host include the creation of a loopback device which points to the actual file location (since it is part of the overall VMDK file) and extracting the contents of this loopback device:
The screenshot also shows how it is possible to access information which is clearly belonging to the ESXi5 host from within the guest system. Even though this allows a whole bunch of possible attacks, coming back to the original inclusion of raw disk files, the physical hard drives of the hypervisor qualify as a very interesting target. A look at the device files of the hypervisor (see next screenshot) reveals that the device names are generated in a not-easily-guessable-way:
Using this knowledge we gathered from the hypervisor (this is heavily noted at this point, we’re relying on knowledge that we gathered from our administrative hypervisor access), it was also possible to include the physical hard drives of the hypervisor. Even though we needed additional knowledge for this inclusion, the sheer fact that it is possible for a GUEST virtual machine to access the physical hard drives of the hypervisor is a pretty big deal! As you still might have our stubbornness in mind, it is obvious that we needed to make this inclusion work without knowledge about the hypervisor. Thus let’s provide you with a way to access to any data in a vSphere based cloud environment without further knowledge:
- Ensure that the following requirements are met:
- ESXi5 hypervisor in use (we’re still researching how to port these vulnerabilities to ESX4)
- Deployment of externally provided (in our case, speak: malicious ) VMDK files is possible
- The cloud provider performs the deployment using the VMware API (e.g. in combination with external storage, which is, as laid out above, a common practice) without further sanitization/input validation/VMDK rewriting.
It must be noted that the hypervisor hard drives contain the so-called VMFS, which cannot be easily mounted within e.g. a Linux guest machines, but it can be parsed for data, accessed using VMware specific tools, or exported to be mounted on another hypervisor under our own administrative control.
Summarizing the most relevant and devastating message in short:
VMware vSphere 5 based IaaS cloud environments potentially contain possibilities to access other customers’ data…
We’ll conduct some “testing in the field” in the upcoming weeks and get back to you with the results in a whitepaper to be found on this blog. In any case this type of attacks might provide yet-another path for accessing other tenants’ data in multi-tenant environments, even though more research work is needed here. If you have the opportunity you might join our workshop at #HITB2012AMS.
Have a great day,
Pascal, Enno, Matthias, Daniel
0 Comments | Posted by Florian Horsch
Just wanted to let you know that we sent out ERNW Newsletter 32 end of last week. As we promised it includes the results of research regarding the question “Is browser virtualization a valid security control in order to mitigate browser based security risks?”.
Simon did a great job with writing the latest newsletter. It’s a 30-page document which should help you to have a basis for well-informed decisions when it comes to the deployment of an application virtualization technology.
5 Comments | Posted by Enno Rey
One of the biggest pains in the ass of most ISOs – and subsequently subject of fierce debates between business and infosec – is the topic of “Browser Security”, i.e. essentially the question “How to protect the organization from malicious code brought into the environment by users surfing the Internet?”.
Commonly the chain of events (of a typical malware infection act) can be broken down to the following steps:
1.) Some code – no matter if binary or script code – gets transferred (mostly: downloaded) to some system “from the Internet”, that means “over the network”.
2.) This code is executed by some local piece of software (where “execution” might just mean “parse a PDF” .
[btw, if you missed it: after Black Hat Adobe announced an out-of-band patch scheduled for 08/16, so stay tuned for another Adobe Reader patch cycle next week...]
3.) This code causes harm (either on it’s own, either by reloaded payloads) to the local system, to the network the system resides in or to other networks.
Discussing potential security controls can be centered around these steps, so we have
a) The area of network based controls, that means all sorts of “malicious content protection” devices like proxies filtering (mainly HTTP and FTP) traffic based on signatures, URL blacklists etc., and/or network based intrusion prevention systems (IPSs).
Practically all organizations use some of this stuff (however quite a number of them – unfortunately – merely banks on these pieces). Let me state this clearly: overall using network based (filtering) controls contributes significantly to “overall protection from browser based threats” and we won’t discuss the advantages/disadvantages of this approach right here+now.
Still it should be noted that this is what we call a “detective/reactive control”, as it relies on somehow detecting the threat and scrubbing it after the detection act).
b) Controls in the “limit the capability to execute potentially harmful code” space. Which can be broken down to things like
- minimizing the attack surface (e.g. by not running Flash, iTunes etc. at all). The regular readers of this blog certainly knows our stance as for this approach .
- configuration tweaks to limit the script execution capabilities of some components involved, like all the stuff to be found in IE’s zone model and associated configuration options (see this document for a detailed discussion of this approach).
- patching (the OS, the browser, the “multimedia extensions” like Flash and Quicktime, the PDF reader etc.) to prevent some “programmatic abuse” of the respective components.
Again, we won’t dive into an exhaustive discussion of the advantages/disadvantages of this approach right here+now.
c) Procedures or technologies striving to limit the harm in case an exploit happens “in browser space” (which, as of our definition, encompasses all add-ons like Flash, Quicktime etc.). This includes DEP, IE protected mode, sandboxing browsers etc.
Given the weaknesses the network based control approach might have (in particular in times of targeted attacks. oops, sorry, of course I mean: in times of the Advanced Persistent Threat [TM] ) and the inability (or reluctance?) to tackle the problem on the “code execution” front-line in some environments, in the interim another potential control has gained momentum in “progressive infosec circles”: using virtualization technologies to isolate the browser from the (“core”) OS, other applications or just the filesystem.
Three main variants come to mind here: full OS virtualization techniques (represented, for example, by Oracle VirtualBox or VMware Workstation), application virtualization solutions (like Microsoft App-V or VMware ThinApp) and, thirdly, what I call “hosted browsing” (where some MS Terminal Server farm potentially located in a DMZ, or even “the cloud” may serve as “[browser] hosting infrastructure”).
In general, on an architecture level this is a simple application of the principle of “isolation” – and I really promise to discuss that set of architectural security principles we use at ERNW at some point in this blog .
While I know that some of you, dear readers, use virtualization technologies to “browse safely” on a daily (but individual use) basis, there’s still some obstacles for large scale use of this approach, like how to store/transfer or print documents, how to integrate client certificates – in particular when on smart cards – into these scenarios, how to handle “aspects of persistence” (keeping cookies, bookmarks vs. not keeping potentially infected “browser session state”) etc.
And, even if all these problems can be solved, the big question would be: does it help, security-wise? Or, in infosec terms: to what degree is the risk landscape changed if such an approach would be used to tackle the “Browser Security Problem”?
To contribute to this discussion we’ve performed some tests with an application virtualization solution (VMware ThinApp) recently. The goal of the tests was to determine if exploits can be stopped from causing harm if they happened within a virtualized deployment, which modes of deployment to use, which additional tweaks to apply etc.
The results can be found in our next newsletter to be published at the end of this week. This post’s purpose was to provide some structure as for “securing the browser” approaches. and to remind you that – in the end of the day – each potential security control must be evaluated from two main angles: “What’s the associated business impact and operational effort?” and “How much does it mitigate risk[s]?”.
Have a great day,
0 Comments | Posted by Enno Rey
Today was an interesting day, for a number of reasons. Amongst those it stuck out that we were approached by two very large environments (both > 50K employees) to provide security review/advise, as they want to “virtualize their DMZs, by means of VMware ESX”.
[yes, more correctly I could/should have written: "virtualize some of their DMZ segments". but this essentially means: "mostly all of their DMZs" in 6-12 months. and "their DMZ backend systems together with some internal servers" in 12-24 months. and "all of this" in 24-36 months. so it's the same discussion anyway, just on a shifted timescale ]
Out of some whim, I’d like to give a spontaneous response here (to the underlying question, which is: “is it a good idea to do this?”).
At first, for those of you who are working as ISOs, a word of warning. Some of you, dear readers, might recall the slide of my Day-Con3 keynote titled “Don’t go into fights you can’t win”.
[I'm just informed that those slides are not yet online. they will be soon... in the interim, to get an idea: the keynote's title was "Tools of the Trade for Modern (C)ISOs" and it had a section "Communication & Tools" in it, with that mentioned slide].
This is one of the fights you (as an ISO) can’t win. Business/IT infrastructure/whoever_brought_this_on_the_table will. Get over it. The only thing you can do is “limit your losses” (more on this in second, or in another post).
Before, you are certainly eager to know: “now, what’s your answer to the question [good idea or not]?”.
I’m tempted to give a simple one here: “it’s all about risk [=> so perform risk analysis]“. This is the one we like to give in most situations (e.g. at conferences) when people expect a simple answer to a complex problem .
However it’s not that easy here. In our daily practice, when calculating risk, we usually work with three parameters (each on a 1 ["very low"] to 5 ["very high" scale), that are: likelihood of some event (threat) occurring, vulnerability (environment disposes of, with regard to that event) and impact (if threat "successfully" happens).
Let's assume the threat is "Compromise of [ESX] host, from attacker on guest”.
Looking at “our scenario” – that is “a number of DMZ systems is virtualized by means of VMware ESX” – the latter one (impact) might be the easiest one: let’s put in a “5″ here. Under the assumption that at least one of the DMZ systems can get compromised by a skilled+motivated attacker at some point of time (if you would not expect this yourselves, why have you placed those systems in a DMZ then? … under that assumption, one might put in a “2″ for the probability/likelihood. Furthermore _we_ think that, in the light of stuff like this and the horrible security history VMware has for mostly all of their main products, it is fair to go with a “3″ for the vulnerability.
This, in turn, gives a “2 * 3 *5 = 30″ for the risk associated with the threat “Compromise of [ESX] host, from attacker on guest” (for a virtualized DMZ scenario, that is running guests with a high exposure to attacks).
In practically all environments performing risk analysis similar to the one described above (in some other post we might sometimes explain our approach – used by many other risk assessment practitioners as well – in more detail), a risk score of “30″ would require some “risk treatment” other than “risk retention” (see ISO 27005 9.3 for our understanding of this term).
Still following the risk treatment options outlined in ISO 27005, there are left:
a) risk avoidance (staying away from the risk-inducing action at all). Well, this is probably not what the above mentioned “project initiator” will like to hear … and, remember: this is a fight you can’t win.
b) risk transfer (hmm… handing your DMZ systems over to some 3rd party to run them virtualized might actually not really decrease the risk of the threat “Compromise of [ESX] host, from attacker on guest”
c) risk reduction. But… so how? There’s not many options or additional/mitigating controls you can bring into this picture. The most important technical recommendation to be given here is the one of binding a dedicated NIC to every virtualized system (you already hear them yelling “why can’t we bring more than ~ 14 systems on a physical platform?”, don’t you? ). Some minor, additional advise will be provided in another post, as will some discussion on the management side/aspects of “DMZ virtualization”. (notice how we’re cleverly trapping you into coming back here?
So, if you are sent back and asked to “provide some mitigating controls”… you simply can’t. there’s not much that can be done. You’re mostly thrown back to that well-known (but not widely accepted) “instrument of security governance”, that is: trust.
In the end of the day you have to trust VMware, or not.
We don’t. We – for us – do not think that VMware ESX is a platform suited for “high secure isolation” (at least not at the moment).
The jury is still out on that one… but presumably you all know the truth, at your very inner self
For completeness’ sake, here’s the general advice we give when we only have 60 seconds to answer the question “What do you think about the security aspects of moving systems to VMware ESX”. It’s split into “MUST” or (“DO NOT”) parts and “SHOULD” parts. See RFC 2119 for more on their meaning. Here we go:
1.) Assuming that you have a data/system/network classification scheme with four levels (like “1 = public” to “4 = strictly confidential/high secure”) you SHOULD NOT virtualize “level 4″. And think twice before virtualizing SOX relevant systems
2.) If you still do this (virtualizing 4s), you MUST NOT mix those with other levels on the same physical platform.
3.) If you mix the other levels, then you SHOULD only mix two levels next to each other (2 & 3 or 1 & 2).
4.) DMZ systems SHOULD NOT be virtualized (on VMware ESX as of the current security state).
5.) If you still do this (virtualizing DMZ systems), you MUST NOT mix those with Non-DMZ systems.
For those of you who have already violated advice no. 4 but – reading this – settle back mumbling “at least we’re following advice no. 5″… wait, my friends, the same people forcing you before will soon knock at your door … and tell you about all those “significant cost savings” again… and again…