Evaluating Behavior-based Malware Detection

Quite a few organizations have complemented their traditional AV solutions with a technology that can best be described as behavior-based malware detection. While we all know we are talking about products like FireEye Email/Network Security, Zscaler Web Security/APT Protection, or Cisco WSA, there are many terms around to describe this type of product (such as next-generation malware analysis/detection, Secure Web Gateways, or behavior-based malware detection). These offerings typically promise to detect malware by analyzing the behavior of ‘samples’, i.e. files of different types (such as executables or PDF documents) captured in transit. Taxonomy challenges aside, both assessment and consulting work frequently bring us in contact with those solutions. While the main task during assessments is to bypass those solutions, the main question in the consulting context typically is “to what degree are the solutions suited to protect from common targeted attacks in the enterprise context”. Luckily, the experience from assessment work allows us to tackle this question in a structured way (which is our approach for consulting anyway: benefit from our assessment experience in order to provide reasonable consulting advice…).

Having approached this question several times in the past, we can share some notes on potential evaluation criteria:

  • Integration & Expertise: As with every security control, you should ask yourself first: Do we have the operational resources, capabilities, and expertise to use the solution in a way that actually yields a security benefit? (Even though what the expected “security benefit” is needs to be clarified beforehand 😉 ). In addition, there are several more factors to consider, such as export options for results, customizable reports, and the number and type of input sources for samples (see also the next item). Of course, these heavily depend on the individual environment and the specific implementation one might have in mind.
  • Data Gathering: All solutions can aggregate different data sources for the actual interpretation of a given sample. Typical data sources are email, web, and general network traffic, so you should evaluate which of those are supported by the product to be evaluated and which may require the purchase of additional products from the same line 😉
  • Analysis Mode: According to the information from vendor whitepapers, the analysis of samples usually relies on an implementation of VM introspection to monitor CPU instructions as well as memory and device access. However, we would not be surprised if some solutions additionally employ in-guest analysis methods (e.g. hooking) inside the virtual machine. All of the monitored data is then mapped to typical OS-level activities (such as the various ways to load a library); the first sketch after this list illustrates that normalization step.
  • Interpretation: Last but not least, there is the interpretation of the gathered data. While the main vendors provide some documentation on their analysis techniques (such as introspection), not a single document describes in detail how a sample is rated as malicious or benign. Having played with different solutions in the past, our impression is that every vendor has developed its own heuristics to recognize attack primitives (such as buffer overflows, ROP chains, loading code from the Internet, or persistence mechanisms) and hence detects typical attack scenarios, which are just combinations of those primitives; the second sketch after this list captures this idea in a nutshell. Those heuristics are an important differentiator between vendors and consequently deserve a closer look.
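
To make the Analysis Mode point more tangible, here is a minimal Python sketch of how raw monitoring data (whether obtained via introspection or in-guest hooks) might be normalized into OS-level activities. The event format and the set of API names are our own illustrative assumptions, not any vendor’s actual data model:

```python
# Several Windows APIs ultimately result in a loaded library; an
# analysis engine wants to treat all of them as the same OS-level
# activity. (API list is illustrative, not exhaustive.)
LIBRARY_LOAD_APIS = {
    "LoadLibraryA", "LoadLibraryW",
    "LoadLibraryExA", "LoadLibraryExW",
    "LdrLoadDll",
}

def normalize(raw_events):
    """Map raw API-call events to abstract OS-level activities."""
    activities = []
    for event in raw_events:
        if event["api"] in LIBRARY_LOAD_APIS:
            activities.append({
                "activity": "library_load",
                "target": event["args"][0],
                "via": event["api"],
            })
    return activities

# Example trace as a hypothetical in-guest hook might emit it:
trace = [
    {"api": "LoadLibraryW", "args": ["wininet.dll"]},
    {"api": "LdrLoadDll",   "args": ["stage2.dll"]},
]
print(normalize(trace))
```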
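
The Interpretation step can then be pictured as a scoring scheme over such attack primitives. The following toy sketch is purely our own illustration; the primitive names, weights, and threshold are made up, and real products certainly use far more elaborate (and undocumented) logic:

```python
# Each recognized attack primitive contributes a weight; a sample is
# flagged once the combination of observed primitives crosses a
# threshold. All values here are arbitrary, for illustration only.
PRIMITIVE_WEIGHTS = {
    "buffer_overflow":    0.5,
    "rop_chain":          0.6,
    "code_from_internet": 0.4,
    "persistence":        0.3,
}

THRESHOLD = 0.8  # arbitrary cut-off for this sketch

def rate_sample(observed_primitives):
    """Return (score, verdict) for a set of observed primitives."""
    score = sum(PRIMITIVE_WEIGHTS.get(p, 0.0) for p in observed_primitives)
    return score, ("malicious" if score >= THRESHOLD else "benign")

# A typical attack scenario is just a combination of primitives:
print(rate_sample({"rop_chain", "code_from_internet", "persistence"}))
# -> (1.3, 'malicious')
```

The interesting differences between vendors would hide in exactly the parts this sketch glosses over: which primitives are recognized at all, and how their combinations are weighted.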

For the moment, these aspects are described here in a rather generic way, but in our opinion they illustrate the general areas of a thorough evaluation of the described solutions. We will provide more insight into the different evaluation areas in the future!

Have a good one,
Matthias