The idea of automatically generating signatures for network borne threats isn’t precisely new. In fact the concept dates back to the early 1980′s and predates network-propagated malware attacks by quite a margin.
Back in 1997, network-based intrusion detection systems (IDS) really started to take off as a viable commercial security technology – with ISS rolling out their RealSecure – and the first (internal vendor) prototype systems were developed for automatic signature development. Even in those early days where malware threats were exceedingly noisy and clumsy on the network, it was difficult to develop reliable signatures that wouldn’t let loose a tide of false positive results. Any automatically generated signature needed to be reviewed by skilled engineers and “soak tested” in pliable customer networks – and subsequently tweaked or dropped, depending upon false positives observations.
Fifteen years later new vendors continue to reinvent the wheel and rediscover the pain behind the promise. Take for example Palo Alto Networks’ Wildfire and FireEye’s Virtual Execution Engine approaches (what’s with the “fire” thing?). Both capture suspicious binaries observed traversing their customer’s networks and hand them over to a software virtualization system for controlled execution and (eventual) automatic signature generation. Those signatures (basically a Snort regex-based signature) and then pushed to IDS sensors to alert upon.
What’s the matter with that? Well, for anything beyond the most ubiquitous and dumb malware out there, the approach will either result in false positive alerts or fail to alert altogether. The latter is more common than the former because the systems are tuned that way. After all, no one likes to be woken up late at night due to a false positive – so if you miss the threat entirely, who’s ever going to know?
There are way too many reasons why automated signature systems like FireEye and Wildfire continue to fail in detecting and diagnosing todays targeted threats (note that this isn’t a problem unique to just them – it applies to the other dozen or so near-identical products from other vendors).
You may remember a paper I published mid-2011 on the topic “Automated In-Network Malware Analysis: Why Virtual Machines Can Sputter and Miss“. That paper alone provides numerous examples of what the badguys are doing to bypass automated analysis systems and the perils of not operating analysis platforms that replicate real victim systems (which is an Achilles Heel of these appliance-led “protection” systems).
Out of all the possible ways the badguys can usurp these auto-generated malware detection engines, let’s look at one of the more recent ones – malware that makes use of Domain Generation Algorithms (DGA).
Last week Damballa Labs released a report and case study on the current generation of DGA-based crimeware and you’ll likely have noted that of the 12 newly uncovered DGA’s, 6 of them were associated with advanced (but mainstream) crimeware families. Now here’s the rub, if you’re reliant upon an automated system for malware analysis to generate “protection” signatures for your IDS, I hope you’re enjoying the Emperor’s new clothes because you’re naked to the C&C communications of Shiz, Bimital, BankPatch, Expiro.Z, Bonnana, and recent generations of Zeus (to name but a few).
How is that possible? Simply put, several of the features of DGA’s are designed to intentionally bypass these kinds of dynamic protection technologies. By not using static domain names for their C&C lookups, and by ensuring that their network communications look like standard HTTP/HTTPS traffic, any automatically generated “call back” signature will be based upon redundant and outdated information before it’s even deployed – and subsequently never trigger against real crimeware traffic.
As the crimeware sample executes within the virtualized analysis environment (and assuming that it’s not bothering to conduct any anti-VM evasion in the first place (which is a fallacy of course)) it will initiate its DGA and attempt to look up a number of dynamically calculated domain names. Those domain names and the C&C servers they are hunting for reflect the current date/time of the malware’s execution. At a different date or time additional malware samples will generate their own sets. The net result is that, assuming the vendor’s product can create a signature in the first place, the signature will be for only a single malware sample at a fixed (past) point in time. Adding that signature to the IDS and alerting system can only result in false positive detections – and lost incident response cycles.
With an evasion technique as successful as the modern DGA, it is not a surprise to see it being adopted by additional crimeware families.
There is a technique for identifying DGA’s and the victims of DGA-based crimeware. The dynamism of DGA operational use means that the detection technologies need to more flexible and operate independently of static signatures. As such, the current generation of DGA-based threat detection focuses upon DNS-level observations and utilizes clustering and spectral analysis approaches.