Wednesday, July 18, 2012

DGA-based Botnet Detection

There are essentially two types of magic in the world – the kind that deceives an audience into believing something impossible has occurred, and the kind best defined by Arthur C. Clarke as “any sufficiently advanced technology is indistinguishable from magic”.

The antivirus industry is rife with the former – there’s no shortage of smoke-and-mirrors explanations when it comes to double-talking “dynamically generated signature signature-less engines” and “zero false-positive anomaly detection systems”.

I’m going to introduce you to the latter kind of magic. Technological approaches so different from traditional approaches that, for many folks out there in Internet-land, they’re indistinguishable from magic. More than that though, I’m going to try to explain how such techniques are reversing the way in which threat discovery has occurred in the past. However, what I’m not going to do is even try to explain a fraction of the math and analytics that lie behind that magic – at least not in a blog!
Oh where, oh where should we start?

Let’s begin, for argument’s sake, by classifying malware as a tool; a weapon to be more precise. In the physical world it would be easy to associate “malware” with the bullets from a gun, and the gun in turn likened to perhaps a drive-by download site or a phishing email. In response to that particular physical threat, there are a number of technological approaches that have been deployed in order to counter the threat – we have metal detectors and x-ray machines to alert us to the presence of guns, sniffing technologies to identify the presence of explosive materials, and CCTV and behavioral analysis systems to identify the suspects who may be hiding the gun.

A fundamental premise of this layered detection approach is that we’ve encountered the threat in the past and already classified it as bad – i.e. as a “weapon”. Gun equals bad, knife equals bad, metal corkscrew equals bad, and so on. Meanwhile everything else is assumed to be good – like an ostrich egg – until it happens to be used as a weapon (such as when “Russell pleaded guilty to assault using an ostrich egg as a weapon, assault, and breaching a protection order”) and inevitably some new detection technique is proposed to detect it.

Traditionally the focus has been on “preventing” the threat. In particular, detecting the presence of a known threat and stopping it from reaching its target. In the physical world, in general, the detection technologies are pretty robust – however (and it’s a big “however”), the assumption is that the technology needed to provide this prevention capability is ubiquitous, deployed everywhere it could potentially be needed, and that it works every time. Sure, at high value targets (such as airports) you’ll find such technology operating at its optimal capability. Elsewhere though (such as the doorway into your home) it won’t be encountered at all. There are obvious parallels with the cyber-world here – except arguably the Internet-equivalent technologies are a little more ubiquitous, but considerably less capable of preventing the malware threat.

For the mob hitman, serial killer, or other kind of mass murderer, the threat of sparsely deployed metal detectors is an easily avoided problem. Subversion or avoidance of the detection systems is pretty easy and, more importantly, an appropriate choice of location negates the problem entirely. Even then, such a detection strategy, operated in isolation, isn’t a serious inhibitor for new murders. If such a technology exists to only detect the guns and bullets, but is not capable of providing attribution (e.g. this gun was used in 16 different murders over the last 2 weeks and the owner of this gun is the murderer), then the criminal only ever loses a tool each time they get caught – since the prevention technology is divorced from association with the victims (past or prospective).

But there’s an entirely different side to dealing with that kind of threat – and that’s the forensics element. While our not-so-friendly murderer can avoid the detection technologies, they’re much less capable of avoiding the evidence of violent past actions. Starting from the first murder, it is possible to build a case that points to a specific criminal by analyzing the components of the crime scene(s).
I know, the argument is that everything should be done to prevent the crime in the first place. That’s clearly very difficult in the physical world, but you’re basically living in a fantasy land of Goblins and Unicorns if you’re expecting it to work better in the cyber-world.

Which basically brings me to the discussion (and subsequent detection) of the latest generation of sophisticated malware – malware that uses domain generation algorithms (DGAs) to locate its command and control infrastructure and upload its stolen data. Malware with this capability is designed to evade all those conventional prevention technologies and, once successfully deployed within its victim populace, evade all other methods of traffic filtering and takedown processes. Even if a malware sample is accidentally captured, its DGA capabilities will go undetected.
Detecting DGA-based malware is, as I implied earlier, both “magic” and a reversal of conventional prevention approaches. In order to detect DGA-based threats early on, you start with the victims first…

DGA-based malware uses an algorithm to pick out candidate domain names in order to hunt for its prospective C&C servers. The vast majority of the domain names it’s looking for simply don’t exist. In the world of DNS, attempting to resolve a domain name that doesn’t exist will result in a “no such domain” (i.e. an NXDOMAIN, or NX) response from an authoritative DNS server somewhere down the line. So, in essence, DGAs are noisy if you’re watching DNS activity – and lots of NX responses are a key feature of an infected host. Unfortunately, the average Internet-connected device typically tries to look up lots of things that don’t exist, and there’s often a lot of legitimate NX traffic which can disguise the flapping of the malware.
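To make the "noisy" part concrete, here is a minimal sketch of counting NX responses per client from a DNS log. The log format (a list of `(client_ip, qname, rcode)` tuples) and the function name `nx_ratio_by_host` are assumptions made purely for illustration; they don't describe any particular product or sensor.

```python
from collections import Counter


def nx_ratio_by_host(dns_log):
    """Summarise NX activity per client.

    dns_log is assumed to be an iterable of (client_ip, qname, rcode)
    tuples, where rcode is a string such as "NOERROR" or "NXDOMAIN".
    Returns {client_ip: (nx_count, total_queries, nx_ratio)}.
    """
    totals = Counter()
    nx_counts = Counter()
    for client_ip, _qname, rcode in dns_log:
        totals[client_ip] += 1
        if rcode == "NXDOMAIN":
            nx_counts[client_ip] += 1
    return {
        client: (nx_counts[client], totals[client], nx_counts[client] / totals[client])
        for client in totals
    }


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "9ftor9srimbna-q.com", "NXDOMAIN"),
        ("10.0.0.1", "0ubpccgkvzrnng.com", "NXDOMAIN"),
        ("10.0.0.1", "example.com", "NOERROR"),
        ("10.0.0.2", "example.com", "NOERROR"),
    ]
    for client, stats in sorted(nx_ratio_by_host(sample).items()):
        print(client, stats)
```

A high NX ratio on its own proves nothing, for exactly the reason given above: plenty of healthy devices generate NX traffic too. It's only a first, cheap feature.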

Assuming some kind of algorithmic basis to the domain candidates being created by the malware, you could suppose that it would be possible to develop a unique signature for them. If only it were that easy – the criminals are smarter than that. And you’re also assuming that you’ve already encountered a copy of the malware before in order to create a signature for that particular DGA-based malware.

Instead there’s a much better way – you monitor all your DNS traffic and NX responses, and you identify clusters of devices (or IP addresses) that are generating roughly similar domain name requests. This first pass will provide a hint that those devices share some kind of common ailment (without ever needing to know what the malware is or have ever encountered a sample before). In a second pass you focus upon identifying just the domain names that are structurally very similar across all the afflicted assets and classify that DGA based upon a number of features. Classifying the DGA makes it easier for attribution and uniquely tracking a new threat.
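As a rough illustration of that first pass, the sketch below groups hosts whose sets of NX domains overlap. It is a deliberately simple stand-in (Jaccard similarity plus union-find), not the analytics actually used; the function names, the `threshold` value and the input shape (`{host: set_of_nx_domains}`) are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_hosts(nx_by_host, threshold=0.3):
    """Group hosts whose NX domain sets overlap by at least `threshold`.

    nx_by_host: {host: set of domain names that returned NX for that host}.
    Returns a list of sets of hosts; hosts with no similar peer are dropped.
    """
    hosts = sorted(nx_by_host)
    parent = {h: h for h in hosts}

    def find(h):
        # Union-find lookup with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    # Pairwise comparison is O(n^2); fine for a sketch, not for a real network.
    for a, b in combinations(hosts, 2):
        if jaccard(nx_by_host[a], nx_by_host[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for h in hosts:
        clusters.setdefault(find(h), set()).add(h)
    return [members for members in clusters.values() if len(members) > 1]


if __name__ == "__main__":
    observed = {
        "10.0.0.1": {"9ftor9srimbna-q.com", "0ubpccgkvzrnng.com", "dq2yl1zxcmvko2.com"},
        "10.0.0.2": {"9ftor9srimbna-q.com", "0ubpccgkvzrnng.com", "drxyuezdllerpd.com"},
        "10.0.0.3": {"wpad.corp.example", "printer.corp.example"},
    }
    print(cluster_hosts(observed))
```

Exact-match overlap only works when infected hosts happen to try the same names. A DGA seeded per host would need the comparison done on structural features of the names rather than on the names themselves.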

At this point you’re pretty much sure that you have a new malware outbreak and you know who all the victims are, and you can easily track the growth (or demise) of the botnet within your network. Unfortunately you’re also dealing with a brand new threat and you probably don’t have a malware sample… and that’s where an additional layer of analytics comes into play and more “magic” happens as you automatically begin to identify the domain names that are tried by the DGA-based malware that actually succeed and engage with the criminals’ server.

Let’s work through a simple example uncovered last week by Damballa Labs. Over the weekend one of our NX monitoring tools identified a few thousand IP addresses within our customer base that were generating clusters of NX DNS traffic very similar to the following:

9ftor9srimbna-q.com, 0sso151a47nztrxld6.com, 0ubpccgkvzrnng.com, dq2yl1zxcmvko2.com, wfj-5i5p4uhq8tylhiz.com, 1sj4rh1i5l8-4tca.com, zv7dfcgtusnttpl.com, om2air5ah5lisj7.com, qmyuvaftpgve2tzwhcjr.com, fndskzqmsob5r2bzby.com, f5p2vxn7dcdkujhbguqb.com, 6q5sigkfu3fl3q.com, zpbm3emcuosopagdttxi.com, trsmw3wceik79krk5.com, qtxfmfyulnnjpxwqo.com, ikh9w-3vdmlndafja.com, udf-szhubujmuhp1jj.com, 4v8hohrphizcas.com, nhb72anz8ifdyzgckqf.com, rcu4n0lzoghuj2.com, dklfebjexiabttkwvgos.com, fjg56xwoupqpdxr.com, drxyuezdllerpd.com, t407bqgh56jbkv4ua.com, 49ubufqnjzvhct.com, n7c3qfpslcjosy-5.com, nvnzihl6krkyfo8zp.com
While you’d be hard pressed to write a “signature” for the domain names that wouldn’t cause false positives out the wazoo, the features of the combined victims’ traffic (frequency, distribution, commonality, etc.) work fine as a way of associating them to a shared new threat.
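For a feel of what "structurally similar" might mean, here is a small sketch that pulls a few per-name features out of a domain: label length, proportion of digits, presence of a hyphen, and character entropy. These particular features and the names `shannon_entropy` and `domain_features` are my own illustrative picks, not the feature set behind the results described here.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the characters in `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain):
    """Structural features of the left-most label of a domain name."""
    label = domain.lower().split(".")[0]
    length = len(label)
    return {
        "length": length,
        "digit_ratio": sum(ch.isdigit() for ch in label) / max(length, 1),
        "has_hyphen": "-" in label,
        "entropy": shannon_entropy(label),
    }


if __name__ == "__main__":
    for name in ("9ftor9srimbna-q.com", "0sso151a47nztrxld6.com", "drxyuezdllerpd.com"):
        print(name, domain_features(name))
```

No single feature here separates these names from legitimate ones (content delivery networks and cloud services produce plenty of random-looking hostnames), which is the point made above about signatures. The features become useful when compared across the traffic of many hosts at once.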

Armed with this knowledge, it is then possible to identify similarly structured domain names that were successfully resolved by the victims that also shared timing elements or appeared to alter the flow of NX domain responses. For example, if the DGA-based malware is designed to locate a “live” C&C server, once it’s found a server it probably doesn’t need to keep on looking for more and will likely stop generating NX domain traffic for a period of time.
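One way to picture that timing heuristic: for a single host, look for a successful lookup that comes right after a burst of NX responses and is followed by a quiet spell. The sketch below does just that; the event format, the function name `candidate_cnc_lookups` and the default thresholds are all hypothetical, and a real system would weigh far more evidence than this.

```python
def candidate_cnc_lookups(events, burst_window=60, min_nx=5, quiet=300):
    """Flag successful lookups that appear to end a burst of NX responses.

    events: iterable of (timestamp_seconds, qname, rcode) for ONE host.
    A lookup is flagged when it resolved ("NOERROR"), at least `min_nx`
    NX responses occurred in the `burst_window` seconds before it, and
    no NX response occurred in the `quiet` seconds after it.
    """
    events = sorted(events)
    nx_times = [t for t, _qname, rcode in events if rcode == "NXDOMAIN"]
    candidates = []
    for t, qname, rcode in events:
        if rcode != "NOERROR":
            continue
        nx_before = sum(1 for n in nx_times if t - burst_window <= n < t)
        nx_after = sum(1 for n in nx_times if t < n <= t + quiet)
        if nx_before >= min_nx and nx_after == 0:
            candidates.append((t, qname))
    return candidates


if __name__ == "__main__":
    host_events = [(i, f"nx-{i}.example", "NXDOMAIN") for i in range(6)]
    host_events.append((6, "live-candidate.example", "NOERROR"))
    host_events.append((1000, "nx-later.example", "NXDOMAIN"))
    print(candidate_cnc_lookups(host_events))
```

Note that a lookup near the end of the captured log will always look "quiet" afterwards, so anything flagged there deserves extra suspicion of the heuristic rather than of the domain.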

Based upon our observations of this particular botnet outbreak, it was possible to identify the following C&C servers being operated by the criminals:
  • ###.133. ###.247 – idpd1y###-elywj.com
  • ###.133. ###.247 – k7-cwubgsrqj###rb.com
  • ###.133. ###.247 – omz###1k1vgrqf.com
  • ###.133. ###.247 – taqwucpzj###an.com
  • ###.133. ###.247 – vhrey###-ooz6ig.com
  • ###.133. ###.75 – o###pp1k1vgrqf.com
  • ###.133. ###.75 – rm6dol7###cxje-ajl.com
  • ###.133. ###.191 – id###yzib-e###j.com
  • ###.133. ###.191 – k7-c###gsrqjebzrb.com
  • ###.133. ###.191 – ###gpp1k1vgrqf.com
  • ###.133. ###.191 – rm6dol###wcxje-ajl.com
  • ###.133. ###.191 – taqwucpzj###an.com
  • ###.133. ###.191 – vhreyveh-ooz###.com
[NOTE: We've temporarily obfuscated some of this data while we continue to investigate and enumerate the global pool of victims. We'll release the technical details of this particular DGA-based botnet soon…]


So, by this point we know who the victims are, how many C&C servers the criminals are operating, what their IP addresses are and, subsequently, which hosting facilities they are operating from:
  • AS13237 LAMBDANET-AS Lambdanet Communications Deutschland GmbH

What about the malware piece? While we know who the victims are, it would be nice to know more about the tool that the criminals behind this botnet prefer and ideally to get to know them more “personally” – if you know what I mean…

As it happens, there are some malware samples that have been discovered in the very recent past that also like to contact C&Cs running upon the same hosts at this facility. For example:
  • d977ebff137fa97424740554595b9###
Fortunately, while the malware sample wasn’t detected by any antivirus products, it had previously been automatically executed within our dynamic analysis environment and we’d already extracted all of its observable network features, including an additional (successful) C&C engagement:
  • m8###ecc9lsnks6kbcrv7.com using ###.133. ###.75, ###.133. ###.191, and ###.23. ###.139
This, in turn, helped identify the following additional hosting facility based in the Netherlands:
  • AS49981 WORLDSTREAM WorldStream
There’s obviously much more to this particular threat, and if you’d like to get involved digging into it and helping with the attribution please let me know…

But, getting back on track, this approach in identifying brand new threats – while sounding like magic to many – works really well! We’ve found it to be immensely scalable and fantastically accurate.

However, there’s one fly in the ointment (as it were)… the approach identifies new threats long before the malware component is typically uncovered by the security community, and is independent of the vector used to infect the victims, the malware tool that was deployed and, ultimately, the actual endpoint device type.

Think of it this way. Based upon the forensics of the blood splattered bodies and evidence left in the room that the murderer left behind, we know that the victims were bludgeoned to death with an ovoid object approximately 6 inches in diameter, weighing 3 pounds and composed of a calcium shell. We also know that the murderer was 5 foot 11 inches, weighed 240 pounds and wears size 11 Reeboks.

You can keep your metal detectors for all the good that’ll do in this case…

Sunday, July 15, 2012

Electronic Accountability in Syria Civil War

In today's story covering Syria, the BBC notes that the conflict has now officially evolved into a civil war - http://www.bbc.co.uk/news/world-middle-east-18849362

By being legally categorized as a civil war, all participants are now subject to the articles of war - such as the Geneva Conventions. It also means that the persons behind any crimes and atrocities committed during this war can be prosecuted as international war criminals even after the conflict ends.

With the trials currently underway in the Hague against the leaders of the Bosnian war, I was thinking how different prosecutions of war crimes in Syria will likely be - given a considerably more networked world and advances in electronic monitoring.

Reading about the most recent murders of 100 souls, it strikes me as inevitable that there will be a kind of electronic trail that did not exist for wars of even a decade ago.

The instructions and target coordinates of the artillery will have been communicated and authorized electronically - not just as written communications, but also as digital voice and CB radio. The point though is that there will be a recoverable record somewhere. Given the high level of electronic eavesdropping by the combatants and other observers (e.g. NATO forces and local non-combatants), even those localized communications between regional commands and tank drivers can be intercepted, stored, and shuttled to appropriate authorities relatively easily.

Those issuing criminal commands can expect to not only be held accountable, but can expect those crimes and attribution to be documented to an excruciating level of detail - leaving little ambiguity to future courts.

Some may argue that encryption will be their savior. I doubt it. The tools they're using to generate and decipher those communications will become available to investigators post-conflict. And, regardless of access, as we're observing with the prosecutions relating to a conflict that occurred practically two decades ago, technology advances. How sure would you be that even your 128-bit encrypted digital radio messages will hold up to decryption techniques and capabilities in 20 years?

No, leaders and those issuing commands will be held accountable with evidence that has never been so rich and attributable.

Sunday, July 1, 2012

One Billion Credit Cards Stolen

"The details of one billion stolen credit cards were posted yesterday upon hundreds of Web sites around the world." What would we do if that actually happened? (And how do you know it hasn't happened today?)

Practically every day there's some kind of public disclosure about some company-or-other having been infiltrated and the credit card details of a bunch of its customers stolen. Despite several years of increased disclosures and ever-higher volumes of cards being stolen, I'm not actually sure what the impact is. Granted, every so often you'll see some followup story about how XYZ Corp is being sued over third-party losses caused by the data breach; but really, what would happen if there were more data losses... much more...

I don't know how many credit and debit cards there are in circulation around the world, but I'm pretty sure it's going to be measured in the multiple billions. So what could happen to the world if one billion (i.e. 1,000,000,000) credit cards and all the appropriate card owners' details were intercepted and dumped on the Internet for all to see (and use?) at midnight tonight?

You might question the logistics of such an interception and accumulation of that many cards. Here are (just some) ways in which it could happen:
  • A number of popular underground carder forums (used to match buyers with sellers of stolen credit cards) get hacked, and the accounts of the carders that sell their stolen wares through those forums are in turn hacked into. A few dominoes fall and, before you know it, the hacker has breached the credit card repositories of a few dozen prolific sellers and steals their stolen data. To undermine those hacker carders and their illegal businesses, the hacker dumps copies of all the data on a few hundred pastebin and anonymous file-hosting sites (making it impractical for law enforcement to take down the data after the fact).
  • A small number of disgruntled IT employees at one of the major payments processing companies backdoor a number of critical servers and data repositories - continually running batch jobs that store the relevant metadata in an encrypted archive, that is updated with any new card details. 24 hours after they resign (or are laid off due to restructuring) they extract the data dump they had been preparing for months and dump it on the Internet because they hated the company and what it did to them.
  • A foreign power has spent 2 years infiltrating Visa International and a few dozen of the largest merchant banks using digital and human intrusion techniques, and has managed to accumulate the details of all their customers. The attackers filter the stolen credit card data for only US and EU and anonymously release the data in order to undermine those economies.
I don't know how far-fetched the last couple of scenarios are (and I know that plenty of safeguards have been installed to counter various scenarios) but, at the end of the day, it doesn't really matter. The data exists somewhere in digital form and, given the right skills, circumstances, and motivations, it would be possible to accumulate and dump the details of one billion stolen credit cards.

So, the data is stolen and made publicly available for all and sundry to access and potentially use. What happens now? Does our financial system collapse? Do organizations begin to sue one another over overestimated (potential) losses they've incurred? Do the owners of those stolen credit cards lose everything? Does everyone who has their own credit card stop using it - losing faith in that aspect of the banking system?

I think this is a discussion that we really need to have. To be frank, getting hold of the data related to a (few) billion credit cards is getting easier every day. I believe it is inevitable that truly colossal dumps of stolen data will occur sometime soon.

The impact will be huge.

Let's ignore all of the behind-the-scenes shenanigans the lawyers and bankers will perform and, for once, focus on just one person... and maybe that happens to be you. What happens if you wake up tomorrow morning, head on in to work, stop by the Starbucks on the corner to grab your morning coffee and your card is denied? So you try another card, and it too is denied. You get on the phone to your bank to try to find out what's happening and you're greeted with a robo-message that hundreds of millions of the bank-issued credit cards have been stolen and that they've taken action to ensure that no fraudulent charges will be made to your cards. The downside? None of your cards work in the meantime and it'll be at least a couple of weeks before the bank can issue and post out the replacements (and that's being damned optimistic - given the scale of the problem). I hope you have enough cash for gas to get home that evening.