Thursday, March 10, 2011

Optimal Methods for Spam and DDoS Offender Discovery

As botnet threats go, Spam and DDoS are probably the most widely known and discussed tactics employed by criminal operators. Despite being among the last things that career botnet operators use their compromised victims for, and despite offering the lowest monetization rates for the criminals, DDoS and Spam volumes have continued to rise annually.

I was recently asked which techniques work best for dealing with DDoS and Spam participation from within large enterprises or residential DSL/Cable networks – Network Anomaly Detection Systems (NADS), or botnet command-and-control (CnC) enumeration techniques (such as those employed by Damballa)?

It’s not the kind of question that can be answered succinctly. Both approaches are designed to scale to very large networks – and as such are components of a robust protection strategy. In fact the technologies are rather complementary – although I do think that the CnC enumeration approach is more elegant and efficient in the grand scheme of things.

The NADS approach to Spam and DDoS participation detection is simple enough: you monitor netflow (a compact summary of network packet flow – usually to/from IP address, port, protocol, date/time and packet size information), establish a baseline for traffic levels, set alert thresholds for potential anomalies, and define responses for when a threshold alert is triggered. In the context of a simple DDoS threat, you set a threshold for the volume of HTTP traffic directed at a single destination by a single IP host, and label any host that exceeds it as initiating a DDoS attack. If multiple hosts within the monitored network reach the HTTP threshold(s) against the same target IP address, you label them all as being part of a DDoS botnet. The same basic principles apply to Spam botnet detection.
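
As a rough illustration of the thresholding logic only (not any particular vendor's implementation – the flow format, field names and threshold values below are invented for the example), the core of the detection could be sketched in a few lines of Python:

    from collections import defaultdict

    # Hypothetical flow record: (src_ip, dst_ip, dst_port, bytes_sent),
    # as summarized by a netflow collector.
    HTTP_PORT = 80
    BYTES_THRESHOLD = 50_000_000   # example per-host alert threshold, derived from the baseline
    DDOS_MIN_HOSTS = 5             # hosts tripping the threshold against one target

    def flag_ddos_participants(flows):
        """Return {target_ip: [suspected participant IPs]} for HTTP volume anomalies."""
        volume = defaultdict(int)
        for src, dst, port, nbytes in flows:
            if port == HTTP_PORT:
                volume[(src, dst)] += nbytes

        # Hosts whose HTTP volume toward a single destination exceeds the threshold
        offenders = defaultdict(list)
        for (src, dst), nbytes in volume.items():
            if nbytes > BYTES_THRESHOLD:
                offenders[dst].append(src)

        # Several internal hosts tripping the same threshold against the same
        # target get labeled as members of the same DDoS botnet.
        return {dst: srcs for dst, srcs in offenders.items()
                if len(srcs) >= DDOS_MIN_HOSTS}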

An alternative and generally complementary approach to the problem is to automatically identify hosts within the monitored network that are already infected with malware and/or engaged in conversations with botnet CnC servers. This can be achieved in a variety of ways, but one of the simplest is to observe the DNS requests made by the hosts and the responses returned by the resolving DNS servers. Having identified suspicious DNS request profiles, along with DNS responses that have a high probability of association with criminal hosting infrastructure, it’s possible to quickly match victims with particular botnets – and to label the new (or previously known) CnC fully qualified domain name. Any other hosts exhibiting similar DNS resolution characteristics are members of the same botnet. The beauty of this approach is that the detection, enumeration and labeling can all be done before the botnet victims actually participate in any subsequent Spam or DDoS campaigns.
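
A heavily simplified sketch of that DNS-observation idea is below – it assumes a feed of (host, queried domain, answer IPs) tuples from the resolver, with placeholder reputation sets standing in for the real (large, continuously updated) models; this is illustrative only, not Damballa's actual algorithm:

    from collections import defaultdict

    # Placeholder reputation data - in practice these would be derived from
    # passive DNS and criminal hosting-infrastructure analysis.
    KNOWN_BAD_DOMAINS = {"botnet.badness.com.cc"}
    SUSPICIOUS_HOSTING_IPS = {"198.51.100.23", "203.0.113.99"}

    def label_botnet_membership(dns_events):
        """Group internal hosts by the suspicious CnC domain they resolve.

        dns_events: iterable of (host_ip, queried_domain, answer_ips) tuples.
        Returns {cnc_domain: set of victim host IPs}.
        """
        botnets = defaultdict(set)
        for host, domain, answers in dns_events:
            looks_bad = (domain in KNOWN_BAD_DOMAINS or
                         any(ip in SUSPICIOUS_HOSTING_IPS for ip in answers))
            if looks_bad:
                # Every host resolving the same CnC domain is treated as a member
                # of the same botnet - before it sends any spam or DDoS traffic.
                botnets[domain].add(host)
        return botnets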

When it comes to mitigating the threat, the historical approach has been to block the attack traffic outright – either by firewalling off specific ports or destination IP addresses, or by placing the malignant hosts in a walled garden. The botnet host may still be spewing spam or DDoS traffic, but that traffic is never routed to its final (target) destination.

That approach may have been OK in the past, when you were only dealing with IP-based threat responses and could stomach the voluminous traffic internally, but with more advanced CnC and botnet enumeration technologies you can bring some additional (and more versatile) mitigation techniques to bear. Since you’re constantly identifying and tracking botnet membership, and you know which CnCs these victims are being controlled by, you could perform one or more of the following actions:

  1. As botnet members begin to participate in the DDoS attack or Spam campaign, traffic to and from the CnC server could be blocked. By doing so, no new commands are sent to the botnet victims and they typically cease their attacks. In addition, any other botnet members within the network who have not yet been tasked to participate in the attack will similarly not be able to receive instructions.
  2. Walled Gardens can be selectively initiated around the infected botnet population – blocking just the ports and protocols being used (or likely to be used) in the attack against remote targets – without applying the same blocking to all hosts or subscribers within the network. For example, a botnet may be tasked with DDoSing a popular financial services web portal using an HTTP-based payload. It would therefore be important to block only the attack traffic and allow legitimate traffic through. A walled garden approach could be used in this scenario without having to utilize Deep Packet Inspection (DPI) to differentiate between the attack and legitimate traffic.
  3. The ability to differentiate CnC server activity at the domain name level is important for botnets that utilize fast-flux infrastructure to distribute command-and-control over large numbers of IP addresses. If recursive DNS services are provided by the organization to its enterprise hosts or subscribers, an alternative DNS response could be sent to the botnet victims – e.g. making botnet.badness.com.cc resolve to localhost (127.0.0.1). A minimal sketch of this idea follows the list below.
  4. If DPI or PCAP capabilities exist within the organization, they could be selectively deployed to catalog the criminal communications between the botnet members and the CnC server. This detailed evidence of the attack (including the commands being sent by the CnC) can be used for takedown or prosecution purposes.
  5. If the botnet malware agent is relatively unsophisticated, or if the CnC server itself is vulnerable to third-party takeover (e.g. a hacked server whose legitimate owner regains control and can now issue commands to the botnet, or a botnet CnC portal whose code contains remotely exploitable vulnerabilities), it may be possible to issue commands “on behalf” of the criminal operator instructing all the botnet members to stop their attack and to automatically uninstall the malware agent.
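
To make option 3 a little more concrete, here is a minimal sketch of the response-rewriting decision a cooperating recursive resolver could apply. The function and the sinkhole list are illustrative only; in production this would more likely be handled by a resolver policy mechanism (such as BIND’s response policy zones) than by custom code:

    SINKHOLE_ADDRESS = "127.0.0.1"

    # Domains identified as botnet CnC - e.g. via the DNS-based enumeration above.
    CNC_DOMAINS = {"botnet.badness.com.cc"}

    def rewrite_answer(queried_domain, real_answer_ips):
        """Return the answer the resolver should hand back to an internal host.

        Known CnC domains are answered with a sinkhole address, so infected hosts
        can no longer reach their controller even if it fast-fluxes across IPs.
        """
        if queried_domain.lower().rstrip(".") in CNC_DOMAINS:
            return [SINKHOLE_ADDRESS]
        return real_answer_ips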

There are of course many other imaginative ways to use the knowledge of the botnet CnC and its members in preemptive protection strategies too.

I think that NADS-based botnet detection (or, more precisely, botnet attack traffic detection) is useful for identifying triggers for remediation action – but I think that botnet CnC enumeration techniques provide greater flexibility in long-term threat management approaches.

GeoIP Irrelevance

GeoIP has traditionally served as a first pass filter for prioritizing the analysis of inbound threats. Over the last few years the value of GeoIP for this purpose has noticeably depreciated and it’s only going to get worse. It’s all relative of course; “worse” doesn’t mean useless, just less valuable in a security context.

At its heart, GeoIP is essentially a mapping between an IP address and some location on a map – and that location may be as specific as a street and postcode, or as broad as a country’s name.
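
As an illustration of what is actually being bought and sold, a lookup against one of these commercial databases boils down to something like the following (this uses MaxMind’s geoip2 Python library and a GeoLite2 database file purely as an example – the post doesn’t name any particular vendor):

    import geoip2.database

    # The .mmdb file is the vendor-supplied mapping of netblocks to locations.
    with geoip2.database.Reader("GeoLite2-City.mmdb") as reader:
        record = reader.city("8.8.8.8")   # a well-known public address, just for the example
        print(record.country.iso_code)    # e.g. "US"
        print(record.city.name)           # may be None if only country-level data exists
        print(record.location.latitude, record.location.longitude)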

It’s important to note that the various Internet authorities don’t actually administer these IP-to-location maps. Unfortunately, there is nothing that requires (or prevents) an IP address being tied to a particular geographical location beyond the registration of netblocks (ranges of contiguous IP addresses) to various entities and wherever those entities ultimately choose to host their equipment.

The correlation between IP address and geographical location is left to various organizations (mostly commercial) that have invested in systems making use of a mix of data mining, beaconing and solicitation to obtain actual location information – and this information is bundled up and sold in various consumable formats.

The accuracy of GeoIP information has always been “variable”. For IPs associated with large residential ISPs operating in Western countries, the data is pretty accurate, since much of that information has actually been supplied by the subscribers themselves (one way or another – whether they meant to disclose it or not). For IPs associated with large international organizations, the location data is more often than not meaningless, since it typically reflects the address of the organization’s global headquarters rather than the IPs in use at its various offices and data centers. I’ve found that the more obscure an organization is, and the larger its netblock of assigned IP addresses, the less likely the GeoIP information is to be accurate.

Those artifacts of GeoIP have always been present, but why are things getting worse? As I see it, there are three key factors:
  1. You’ve probably heard the news (repeatedly, over the last five years) that IPv4 addresses are running out – and just last month the last /8s were allocated. This means there’s growing pressure to optimize, divide and reassign existing netblock allocations. The result is that IP addresses are changing hands – between ISPs, organizations, hosting facilities and even countries – at a pace faster than traditional GeoIP service providers can track accurately. This obviously has a catastrophic effect on IP reputation systems too – but I’ll address that issue in a later blog.
  2. The growth of cloud computing, on-demand service provisioning and global balancing of content delivery networks has meant that larger swathes of IP addresses are incorporated into umbrella corporate locations – typically their main data center location. Meanwhile, the organizations utilizing these services may be located anywhere around the world. For example, an organized crime syndicate in Thailand could launch a spear-phishing campaign against Cambodian businesses – sending emails from the US-based Amazon EC2 cloud, and hosting the fraud server within the UK-based ElasticHosts cloud.
  3. There are more service providers offering services that can easily be leveraged for criminal purposes and that further obfuscate the true source of an attack – often intentionally (e.g. bullet-proof hosting providers and “privacy protection” services). The trend towards federated development and provisioning of cybercrime attacks means that GeoIP information resolves poorly to these generic hosting providers – whose services can be acquired from anywhere around the world. Often the GeoIP data is simply incorrect, as the service providers have altered or tampered with key registration and hosting details.

That all said, GeoIP information is still an incredibly useful first-pass filter for dealing with and prioritizing threat responses.

How can organizations use GeoIP information to supplement their security response?

  1. Most businesses aren’t global, and even the global ones don’t necessarily have all of their offices continuously communicating with every region of the planet. Create a list of countries or regions that are generally deemed “hostile” and automatically escalate actions based upon observed attacks from that list. As unsavory as it sounds, most organizations can easily compile such a list when pressed – and many will find that simply blocking or dropping traffic to/from those countries is greatly beneficial. For example, a US-based chain of frozen yogurt stores probably doesn’t need to browse web sites hosted in Somalia and is unlikely to want VPN access attempts initiated from Cyprus. (A rough sketch of this kind of escalation logic follows this list.)
  2. While the bad guys can certainly launch their attacks from “friendly” countries (and even locally) via purchased services or compromised hosts, a sizable percentage of the threats most organizations encounter on a daily basis do little to hide their source. Therefore, distinguishing between portal login attempts (and failures) initiated from IP addresses based in Beijing, China and those based in Atlanta, USA can be fruitful in optimizing threat responses.
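
As a rough sketch of the first point (the country lists, the alert format and the geoip_country() helper are all invented for the example – substitute whatever your GeoIP product actually provides):

    # Example lists only - each organization has to decide these for itself.
    HOSTILE_COUNTRIES = {"SO", "KP"}          # ISO country codes deemed "hostile"
    EXPECTED_COUNTRIES = {"US", "CA", "GB"}   # where the business actually operates

    def escalate(alert, geoip_country):
        """Bump alert priority based on the source country of the offending IP.

        `alert` is assumed to be a dict with "src_ip" and "priority" keys;
        `geoip_country` is whatever IP-to-country lookup function you already have.
        """
        country = geoip_country(alert["src_ip"])
        if country in HOSTILE_COUNTRIES:
            alert["priority"] = "critical"    # auto-escalate (or auto-block)
        elif country not in EXPECTED_COUNTRIES:
            alert["priority"] = "elevated"    # unexpected, but not on the hostile list
        return alert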

Of course, all bets are off for more sophisticated and targeted threats. But a fair amount of work can be shed by using GeoIP data to filter many criminal and persistent threats.

Nuclear Winter PCAP Repositories

Recently I've been thinking about the catchall approach to security - in particular the absolute-last-stop method of just recording everything on your network and mining it for security events - kind of like surviving a nuclear winter. Here are some additional thoughts...

The other week I spoke at the DoD Cyber Crime Conference here in Atlanta and had a number of questions asked of me relating to the growing number of vendors offering “store it all” network monitoring appliances. That whole approach to network monitoring isn’t an area of security I’ve traditionally given much credence to – not because of the practical limitations of implementing it, nor the inefficiencies and latency of the techniques – but because it’s an inelegant approach to what I think amounts to an incorrectly asked question.

Obviously, given the high concentration of defense and law enforcement attendees that such a conference attracts, there’s an increased emphasis on products that aid evidence gathering and data forensics. The “store it all” angle effectively encompasses devices that passively monitor an organization’s network traffic and store all of it (every bit and PCAP) on a bunch of disks, tapes or network appliances so that, at some time in the near future, should someone ever feel the need (or be compelled) to, it would be conceptually possible to mine all the stored traffic and forensically unravel a particularly compelling event.

Sounds fantastic! The prospect of having this level of detailed forensic information handy – ready to be tapped at a moment’s notice – is likely verging on orgasmic for many of the “lean forward” incident response folks I’ve encountered over the years.

The “store it all” network monitoring approach is a pretty exhaustive answer to the question “How can I see what happened within my network if I missed it the first time?” But shouldn’t the question be more along the lines of “How can I detect the threat and stop it before the damage is done?”

A “store it all” approach to security is like the ultimate safeguard – no matter what happens, even if my 20 levels of defense-in-depth fail, or someone incorrectly configures system and network logging features (causing events to not be recorded), or if multiple layers of internal threat detection and response systems misbehave, I’d still have a colossal data dump that can eventually be mined. Believe me when I say that I can see some level of comfort in adopting that approach. But the inefficiencies of such a strategy make my eye twitch.

Let’s look at some scoping numbers for consideration. Imagine a medium-sized business with a couple hundred employees. Assume for the moment that all those folks, along with several dozen servers, are located in the same building. A typical desktop system has a 1Gbps network interface nowadays, and the networking “backbone” for a network of 250 devices is likely to have a low-end operating capacity of 10Gbps – but let’s assume the network is only 50% utilized throughout the day. After a little number crunching, if you were to capture all that network activity and seek to store it, you’d be amassing roughly 54TB of data every day – so perhaps you don’t want to capture everything after all?

How about reducing the scale of the problem and focusing on just the data going to and from the Internet via a single egress point? Let’s assume the organization has only a 10Mbps link to its ISP, averaging 75% utilization throughout the day. After a little number crunching, you arrive at a wholesome 81GB of data per day. That’s much more manageable and, since a $50k “store it all” appliance will typically hold a couple of terabytes of data without too many problems, you’d be able to retain a little over three weeks of network visibility.
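
For anyone who wants to tweak the assumptions, the back-of-the-envelope arithmetic behind both figures looks like this:

    SECONDS_PER_DAY = 24 * 60 * 60

    def bytes_per_day(link_bits_per_sec, utilization):
        return link_bits_per_sec * utilization / 8 * SECONDS_PER_DAY

    # Internal backbone: 10 Gbps at 50% utilization
    lan = bytes_per_day(10e9, 0.50)
    print(lan / 1e12)                  # ~54 TB per day

    # Internet egress: 10 Mbps at 75% utilization
    egress = bytes_per_day(10e6, 0.75)
    print(egress / 1e9)                # ~81 GB per day

    # Retention on a 2 TB "store it all" appliance
    print(2e12 / egress)               # ~24.7 days - a little over three weeks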

How does this help your security though? Storing the data isn’t helping on a protection front (neither preemptive nor reactive), and it’s not going to help identify any additional threats you may have missed unless you’re also investing in the tools and human resources to sift through all the data.

To use an analogy, you’re a farmer and you’ve just invested in a colossal hay barn, you’ve acquired the equipment to harvest and bundle the hay, and you’re mowing fields that are capable of growing more hay than you could ever seek to perpetually store. Then someone informs you that one of their cows died because it swallowed a nail that probably came from your hay – so you’d better run through all those hay bales stored in your barn and search for any other nails that could kill someone else’s cow. The fact that the cow that died ate from a hay bale that’s no longer stored in your (full) barn is unfortunate I guess. But anyway, you’re in a reactive situation and you’ll remain in a reactive phase no matter how big your barn eventually becomes.

If you’ve got a suspicion that metal objects (nails, needles, coins, etc.) are likely to be bad juju, shouldn’t you be seeking them out before you’ve gone to all the work of filling your barn with hay bales? Wouldn’t it make more sense to use a magnet and detect those metal objects at the time you’re cutting the hay – before you put it in a bale, and before you put those bales in your barn? Even if you had no forethought that metal objects in your hay could eventually cause a problem, do you persist with a strategy of periodically hunting for the classic “needle in a haystack” in your barn despite now knowing of the threat?

Getting back to the world of IT security and threat detection (and mitigation)… I’ve found that there are greater efficiencies in identifying threats as the network data is streaming by – rather than reactive post-event data-mining approaches.

I guess I’ll hear some folks ask, “What about the stuff they might miss?” There are very few organizations I can think of that are able to employ the skills and resources needed to analyze “store it all” network traffic at a level even remotely comparable to what a security product vendor already includes in their commercial detection offerings – and those vendors are typically doing their analysis in a streaming fashion (and usually with something more sophisticated than magnets).

My advice to organizations looking at adopting “store it all” network monitoring appliances is the following:

  1. If you already have all of your protection and detection bases completely covered, maybe deploying these appliances makes sense – provided you employ the dedicated security analysts and incident response folks to make use of the data.
  2. Do you know what you’re trying to protect? “Store it all” approaches are designed to fill in the gaps of your other threat monitoring and detection systems. Is the threat going to be present at the network egress point, or will you need to store traffic from other (higher-volume) network segments? If so, be cognizant of how far back you can roll your eventual analysis.
  3. If you’re into hoarding data for the purposes of forensics and incident response, a more efficient and cost-effective approach may be to turn on (and optimize) your logging capabilities. Host logging combined with network logging will yield a very rich data set (often richer than simply storing all network traffic) that can be mined much more efficiently.
  4. If host-based logging isn’t possible or is proving to be too unwieldy, and you find yourself having to maintain a high paranoia state throughout the organization, you may want to consider implementing a flow-based security approach and invest in a network anomaly detection system. That way you’ll get near real-time alerting for bespoke threat categories – rather than labor-intensive reactive data-mining.
  5. If you have money to burn, buy the technology and begin storing all the PCAP data you can. Although I’d probably opt for a Ferrari purchase myself…