Technicalinfo.net Blog: metrics

Monday, June 18, 2012

Botnet Metrics: Learning from Meteorology

As ISPs continue to spin up their anti-botnet defenses and begin taking a more active role in dealing with the botnet menace, more and more interested parties are looking for statistics that help define both the scale of the threat and the success of the various tactics being deployed. But, as I discussed earlier in the year (see “Household Botnet Infections”), it’s not quite so easy to come up with accurate infection (and subsequent remediation) rates across multiple ISPs.

There are several initiatives trying to grapple with this problem at the moment – and a number of perspectives, observations and opinions have been offered. Obviously, if every ISP were using the same detection technology, in the same way, at the same time, it wouldn’t be such a difficult task. Unfortunately, that’s not the case.

One of the methods I’m particularly keen on is leveraging DNS observations to enumerate the start-up of conversations between the victim’s infected device and the bad guys’ command and control (C&C) servers. There are of course a number of pros and cons to the method, such as:

DNS monitoring is passive, scalable, and doesn’t require deep-packet inspection (DPI) to work [Positive],
Can differentiate between and monitor multiple botnets simultaneously from one location without alerting the bad guys [Positive],
Is limited to botnets that employ malware that makes use of domain names for locating and communicating with the bad guys’ C&C [Negative],
Not all DNS lookups for a C&C domain are for C&C purposes [Negative].

On top of all this lies the added complexity that such observations are conducted at the IP address level (and things like DHCP churn can be troublesome). This isn’t really a problem for the ISP of course – since they can uniquely tie the IP address to a particular subscriber’s network at any time.
One problem that persists though is that a “subscriber’s network” is increasingly different from “a subscriber’s infected device”. For example, a subscriber may have a dozen IP-enabled devices operating behind their cable modem – and it’s practically impossible for an external observer to separate one infected device from another operating within the same small network without analyzing traffic with intrusive DPI-based systems.

Does that effectively mean that remote monitoring and enumeration of bot-infected devices isn’t going to yield the accurate statistics everyone wants? Short of being omnipresent, the answer will have to be yes – but that shouldn’t stop us. What it means is that we need to use a combination of observation techniques to arrive at a “best estimate” of what’s going on.

In reality we have a similarly complex monitoring (and prediction) system that everyone is happy with – one that parallels the measurement problems faced with botnets – even if they don’t understand it. When it comes to monitoring the botnet threat the security industry could learn a great deal from the atmospheric physicists and professional meteorologists. Let me explain…

When you look up the weather for yesterday, last week or last year for the 4th of July, you’ll be presented with numerous statistics – hours of sunshine, inches of rainfall, wind velocities, pollen counts, etc. – for a particular geographic region of interest. The numbers being presented to you are composite values of sparse measurements.

To arrive at the conclusion that 0.55 inches of rainfall fell in Atlanta yesterday and 0.38 inches fell in Washington DC over the same period, it’s important to note that there wasn’t any measurement device sitting between the sky and land that accurately measured that rainfall throughout those areas. Instead, a number of sparsely distributed land-based point observations specific to the liquid volume of the rain (e.g. rain gauges), a number of indirect methods (e.g. flow meter gauges within major storm drain systems), and broad “water effect” methods (e.g. radar) were used in concert to determine an average for the rainfall. This process was also conducted throughout the country, using similar (but not necessarily identical) techniques, and an “average” was derived for the event.

That all sounds interesting, but what are the lessons we can take away from the last 50 years of modern meteorology? First and foremost, the use of accurate point measurements as a calibration tool for broad, indirect monitoring techniques.

For example, one of the most valuable and accurate tools modern meteorology uses for monitoring rainfall doesn’t even monitor rain or even the liquid component of the droplets – instead it monitors electromagnetic wave reflectivity. Yes, you guessed it – it’s the radar of course! I could get all technical on the topic, but essentially meteorological radar measures the reflection of energy waves from (partially) reflective objects in the sky. By picking the right wavelength of the electromagnetic wave, a radar gets better at detecting different sized objects in the sky (e.g. planets, aircraft, water droplets, pollutant particulates, etc.). So, when it comes to measuring rain (well, lots of individual raindrops simultaneously to be more precise), the radar system measures how much energy of a radar pulse was returned and at what time (the time component helps to determine distance).

Now radar is a fantastic tool – but by itself it doesn’t measure rainfall. Without getting all mathematical on you, the larger an individual raindrop, the substantially bigger the energy reflection – which means that a few slightly larger raindrops in the sky will completely skew the energy measurements of the radar. Meanwhile, the physical state of the “raindrop” also affects reflectivity. For example, a wet hailstone reflects much more energy than an all-liquid drop. There are a whole bunch of non-trivial artifacts of rainfall (you should check out things like “hail spikes” for example) that have to be accounted for before the radar observations can be used to derive the rainfall at ground level.
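
For the curious, here is a rough sketch of how a few larger drops skew the radar return. It assumes the standard Rayleigh scattering approximation, in which the reflected energy scales with the sixth power of the drop diameter while the water content scales only with the third power. The drop counts and sizes are made up purely for illustration, and the function names are my own.

```python
def reflectivity_factor(diameters_mm):
    """Sum of D^6 over all drops (Rayleigh approximation, arbitrary units)."""
    return sum(d ** 6 for d in diameters_mm)


def relative_water_volume(diameters_mm):
    """Sum of D^3 over all drops; proportional to the liquid volume."""
    return sum(d ** 3 for d in diameters_mm)


# Illustrative populations: 1000 drops of 1 mm, then the same plus ten 3 mm drops.
uniform = [1.0] * 1000
skewed = uniform + [3.0] * 10

z_ratio = reflectivity_factor(skewed) / reflectivity_factor(uniform)
v_ratio = relative_water_volume(skewed) / relative_water_volume(uniform)

print(f"reflectivity grows {z_ratio:.2f}x, water volume grows {v_ratio:.2f}x")
```

In this toy case, adding 1% more drops (all of them larger) raises the reflected energy far more than it raises the amount of water actually falling, which is why the raw radar return needs calibrating.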

In order to overcome much of this, point measurements at ground level are required to calibrate the overall radar observations. In the meteorological world there are two key technologies – rain gauges and disdrometers. Rain gauges measure the volume of water observed at a single point, while disdrometers measure the size and shape of the raindrops (or hail, or snow) that are falling. Disdrometers are pretty cool inventions really – and the last 15 years have seen some amazing advancements, but I digress…

How does this apply to Internet security and botnet metrics? From my perspective DNS observations are very similar to radar systems – they cover a lot of ground, to a high resolution, but they measure artifacts of the threat. However, those artifacts can be measured to a high precision and, when calibrated with sparse ground truths, become a highly economical and accurate system.

In order to “calibrate” the system we need to use a number of point observations. By analogy, C&C sinkholes could be considered rain gauges. Sinkholes provide accurate measurements of victims of a specific botnet (albeit, only a botnet C&C that has already been “defeated” or replaced) – and can be used to calibrate the DNS observations across multiple ISPs. When a botnet has victims within multiple ISPs that each observe DNS slightly differently (e.g. using different static reputation systems, outdated blacklists, or advanced dynamic reputation systems), each ISP could use third-party sinkhole data for a specific botnet that it is already capable of detecting via DNS as a calibration point (i.e. for scaling and deriving victim populations for all the other botnets within its network).
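
A minimal sketch of that calibration step might look like the following. It rests on a strong assumption that is mine, not something measured: that an ISP’s DNS detection rate for the sinkholed botnet is representative of its detection rate for every other botnet. The botnet names and counts are invented for illustration.

```python
def calibration_factor(dns_observed, sinkhole_truth):
    """Ratio of sinkhole-confirmed victims to DNS-observed victims for one botnet."""
    if dns_observed <= 0:
        raise ValueError("need at least one DNS observation to calibrate")
    return sinkhole_truth / dns_observed


def calibrated_estimates(dns_counts, factor):
    """Scale raw DNS victim counts for other botnets by the calibration factor.

    Assumes (hypothetically) that the same factor applies to every botnet.
    """
    return {name: round(count * factor) for name, count in dns_counts.items()}


# Illustrative numbers only: the ISP's DNS monitoring sees 800 victims of a
# botnet for which a third-party sinkhole reports 1000 victims in its IP space.
factor = calibration_factor(dns_observed=800, sinkhole_truth=1000)

# Raw DNS-derived counts for botnets that have no sinkhole data.
raw_counts = {"botnet_b": 400, "botnet_c": 120}
estimates = calibrated_estimates(raw_counts, factor)
print(factor, estimates)
```

In practice one would want several sinkholed botnets as calibration points, since a single ratio says nothing about how much the detection rate varies from one botnet to the next.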

Within their own networks ISPs could also employ limited scale and highly targeted DPI systems to gauge a specific threat within a specific set of circumstances. This is a little analogous to the disdrometer within meteorology – determining the average size and shape of events at a specific point, but not measuring the liquid content of the rainfall directly either. Limited DPI techniques could target a specific botnet’s traffic – concluding that the bot agent installs 5 additional malware packages upon installation that each in turn attempt to resolve 25 different domain names, and yet are all part of the same botnet infection.
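
To make that last point concrete, here is a small sketch of how many different domain lookups can be collapsed into a single infection per subscriber. The domain-to-botnet mapping is hypothetical (the sort of thing targeted DPI analysis might produce), and the domains and IP addresses are reserved documentation values rather than real observations.

```python
from collections import defaultdict

# Hypothetical mapping of C&C domains to the botnet they belong to.
DOMAIN_TO_BOTNET = {
    "update.badcdn.example": "botnet_a",
    "stats.badcdn.example": "botnet_a",
    "gate.otherbot.example": "botnet_b",
}


def count_infected_subscribers(dns_events, domain_to_botnet):
    """Count distinct subscriber IPs per botnet, however many domains each looked up."""
    subscribers = defaultdict(set)
    for subscriber_ip, domain in dns_events:
        botnet = domain_to_botnet.get(domain)
        if botnet is not None:  # ignore lookups for domains not tied to a botnet
            subscribers[botnet].add(subscriber_ip)
    return {botnet: len(ips) for botnet, ips in subscribers.items()}


# (subscriber IP, queried domain) pairs as a DNS monitor might record them.
events = [
    ("192.0.2.10", "update.badcdn.example"),
    ("192.0.2.10", "stats.badcdn.example"),
    ("192.0.2.10", "gate.otherbot.example"),
    ("192.0.2.77", "stats.badcdn.example"),
    ("192.0.2.77", "www.example.org"),
]
counts = count_infected_subscribers(events, DOMAIN_TO_BOTNET)
print(counts)
```

Note that this counts subscriber networks rather than devices, so a household with three devices in the same botnet still shows up once.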

Going forward, as ISPs face increased pressure not only to alert but to protect their subscribers from botnets, there will be increased pressure to disclose metrics relating to their infection and remediation rates. Given the consumer choice of three local ISPs offering the same bandwidth for the same price per month, the tendency is to go for providers that offer the most online protection. In the past that may have been how many dollars of free security software they bundled in. Already people are looking for proof that one ISP is better than another in securing them – and this is where botnet metrics will become not only important, but also public.

Unfortunately it’s still early days for accurately measuring the botnet threat across multiple ISPs – but that will change. Meteorology is a considerably more complex problem, but meteorologists and atmospheric physicists have developed a number of systems and methods to derive the numbers that the majority of us are more than happy with. There is a lot to be learned from the calibration techniques used and perfected in the meteorological field for deriving accurate and useful botnet metrics.

Tuesday, March 27, 2012

Musings over Sizing of the Botnet Menace

Pinning down the number of infected computers is really, really hard. I’d go as far as saying it’s practically impossible to calculate, let alone observe. Still, that’s not going to stop people from attempting to guess or extrapolate from their own observations. Over the years I’ve heard “reliable” numbers ranging from 10% through to 60% – and I don’t trust any of them.

There’s a whole gaggle of reasons why the numbers being thrown out to the public are inaccurate and should ideally be interpreted with a lot of skepticism by any right-minded folks. If I had to boil it down to only a couple of categories of reasons they’d be semantics and observational bias. Semantics, because terms such as “infected computers” and “compromised devices” are different from “compromised users” and “victims”, and observational bias because no vendor is omnipresent and their perspective of the threat is represented only by their category of customer and the tools they employ.

These problems represent hurdles for a number of collaborative projects seeking to measure and track the botnet menace. There are several initiatives (e.g. the Online Trust Alliance) and working groups (e.g. the Messaging, Mobile, and Malware Anti-abuse Working Group) striving to collate disparate datasets and views of botnet infections with the hope that the industry can baseline the problem in order to track and measure the success of other initiatives designed to reduce the threat. The premise being if you can’t measure it, how do you know you’ve been successful in fixing the problem?

Given Damballa’s unique perspective of the botnet threat and participation in various working groups on the topic, I thought I’d share a little of what we’re observing – and the bounds of what that means.
First of all, it’s important to note that Damballa has two major product lines – one catering for large enterprise networks (Damballa Failsafe), and the other focused on ISPs and telcos (Damballa CSP, for communications service providers). Given the nature of these products and the types of customers that purchase them, there are effectively two major “infection” statistics of note for this first part of 2012:

When we deploy Damballa Failsafe we find that, on average, between 3% and 7% of assets within enterprise networks are identified as being infected and are actively searching for, or successfully connecting to, a cybercriminal’s C&C server.
Among the ISPs and telcos that have chosen to deploy the Damballa CSP product, between 18% and 22% of unique subscriber IP addresses are actively seeking to connect to known C&C servers.

These infection statistics are not directly comparable. In the case of Damballa Failsafe deployments, we’re able to track and identify the unique device that has been infected by any number of crimeware instances, and separate out all of the C&C and data leakage communications, and differentiate between infections. In the case of the Damballa CSP product, because of ISP-level restrictions on deep packet inspection (DPI) and the fact that a subscriber IP address encompasses any and all devices within that subscriber’s personal network, we’re only able to deduce that at least one device within that subscriber’s network is part of a particular botnet (but we can enumerate each of the multiple botnets that may be operating within that subscriber’s network).

For the sake of this being a blog, let’s focus on the topic of “household botnet infections”. For all intents and purposes in the residential ISP world, a subscriber’s IP address is pretty close to being analogous to a “household”. Out of the aggregated 125 million subscriber IP addresses that the Damballa CSP product monitors from within our ISP customer-base from around the world, the vast majority of those subscriber IPs would be classed as “residential” – so it would be reasonable to say that roughly 1-in-5 households contain botnet infected devices.

From previous observations we also know that approximately 40% of infected devices have two or more botnet infections within them (see the H1 2011 Damballa Threat Report). Now if only we knew what the average number of devices within residential home networks is. Alas, I can’t find that information (send me the info if you happen to know!). When I last looked at my poor wireless router’s admin panel at home, it would appear that I have something like 40 IP-enabled devices chatting away and connecting to the Internet. Who knows, but I suspect that my household probably isn’t typical – and shouldn’t be used for any kind of extrapolation.

Anyhow, with all those numbers in mind, where in this “10% through to 60%” scale of global infected computers do I think the true numbers lie? Well there’s one more caveat to all this – it’s the semantic piece – infected computers is a superset of botnet infected devices. What Damballa product deployments are capable of enumerating (since they sit at the network level, and not at the host) are infected devices that are actively trying or successfully engaging with a criminal’s C&C infrastructure – and not all malware does this, and not all devices are “computers”. So malware that cannot be controlled or tasked remotely by a criminal, and malware that doesn’t upload stolen data somewhere over the network, aren’t going to appear in my observation statistics.

Given that the average number of devices within a residential subscriber network is going to be greater than one (let’s say “two” for now – until someone has a more accurate number), I believe that it’s reasonable to suggest that around 10% of home computers are infected with botnet crimeware.
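
The back-of-the-envelope arithmetic behind that figure can be sketched as below. It assumes roughly 20% of households contain an infected device (the 1-in-5 figure above), the placeholder of two devices per household, and, as a further simplification of my own, exactly one infected device in each infected household.

```python
def device_infection_rate(household_rate, devices_per_household,
                          infected_per_household=1.0):
    """Estimate the fraction of devices infected from a per-household rate.

    infected_per_household is an assumed average number of infected devices
    within each infected household (1.0 is a simplifying guess).
    """
    if devices_per_household <= 0:
        raise ValueError("devices_per_household must be positive")
    return household_rate * infected_per_household / devices_per_household


# 1-in-5 households, two devices per household (placeholder value).
rate_two_devices = device_infection_rate(0.20, 2)

# The same household rate spread over five devices, to show the sensitivity.
rate_five_devices = device_infection_rate(0.20, 5)

print(f"{rate_two_devices:.0%} vs {rate_five_devices:.0%}")
```

The estimate is only as good as the devices-per-household number, which is exactly the figure that is missing.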

With regards to “infected” computers (i.e. all types of malware – not just botnet malware), I don’t know what the ratio of botnet malware is to the overall malware installation problem. Of all the malware caught and shared globally amongst commercial antivirus vendors, the majority of malware samples would certainly seem to be “droppers” and “downloaders” (choose your terminology) – mostly because of serial variant production systems. Perhaps the desktop antivirus statistics are right with the 60%+ of computers being infected – but I doubt it (since the desktop antivirus products are only going to report the stuff they’re capable of detecting and stopping – not the slippery stuff).