Monday, November 9, 2015

The Incredible Value of Passive DNS Data

If a scholar were to look back upon the history of the Internet in 50 years’ time, they’d likely be able to construct an evolutionary timeline based upon threats and countermeasures relatively easily. With the ages of malware, phishing, and APTs behind us, along with the countermeasures of firewalls, anti-spam, and intrusion detection, I’m guessing those future historians would refer to the current evolutionary period as that of “mega breaches” (from a threat perspective) and “data feeds” (from a countermeasure perspective).
Today, anyone can go out and select from a near-infinite number of data feeds that run the gamut from malware hashes and phishing URLs through to botnet C&C channels and fast-flux IPs.

Whether you want live feeds, historical data, bulk data, or just APIs you can hook into and query ad hoc, more than one person or organization appears to be offering it somewhere on the Internet, either for free or as a premium service.

In many ways security feeds are like water. They’re available almost everywhere if you take the time to look; however, their usefulness, cleanliness, volume, and ease of acquisition may vary considerably. Hence their value is dependent upon the source and the acquirer’s needs. Even then, pure spring water may be free from the local stream, or come bottled and be more expensive than a coffee at Starbucks.

At this juncture in history the security industry is still trying to figure out how to really take advantage of the growing array of data feeds. Vendors and enterprises like to throw around the terms “intelligence feeds” and “threat analytics” as a means of differentiating their data feeds from those of competitors after they have processed multiple lists and data sources to (essentially) remove stuff – just like filtering water and reducing the mineral count – increasing the price and “value”.
Although we’re likely still a half-decade away from living in a world where “actionable intelligence” is the norm (where data feeds have evolved beyond disparate lists and amalgamations of data points into real-time sentry systems that proactively drive security decision making), there exist some important data feeds that add new and valuable dimensions to other bulk data feeds, providing the stepping stones to lofty actionable security goals.

From my perspective, the most important additive feed in progressing towards actionable intelligence is Passive DNS data (pDNS).

For those readers unfamiliar with pDNS, it is traditionally a database containing data related to successful DNS resolutions – typically harvested from just below the recursive or caching DNS server.

Whenever your laptop or computer wants to find out the IP address of a domain name, your local DNS agent will delegate that resolution to a nominated recursive DNS server (listed in your TCP/IP configuration settings), which will either supply an answer it already knows (e.g. a cached answer) or, in turn, attempt to locate a nameserver that does know the domain name and can return an authoritative answer from that source.
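To make that flow concrete, here is a minimal sketch of a caching resolver with a sensor attached. It is a toy, not a real DNS implementation: the `CachingResolver` class, the `upstream` callable (standing in for the lookup against an authoritative nameserver), and the `sensor_log` list are all names I've made up for illustration, and I'm assuming the sensor records only the answers the resolver fetches from upstream.

```python
import time


class CachingResolver:
    """Toy recursive/caching resolver with a passive sensor (illustrative only)."""

    def __init__(self, upstream, clock=time.time):
        self.upstream = upstream  # hypothetical callable: name -> (ip, ttl_seconds)
        self.clock = clock        # injectable clock so the cache can be tested
        self.cache = {}           # name -> (ip, expires_at)
        self.sensor_log = []      # (timestamp, name, ip) rows for a pDNS collector

    def resolve(self, name):
        name = name.lower().rstrip(".")
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # cached answer: nothing new for the sensor to record
        ip, ttl = self.upstream(name)  # ask the authoritative side
        self.cache[name] = (ip, now + ttl)
        self.sensor_log.append((now, name, ip))  # successful resolution observed
        return ip
```

Note that the sensor only sees a name when the cache misses, so popular domains show up roughly once per TTL rather than once per client query.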

By retaining all the domain name resolution data and collecting from a wide variety of sources for a prolonged period of time, you end up with a pDNS database capable of answering questions such as “where did this domain name point to in the past?”, “what domain names point to a given IP address?”, “what domain names are known by a nameserver?”, “what subdomains exist below a given domain name?”, and “what IP addresses will a domain or subdomain resolve to around the world?”.
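As a rough illustration of how such a database can answer those questions, here is a small in-memory sketch. The `PassiveDNS` class, its method names, and the (name, record type, answer) layout with first-seen and last-seen timestamps are my own assumptions for the example; real pDNS providers each have their own schemas and APIs.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    first_seen: int  # timestamp of the earliest sighting
    last_seen: int   # timestamp of the latest sighting
    count: int = 1   # number of times this resolution was observed


class PassiveDNS:
    """Toy in-memory pDNS store keyed by (name, record type, answer)."""

    def __init__(self):
        self.records = {}    # (name, rrtype, answer) -> Observation
        self.by_name = {}    # name -> set of record keys
        self.by_answer = {}  # answer -> set of record keys

    def observe(self, name, rrtype, answer, ts):
        name = name.lower().rstrip(".")
        key = (name, rrtype, answer)
        obs = self.records.get(key)
        if obs is None:
            self.records[key] = Observation(ts, ts)
            self.by_name.setdefault(name, set()).add(key)
            self.by_answer.setdefault(answer, set()).add(key)
        else:
            obs.first_seen = min(obs.first_seen, ts)
            obs.last_seen = max(obs.last_seen, ts)
            obs.count += 1

    def history(self, name):
        """Where did this name point, and when? Ordered by first sighting."""
        keys = self.by_name.get(name.lower().rstrip("."), ())
        rows = [(k[2], self.records[k].first_seen, self.records[k].last_seen)
                for k in keys]
        return sorted(rows, key=lambda row: row[1])

    def names_pointing_to(self, answer):
        """Which names resolved to this IP (or, for NS records, this nameserver)?"""
        return sorted({k[0] for k in self.by_answer.get(answer, ())})

    def subdomains(self, domain):
        """Which observed names sit below the given domain?"""
        suffix = "." + domain.lower().rstrip(".")
        return sorted(n for n in self.by_name if n.endswith(suffix))
```

Feeding it a few `observe("www.example.com", "A", "192.0.2.10", ts)` calls from different sensors and times is enough to try out `history`, `names_pointing_to`, and `subdomains`. The geographic question would additionally need each observation tagged with the sensor's location, which this sketch leaves out.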

pDNS, by itself, is very useful, but when used in conjunction with other data feeds its contributions towards actionable intelligence may be akin to turning water into wine.

For example, a streaming data feed of suspicious or confirmed malicious URLs (extracted from captured spam and phishing email sources) can provide insight as to whether the customers of a company or its brands have been targeted by attackers. However, because email delivery is asynchronous, a real-time feed does not necessarily translate to a current window of visibility on the threat. By including pDNS in the processing of this class of threat feed it is possible to identify both the current and past states of the malicious URLs and to cluster together previous campaigns by the attackers – thereby allowing an organization to prioritize efforts on current threats and optimize responses.
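One way to picture that enrichment step is the sketch below. Everything in it is an assumption made for illustration: the `enrich` function, the `pdns` mapping of hostname to (ip, first_seen, last_seen) tuples, the one-day `active_window`, and the idea that URLs whose hosts have shared an IP address belong to the same campaign (a crude heuristic that shared hosting will easily fool).

```python
from collections import defaultdict
from urllib.parse import urlsplit


def enrich(urls, pdns, now, active_window=86400):
    """Label each URL active/stale/unknown and group URLs whose hosts shared an IP.

    pdns maps hostname -> list of (ip, first_seen, last_seen) tuples; this
    layout is a hypothetical stand-in for whatever a real provider returns.
    """
    status = {}
    parent = {}  # union-find forest over ("url", ...) and ("ip", ...) nodes

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for url in urls:
        host = urlsplit(url).hostname
        rows = pdns.get(host, [])
        last = max((row[2] for row in rows), default=None)
        if last is None:
            status[url] = "unknown"  # pDNS has never seen this host
        elif now - last <= active_window:
            status[url] = "active"   # resolved recently: a current threat
        else:
            status[url] = "stale"    # only historical resolutions
        find(("url", url))           # register the URL even if it has no rows
        for ip, _first, _last in rows:
            union(("url", url), ("ip", ip))

    clusters = defaultdict(list)
    for url in urls:
        clusters[find(("url", url))].append(url)
    return status, sorted(sorted(group) for group in clusters.values())
```

The "active" URLs are the ones to prioritize today, while the clusters hint at which older URLs may belong to the same attacker infrastructure.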

While pDNS is an incredibly useful tool and intelligence aid, it is critical that users understand that acquiring and building a useful pDNS DB isn’t easy and, as with all data feeds, results are heavily dependent upon the quality of the sources. In addition, because historical and geographical observations are key, the further back the pDNS data goes (ideally 3+ years) and the more broadly the sources cover global ISPs (ideally a few dozen tier-1 operators), the more reliable and useful the data will be. So select your provider carefully – this isn’t something you ordinarily build yourself (although you can contribute to a bigger collector if you wish).

If you’re looking for more ideas on how to use DNS data as a source and aid to intelligence services and even threat attribution, you can find a walk-through of techniques I’ve presented or discussed in the past here and here.

-- Gunter
