If a scholar were to look back upon the history of the
Internet in 50 years’ time, they’d likely be able to construct an evolutionary
timeline based upon threats and countermeasures relatively easily. Having
transitioned through the ages of malware, phishing, and APTs, and the
countermeasures of firewalls, anti-spam, and intrusion detection, I’m guessing
those future historians would refer to the current evolutionary period as that
of “mega breaches” (from a threat perspective) and “data feeds” (from a
countermeasure perspective).
Today, anyone can go out and select from a near-infinite
number of data feeds that run the gamut from malware hashes and phishing URLs
through to botnet C&C channels and fast-flux IPs.
Whether you want live
feeds, historical data, bulk data, or just APIs you can hook into and query
ad hoc, more than one person or organization appears to be offering it somewhere
on the Internet, either for free or as a premium service.
In many ways security feeds are like water. They’re
available almost everywhere if you take the time to look; however, their usefulness,
cleanliness, volume, and ease of acquisition may vary considerably. Hence their
value is dependent upon the source and the acquirer’s needs. Even then, pure
spring water may be free from the local stream, or come bottled and be more
expensive than a coffee at Starbucks.
At this juncture in history the security industry is still
trying to figure out how to really take advantage of the growing array of data
feeds. Vendors and enterprises like to throw around the terms “intelligence
feeds” and “threat analytics” as a means of differentiating their data feeds
from competitors after they have processed multiple lists and data sources to
(essentially) remove stuff – just like filtering water and reducing the mineral
count – increasing the price and “value”.
Although we’re likely still a half-decade away from living
in a world where “actionable intelligence” is the norm (where data feeds have
evolved beyond disparate lists and amalgamations of data points into real-time
sentry systems that proactively drive security decision making), there exist
some important data feeds that add new and valuable dimensions to other bulk
data feeds, providing the stepping stones to lofty actionable security goals.
From my perspective, the most important additive feed in
progressing towards actionable intelligence is Passive DNS data (pDNS).
For those readers unfamiliar with pDNS, it is traditionally
a database containing data related to successful DNS resolutions – typically harvested
from just below the recursive or caching DNS server.
Whenever your laptop or computer wants to find out the IP
address of a domain name, your local DNS agent will delegate that resolution to
a nominated recursive DNS server (listed in your TCP/IP configuration settings),
which will either supply an answer it already knows (e.g. a cached answer) or
will in turn attempt to locate a nameserver that does know the domain name and
can return an authoritative answer from that source.
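To make the collection point a little more concrete, here is a toy Python model of a caching recursive server with a passive sensor attached. Everything in it is hypothetical (the `ToyRecursiveResolver` class, the `authoritative` dictionary standing in for the rest of the DNS hierarchy, and the `sensor_log` list), and it assumes the sensor only sees answers the recursive server had to go and fetch, not the ones it served from cache. Real resolvers also honor TTLs, which this sketch ignores.

```python
class ToyRecursiveResolver:
    """Toy caching resolver; not a real DNS implementation."""

    def __init__(self, authoritative, sensor_log):
        # Hypothetical stand-in for authoritative nameservers: name -> IP.
        self.authoritative = authoritative
        self.cache = {}
        # The passive sensor simply appends what it observes to this list.
        self.sensor_log = sensor_log

    def resolve(self, name):
        name = name.lower().rstrip(".")
        # Answer from cache if we already know it (no upstream traffic).
        if name in self.cache:
            return self.cache[name]
        # Otherwise ask "upstream" for an authoritative answer.
        answer = self.authoritative.get(name)
        if answer is not None:
            self.cache[name] = answer
            # Only successful upstream resolutions are recorded.
            self.sensor_log.append((name, answer))
        return answer


log = []
resolver = ToyRecursiveResolver({"www.example.com": "192.0.2.10"}, log)
resolver.resolve("www.example.com")   # cache miss: fetched and logged
resolver.resolve("www.example.com")   # cache hit: nothing new is logged
resolver.resolve("missing.example")   # failed resolution: not logged
print(log)
```

Note that the sensor never learns which client asked the question, only what was asked and what came back.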
By retaining all the domain name resolution data and
collecting from a wide variety of sources for a prolonged period of time, you
end up with a pDNS database capable of answering questions such as “where did
this domain name point to in the past?”, “what domain names point to a given IP
address?”, “what domain names are known by a nameserver?”, “what subdomains
exist below a given domain name?”, and “what IP addresses will a domain or
subdomain resolve to around the world?”.
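A few of those questions can be sketched against a tiny in-memory store. This is only an illustration under my own assumptions: the `Observation` fields (`rrname`, `rrtype`, `rdata`, `first_seen`, `last_seen`) and the `PassiveDNS` class are hypothetical names, not the schema of any particular pDNS provider, and a real database would be far larger and indexed very differently.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    rrname: str      # name that was resolved, e.g. "www.example.com"
    rrtype: str      # record type, e.g. "A" or "NS"
    rdata: str       # the answer, e.g. an IP address or nameserver name
    first_seen: int  # Unix time this answer was first observed
    last_seen: int   # Unix time this answer was most recently observed


class PassiveDNS:
    def __init__(self):
        self._records = {}  # (rrname, rrtype, rdata) -> Observation

    def observe(self, rrname, rrtype, rdata, ts):
        """Record one successful resolution seen at Unix time ts."""
        key = (rrname.lower().rstrip("."), rrtype.upper(), rdata.lower().rstrip("."))
        rec = self._records.get(key)
        if rec is None:
            self._records[key] = Observation(*key, first_seen=ts, last_seen=ts)
        else:
            rec.first_seen = min(rec.first_seen, ts)
            rec.last_seen = max(rec.last_seen, ts)

    def history(self, name):
        """Where did this domain name point to in the past?"""
        name = name.lower().rstrip(".")
        rows = [r for r in self._records.values() if r.rrname == name]
        return sorted(rows, key=lambda r: r.first_seen)

    def names_for_rdata(self, rdata, rrtype="A"):
        """What domain names point to a given IP (or are served by a nameserver)?"""
        rdata = rdata.lower().rstrip(".")
        return sorted({r.rrname for r in self._records.values()
                       if r.rdata == rdata and r.rrtype == rrtype})

    def subdomains(self, domain):
        """What subdomains have been seen below a given domain name?"""
        suffix = "." + domain.lower().rstrip(".")
        return sorted({r.rrname for r in self._records.values()
                       if r.rrname.endswith(suffix)})


db = PassiveDNS()
db.observe("www.example.com", "A", "192.0.2.10", 100)
db.observe("www.example.com", "A", "198.51.100.7", 300)
db.observe("mail.example.com", "A", "192.0.2.10", 150)
print([o.rdata for o in db.history("www.example.com")])
print(db.names_for_rdata("192.0.2.10"))
print(db.subdomains("example.com"))
```

Keeping `first_seen` and `last_seen` per unique answer, rather than every raw observation, is one common way to keep the data volume manageable while still supporting the historical questions.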
pDNS, by itself, is very useful, but when used in
conjunction with other data feeds its contributions towards actionable
intelligence may be akin to turning water into wine.
For example, a streaming data feed of suspicious or
confirmed malicious URLs (extracted from captured spam and phishing email
sources) can provide insight as to whether the customers of a company or its
brands have been targeted by attackers. However, because email delivery is asynchronous,
a real-time feed does not necessarily translate to a current window of visibility
on the threat. By including pDNS in the processing of this class of threat
feed it is possible to identify both the current and past states of the
malicious URLs and to cluster together previous campaigns by the attackers –
thereby allowing an organization to prioritize efforts on current threats and
optimize responses.
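As a rough sketch of that kind of enrichment, the Python below joins a list of feed URLs against pDNS rows, flags which hostnames were still resolving recently, and groups URLs that share hosting IPs. The row layout `(hostname, ip, first_seen, last_seen)`, the function names, and the one-day `active_window` are all assumptions of mine for illustration; shared IP is also only one crude clustering signal among many.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def enrich(urls, pdns_rows, now, active_window=86400):
    """Attach pDNS context to each URL from a threat feed.

    pdns_rows: iterable of (hostname, ip, first_seen, last_seen), Unix times.
    A host counts as "active" if any answer was seen within active_window
    seconds of now (an arbitrary threshold chosen for this sketch).
    """
    by_host = defaultdict(list)
    for host, ip, first_seen, last_seen in pdns_rows:
        by_host[host.lower()].append((ip, first_seen, last_seen))

    report = {}
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        rows = by_host.get(host, [])
        report[url] = {
            "host": host,
            "ips": sorted({ip for ip, _, _ in rows}),
            "active": any(now - last <= active_window for _, _, last in rows),
        }
    return report


def cluster_by_ip(report):
    """Group URLs whose hostnames have resolved to the same IP address."""
    clusters = defaultdict(set)
    for url, info in report.items():
        for ip in info["ips"]:
            clusters[ip].add(url)
    # Only IPs shared by more than one URL hint at a common campaign.
    return {ip: sorted(group) for ip, group in clusters.items() if len(group) > 1}


# Made-up sample data using documentation-reserved names and addresses.
rows = [
    ("login.bad.example", "203.0.113.5", 1000, 9000),
    ("pay.evil.example", "203.0.113.5", 2000, 99000),
    ("old.example", "198.51.100.9", 10, 20),
]
feed = [
    "http://login.bad.example/a",
    "http://pay.evil.example/b",
    "http://old.example/c",
]
report = enrich(feed, rows, now=100000)
print([u for u, info in report.items() if info["active"]])
print(cluster_by_ip(report))
```

URLs flagged as active are the candidates for immediate response, while the clusters help tie a fresh URL back to older, possibly dormant, infrastructure.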
While pDNS is an incredibly useful tool and intelligence
aid, it is critical that users understand that acquiring and building a useful
pDNS DB isn’t easy and, as with all data feeds, results are heavily dependent
upon the quality of the sources. In addition, because historical and
geographical observations are key, the further back the pDNS data goes (ideally
3+ years) and the more global ISPs the sources cover (ideally a few dozen
tier-1 operators), the more reliable and useful the data will be. So select
your provider carefully – this isn’t something you ordinarily build yourself
(although you can contribute to a bigger collector if you wish).
If you’re looking for more ideas on how to use DNS data as a
source and aid to intelligence services and even threat attribution, you can
find a walk-through of techniques I’ve presented or discussed in the past here
and here.
-- Gunter