Saturday, March 31, 2012

Signature-less Detection That Works

There was a period of time not long ago in which signature-based threat detection was cutting-edge. Antivirus, intrusion detection systems (IDS), data leakage prevention (DLP), content filtering and even anomaly detection systems (ADS) all continue to rely heavily upon static signatures. In recent years vendors have shied away from discussing their dependence on such signatures – instead extolling supplemental “non-signature-based” detection technologies.


In most cases these “non-signature-based” detection technologies appear to be more marketing ploy than genuine innovation: a redefinition of what the dictionary would ordinarily call a “signature”, designed to overcome the reservations or bias of potential buyers. For example, if you define a signature as a single basic regular expression, then a threshold-based alerting policy could be passed off as a “non-signature” alternative.
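
To illustrate the point, here is a minimal Python sketch contrasting a single-regex “signature” with a threshold-based alerting policy. The pattern, the hostnames and the threshold of 100 lookups are hypothetical placeholders. The takeaway is that both are fixed, predefined rules.

```python
import re

# Hypothetical "signature": one regular expression matched against a hostname.
SIGNATURE = re.compile(r"update\d+\.malware\.example")

def signature_match(hostname: str) -> bool:
    """Classic static signature: alert when the whole hostname matches the pattern."""
    return SIGNATURE.fullmatch(hostname) is not None

# Hypothetical "non-signature" policy: alert when a device makes more than
# THRESHOLD DNS lookups within one observation window.
THRESHOLD = 100

def threshold_alert(lookups_in_window: int) -> bool:
    """Threshold rule: just as static as the regex, only expressed as a count."""
    return lookups_in_window > THRESHOLD

print(signature_match("update7.malware.example"))  # True
print(threshold_alert(150))                         # True
```

Both functions encode a decision someone wrote down in advance; neither reasons about evidence it hasn’t been told to look for.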

I wanted to provide a walkthrough of the logic behind a (real) non-signature-based detection technology. Perhaps “technology” isn’t quite the right word; let’s instead discuss it in terms of a confidence-based detection system. As an illustrative example, consider a device reaching out to a remote server: a destination the device has never contacted before.


In the vast majority of cases a device will need to resolve a domain name in order to determine the IP address of the remote server it plans to connect to. That DNS traffic contains a wealth of information, provided you can correlate it with other external data sources and know how to act upon it.


Signature-based detection systems could of course look up the IP address or domain name in their blacklist. If either of those data elements appears on a list, the traffic could be classified as malicious and blocked; if not, everything is probably “fine”. In the case of detection systems that use dynamic reputation, a scalar value representing the “suspiciousness” of the domain/IP (and perhaps the class of threat) would be returned, even if the domain and/or IP has never been seen before, and the protection technology would then take a threshold-based action.
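
A minimal Python sketch of those two conventional approaches follows. The blacklist contents, the 0.0 to 1.0 reputation scale and the 0.8 blocking threshold are all assumptions made for illustration; real products use their own feeds and scoring schemes. The addresses come from the reserved documentation ranges.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical static blacklists (in practice, a vendor-supplied feed).
DOMAIN_BLACKLIST = {"evil.example"}
IP_BLACKLIST = {"203.0.113.66"}

def blacklist_verdict(domain: str, ip: str) -> str:
    """Block if either data element is listed; otherwise assume all is 'fine'."""
    if domain in DOMAIN_BLACKLIST or ip in IP_BLACKLIST:
        return "block"
    return "allow"

@dataclass
class Reputation:
    """What a dynamic reputation service might return for a domain/IP pair."""
    score: float                 # assumed scale: 0.0 benign .. 1.0 malicious
    threat_class: Optional[str]  # e.g. "botnet C&C", or None if unknown

def reputation_verdict(rep: Reputation, block_at: float = 0.8) -> str:
    """Threshold-based action on the returned suspiciousness score."""
    return "block" if rep.score >= block_at else "allow"

print(blacklist_verdict("new-domain.example", "198.51.100.7"))  # allow
print(reputation_verdict(Reputation(0.93, "botnet C&C")))      # block
```

Note how the blacklist path has nothing to say about a never-before-seen domain on an unlisted IP; it simply falls through to “allow”.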


As a supplement to those signature-based approaches, you could start dissecting the observation itself, compiling levels of suspiciousness and circumstantial evidence until you arrive at a conclusion about its maliciousness.


Consider the following aspects of determining the nature of the interaction based purely on observing the DNS traffic related to a single host lookup:

  1. Armed with a database of passive DNS information observed over a period of years from around the world, you can easily answer the questions:
    “Is this the first time anyone has ever looked up that domain name?”
    “Is this the first time that anyone has ever received that IP address for that domain name?”
    Knowing that a particular device (let’s assume this entire example plays out within a corporate network in the USA, say at a Fortune 1000 company) is the first ever to resolve a domain name, or to be directed to a new IP address, should raise a small degree of suspicion. Such events should be very infrequent.
  2. Armed with geographic IP distribution information, you’ll know which country the returned IP address belongs to.
    “Is the destination IP address located somewhere unfriendly?”
    “Is this a country I want this kind of device connecting to?”

    Knowing that the IP address belongs to a country with which the Fortune 1000 company does little or no business may be somewhat suspicious and worthy of closer study.
  3. Armed with a list of ISP netblock associations, and knowing which of these netblocks serve “residential subscribers”, you can shed light on the hosting environment of the destination server.
    “Is the destination address a residential IP address?”
    “Is the destination address a static or dynamic IP?”

    Knowing that a corporate device is trying to connect to a remote server located within a residential network should raise suspicion for most network administrators, hinting at a number of threats or unwanted traffic.
  4. Focusing a little on the domain name itself, characteristics of its registration may be used to determine levels of suspiciousness:
    “When was the domain name registered?”
    “Are there any features of the domain registrant details that cause concern?”

    Knowing that the domain name was registered 2 hours ago is significant. So too is knowing that details of the registrant match a large number of previously detected and categorized malicious domains.
  5. Mining an exhaustive passive DNS database and correlating domain and IP information with historical threat information can yield even more information about the nature of the threat.
    “How many other domain names have pointed to the same IP address over the last 1/7/30 days?”
    “Are any of the domain names pointing at that IP address known to be related to a particular threat?”

    By associating an unknown or previously unclassified domain with domains for which historical information and threat attribution exist, the corporate entity can evaluate a “guilt by association” value.
  6. Armed with information about the DNS servers that provide the authoritative answer for the domain query, you gain further insight into the nature of the destination.
    “Is the domain reliant upon free Dynamic DNS provisioning?”
    “What is the reputation of the authoritative DNS server?”
    “Which country is hosting the authoritative DNS server?”

    Dynamic DNS (DDNS) services are heavily abused by cybercriminals today, yet are rarely used by large commercial entities. Understanding the location and past history of the authoritative DNS server (e.g. what proportion of the domains it hosts have previously been identified as malicious?) hints at the legitimacy of the destination IP address.
  7. With visibility over live and historical domain name resolutions made to a particular authoritative DNS server, it becomes possible to glean information about the nature of the lookups and suspiciousness of the domain name:
    “How frequently is this domain name looked up?”
    “Which countries or organizations have also looked up this domain name?”
    “Who was the first to look up this domain name and get that response?”

    Knowing that a particular domain name has only been looked up by three US-based Fortune 500 companies in the last year is suspicious. Knowing that the same domain name points to an IP address in Iran and has never been looked up by anyone else in the world would be highly suspicious, and would suggest a targeted attack.
  8. Then of course there’s the more obvious categorized information relating to the IP address:
    “Is the IP address a known sinkhole?”
    “Is the IP address associated with a commercial content delivery network?”
    “Is the IP address associated with a domain registrar’s holding page?”

    Knowing that the IP address is pointing to a sinkhole is a pretty obvious indicator that someone already thinks this particular domain name is malicious and associated with a threat. Meanwhile, knowing that the domain is pointing to a generic domain registration holding page could indicate that the domain has been taken down for being malicious, or is waiting to be staged by the criminals, etc.
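
Several of these questions (items 1, 3 and 5) reduce to simple lookups. The Python sketch below assumes a hypothetical in-memory list of passive DNS records and a hypothetical list of residential netblocks; a production passive DNS store would be vastly larger and indexed, and the record schema shown is a simplification for illustration only.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class PdnsRecord:
    """One observed resolution (hypothetical, simplified schema)."""
    domain: str
    ip: str
    seen: datetime

def is_first_lookup(db: list[PdnsRecord], domain: str) -> bool:
    """Item 1: has anyone ever resolved this domain name before?"""
    return not any(r.domain == domain for r in db)

def is_new_answer(db: list[PdnsRecord], domain: str, ip: str) -> bool:
    """Item 1: has this domain ever resolved to this IP address before?"""
    return not any(r.domain == domain and r.ip == ip for r in db)

def cohosted_domains(db: list[PdnsRecord], ip: str,
                     now: datetime, days: int) -> set[str]:
    """Item 5: domains that pointed at `ip` within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return {r.domain for r in db if r.ip == ip and r.seen >= cutoff}

def guilt_by_association(cohosted: set[str], known_bad: set[str]) -> float:
    """Item 5: fraction of co-hosted domains already attributed to a threat."""
    return len(cohosted & known_bad) / len(cohosted) if cohosted else 0.0

# Item 3: hypothetical residential netblocks, e.g. from an ISP classification feed.
RESIDENTIAL_NETBLOCKS = [ipaddress.ip_network("198.51.100.0/24")]

def is_residential(ip: str) -> bool:
    """Item 3: does the address fall inside a known residential netblock?"""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RESIDENTIAL_NETBLOCKS)

now = datetime(2012, 3, 31)
db = [
    PdnsRecord("a.example", "198.51.100.7", now - timedelta(days=3)),
    PdnsRecord("b.example", "198.51.100.7", now - timedelta(days=40)),
    PdnsRecord("c.example", "198.51.100.7", now - timedelta(days=10)),
]
recent = cohosted_domains(db, "198.51.100.7", now, days=30)
print(sorted(recent))                               # ['a.example', 'c.example']
print(guilt_by_association(recent, {"c.example"}))  # 0.5
print(is_first_lookup(db, "new.example"), is_residential("198.51.100.7"))  # True True
```

The 30-day window drops `b.example`, which is why only two co-hosted domains remain; swapping `days` for 1 or 7 gives the other windows mentioned in item 5.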

Many more features can be extracted from the successful resolution of a domain name than those listed above, but by now you can appreciate that a large amount of information can be obtained and associated with even a previously unknown domain name, and that a certain degree of confidence can be reached as to its suspiciousness (or maliciousness).
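
One common way to turn such circumstantial evidence into a single confidence value is a naive-Bayes-style sum of log-likelihood ratios. This is a swapped-in illustration, not necessarily how any particular product (or the system described here) works: the indicator names, the weights and the 1-in-1,000 prior below are all hypothetical, and treating the indicators as independent is a simplification, since in reality they are often correlated.

```python
import math

# Hypothetical log-likelihood-ratio weights: how much more likely each indicator
# is for a malicious lookup than for a benign one. Values are illustrative only.
WEIGHTS = {
    "first_lookup_ever": 1.5,
    "unfriendly_geo": 0.7,
    "residential_ip": 1.2,
    "registered_recently": 1.0,
    "cohosted_with_known_bad": 2.0,
    "free_ddns_authority": 1.1,
    "narrow_lookup_population": 1.3,
    "sinkholed_ip": 3.0,
}

# Prior odds of 1:999, i.e. assumes ~1 in 1,000 never-before-seen domains is malicious.
PRIOR_LOG_ODDS = math.log(1 / 999)

def confidence(indicators: set[str]) -> float:
    """Combine fired indicators into P(malicious); unknown indicator names are ignored."""
    log_odds = PRIOR_LOG_ODDS + sum(WEIGHTS.get(name, 0.0) for name in indicators)
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic: log-odds -> probability
```

With no indicators the function simply returns the prior (0.001); each fired indicator pushes the estimate upward, and with all eight firing it exceeds 0.99. A single weak indicator barely moves it, which matches the intuition that no one clue is conclusive on its own.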


For example, consider the following scenario:

  1. A domain was looked up by a device, and that was the first time in the history of the Internet that the domain had ever been looked up.
  2. The authoritative DNS response came from a free DDNS provider in China.
  3. The domain name points to a residential, DHCP-assigned IP address in Tehran.
  4. There are 8 other domain names that have pointed to that particular IP address over the last 30 days.
  5. Five of those domain names have been referenced within previously captured malware as C&C.
  6. Over the last 30 days, all 9 domain names (the original plus the 8 uncovered) have pointed to 3 IP addresses within the same subnet of the residential ISP in Tehran, which is typical of DHCP network assignment.
  7. Only 6 of these domain names have ever been looked up, and over the last 90 days every lookup of those domains has been carried out by one of just 11 major US-based pharmaceutical companies.
  8. None of the domain names have ever been looked up from within Iran.
  9. Each of the 6 domains was first looked up from an IP address associated with a popular blackhat free-VPN service provider.
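
Two of these facts can be tested mechanically. In the Python sketch below, the addresses are placeholders from the documentation range (the scenario’s real Tehran addresses aren’t given), the organisation names are invented stand-ins, and both the /24 prefix length and the 20-organisation cut-off for a “narrow” lookup population are assumptions; the scenario only says “the same subnet”.

```python
import ipaddress
from typing import NamedTuple

class Lookup(NamedTuple):
    """Who resolved which domain (hypothetical schema)."""
    domain: str
    org: str
    country: str

def same_subnet(ips: list[str], prefix: int = 24) -> bool:
    """Item 6: do all addresses fall inside the /prefix network of the first one?"""
    net = ipaddress.ip_network(f"{ips[0]}/{prefix}", strict=False)
    return all(ipaddress.ip_address(ip) in net for ip in ips)

def narrow_and_foreign(lookups: list[Lookup], hosting_country: str,
                       max_orgs: int = 20) -> bool:
    """Items 7 and 8: few distinct organisations ever asked, none from the hosting country."""
    orgs = {lk.org for lk in lookups}
    countries = {lk.country for lk in lookups}
    return len(orgs) <= max_orgs and hosting_country not in countries

observed_ips = ["192.0.2.10", "192.0.2.57", "192.0.2.201"]
lookups = [
    Lookup("d1.example", "PharmaCo-A", "US"),
    Lookup("d2.example", "PharmaCo-B", "US"),
    Lookup("d1.example", "PharmaCo-C", "US"),
]
print(same_subnet(observed_ips))          # True
print(narrow_and_foreign(lookups, "IR"))  # True
```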

Obviously it shouldn’t take a genius to figure out that not much good is going to come from the device connecting to this particular remote host. All the evidence is circumstantial, but pulled together it becomes actionable intelligence. Most importantly, all of this can be carried out using just a single DNS response (before any malware is downloaded, before any vulnerabilities are exploited, and before any user is socially engineered). This means that protection systems equipped with this kind of non-signature-based threat determination engine can take preventative action before the device has even begun to connect to the destination server.
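
The key architectural point is that the verdict is reached at resolution time. A minimal Python sketch of a resolver-side hook is below; `filter_answer`, the scoring callback, the 0.9 threshold and the sinkhole address are all hypothetical. Rewriting the answer to an internal sinkhole is only one possible preventative action; refusing the query outright would be another.

```python
from dataclasses import dataclass
from typing import Callable

SINKHOLE_IP = "192.0.2.1"  # hypothetical internal sinkhole address

@dataclass(frozen=True)
class DnsAnswer:
    domain: str
    ip: str

def filter_answer(answer: DnsAnswer,
                  score: Callable[[DnsAnswer], float],
                  block_at: float = 0.9) -> DnsAnswer:
    """Score the resolution itself and act before the client can connect."""
    if score(answer) >= block_at:
        # The client only ever receives the sinkhole address, so its
        # connection attempt never reaches the real destination.
        return DnsAnswer(answer.domain, SINKHOLE_IP)
    return answer

suspicious = DnsAnswer("d1.example", "198.51.100.7")
print(filter_answer(suspicious, lambda a: 0.97).ip)  # 192.0.2.1
print(filter_answer(suspicious, lambda a: 0.10).ip)  # 198.51.100.7
```

Passing the scorer in as a callback keeps the blocking mechanism separate from however the confidence value is actually computed.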


When I think of non-signature-based detection systems, this is one approach that springs to mind. Such deterministic systems exist today, and they’re working very nicely, thank you.
