
Tuesday, June 20, 2023

Why Cybersecurity is Critical in MLOps

Larger and more sophisticated businesses will lean into building out their in-house data science teams and capabilities

If your business relies on machine learning (ML) to drive strategic decision-making, you’re in good company. A recent report by ClearML shows the technology clearly entering the mainstream, with 60% of organizations’ ML leaders planning to increase ML investments by more than a quarter in 2023. The same study revealed that 99% of respondents either already have dedicated budgets for ML operations (MLOps) or plan to implement them this year.

But as MLOps matures, it also carries more risk. According to a recent study by NCC Group, organizations are deploying ML models in more applications without considering security requirements. In a separate survey by Deloitte, nearly two-thirds of AI and ML users describe cybersecurity risks as a significant or extreme threat, but only 39% feel prepared to combat those risks.


MLOps model creation pipelines are vulnerable and easily attacked in three separate ways: by malicious insiders, through software supply chain manipulation, and via compromised systems. If the SolarWinds supply chain attack taught the industry anything, it’s that continuous build processes are both a target for sophisticated adversaries and a blind spot for in-house security operations teams.

In 2023, continuous build processes will continue to be a target for threat actors. As these attacks start to impact enterprises’ bottom lines, they will have to start paying more attention to the cybersecurity side of MLOps.

Here are some ways to make MLOps projects safer and more secure.

Secure the Whole Pipeline

Part of the challenge of securing MLOps is the sheer length and depth of typical machine learning pipelines. They include a half dozen or more phases – data collection and preparation, along with the creation, evaluation, optimization, deployment and usage of an ML model. Vulnerabilities can crop up at any point in the process.

Early on, in data collection, threat actors can taint the data, manipulate the annotation or conduct adversarial attacks on the metadata stores. In later phases, open-source models and frameworks can include hidden vulnerabilities. Potential bias and system performance need to be addressed. And as models are deployed and used, new data is often introduced, expanding the attack surface and opening an organization up to all kinds of threats – including evasion attacks, model theft, code injections and privacy attacks.

At the tail end of the process, there’s a lot of intellectual property inside an ML model. Decades of transactional data, and the learnings distilled from financial models, may be trained into a model that is only tens of kilobytes in size. It’s easier to steal that model than to steal the actual source data.

These models tend to be exposed. Attackers have become skillful at querying models and reproducing them somewhere else. This requires a new way of thinking about the value of the model. Tooling and alerting not only around the theft of data but around the manipulation of the models is important to an overall MLOps security strategy.
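
To make that concrete, below is a minimal sketch (not from any particular product; the names and thresholds are invented) of the kind of alerting that could watch for extraction-style querying of a deployed model endpoint:

    import time
    from collections import defaultdict, deque

    # Hypothetical threshold; a real value would be tuned to the model and its legitimate consumers.
    MAX_QUERIES_PER_HOUR = 5_000
    WINDOW_SECONDS = 3_600

    query_log = defaultdict(deque)  # client_id -> timestamps of recent queries


    def record_query(client_id, now=None):
        """Record a model query and return True if the client's rate looks like model extraction."""
        now = now or time.time()
        window = query_log[client_id]
        window.append(now)
        # Drop timestamps that have fallen outside the sliding one-hour window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_QUERIES_PER_HOUR:
            alert_soc(client_id, len(window))
            return True
        return False


    def alert_soc(client_id, count):
        # Placeholder: in practice this would raise an alert into the SIEM/SOC pipeline.
        print(f"[ALERT] possible model extraction: {client_id} made {count} queries in the last hour")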

Invest in Tooling to Scale Across SOCs

It’s no secret that security is no longer siloed in a single department. It cuts across all functions, and organizations are creating Security Operations Centers (SOCs) to improve the visibility, manageability and auditing of their overall security posture. To extend the SOC’s capability to MLOps, organizations need to incorporate tooling that scales to much larger uses than ever before.

Meeting MLOps’ data needs forces SOCs to adapt in two ways. First, existing SOC operations teams are now accountable for ML pipelines as well, and must build the additional tooling and reporting to support MLOps teams from a security perspective. Second, MLOps teams specialized in data science curation are able to leverage larger toolsets – including log analytics platforms that provide higher levels of threat detection.

Double Down on Security Best Practices

Some of the best defense tactics for MLOps are practices organizations deploy regularly across the rest of their operations. A zero trust security policy requires the authentication and authorization of anybody trying to access applications or data used in the development of ML models. It also tracks their activity. Applying the principle of least privilege (PLoP) limits users’ access to the exact data sets and models they are authorized to touch. This reduces the attack surface by prohibiting hackers who have gained access to one data trove from moving freely throughout the system.
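
As a small illustration of least privilege applied to ML assets, the sketch below implements a deny-by-default access check; the roles, dataset names, and model names are hypothetical:

    # Illustrative least-privilege check for ML assets; the policy table is hypothetical.
    ACCESS_POLICY = {
        "data-scientist": {"datasets": {"fraud-train-2023"}, "models": set()},
        "ml-engineer":    {"datasets": {"fraud-train-2023"}, "models": {"fraud-scorer-v4"}},
        "soc-analyst":    {"datasets": set(), "models": set()},
    }


    def can_access(role, asset_type, asset_name):
        """Deny by default; grant only the assets explicitly listed for the role."""
        allowed = ACCESS_POLICY.get(role, {}).get(asset_type, set())
        return asset_name in allowed


    assert can_access("ml-engineer", "models", "fraud-scorer-v4")
    assert not can_access("data-scientist", "models", "fraud-scorer-v4")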

Use Analytics to Observe and Log ML Tasks

An important step in protecting an ML system is to understand the system’s behavior in healthy and unhealthy states. To do this, organizations need to set up alerts that trigger action before an incident occurs. This is called “observability.” A vulnerability introduced early in the training data will affect the model’s performance down the line. Tracking performance data and logging metrics of ML tasks gives organizations insights into any and all security issues that could affect the ML model.
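
A minimal sketch of that idea, assuming a Python pipeline and invented baseline ranges, is to log the metrics of every ML task and raise an alert the moment a value leaves its healthy range:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("mlops.observability")

    # Hypothetical healthy ranges, established from a known-good baseline of the pipeline.
    BASELINE = {"accuracy": (0.90, 1.00), "null_rate": (0.00, 0.02)}


    def log_task_metrics(task, metrics):
        """Log each metric and return the names of any metrics outside their healthy range."""
        breaches = []
        for name, value in metrics.items():
            low, high = BASELINE.get(name, (float("-inf"), float("inf")))
            log.info("%s %s=%.4f", task, name, value)
            if not low <= value <= high:
                log.warning("ALERT %s: %s=%.4f outside healthy range [%.2f, %.2f]",
                            task, name, value, low, high)
                breaches.append(name)
        return breaches


    # A sudden jump in null values in the training data triggers an alert
    # before the degraded model ever reaches production.
    log_task_metrics("train-fraud-model", {"accuracy": 0.93, "null_rate": 0.11})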

Future Monitoring of Model Development Lifecycle

The continuous lifecycle of MLOps necessitates the continuous monitoring of a deployed model’s response to adversarial manipulation and corruption. In the future, expect to see larger and more sophisticated businesses lean into building out their in-house data science teams and capabilities, detecting threats with security analytics, and pruning and filtering data from unknowns to improve the advancements in the next generation of ML and AI.

-- Gunter Ollmann

First Published: Security InfoWatch - June 20, 2023

Thursday, September 8, 2022

It’s Time for Security Analytics to Embrace the Age of Science Over Art

Security analytics has traditionally been approached with a “hunt and peck” mentality, which has made the process of uncovering and responding to cyberthreats more art than science. A human analyst has an idea of what they are looking for when they begin to hunt across the available data, performing that task based on their own experience. They’ve been taught to celebrate when they find something, and that the trickier and more obscure the discovery, the greater the celebration of their skills.


This is, I believe, an “art” because the results will always differ between analysts — depending on the day of the week, what they had for breakfast, or how their weekend went — and because too many outside factors can affect the individual doing the hunting. The situation has only been perpetuated by an industry that has for too long touted the value of this “art.”

We’re no longer working with a simple canvas

We’ve all heard it before and will continue to hear it — data volumes and the enterprise landscape have been growing exponentially and that’s not going to stop. This was put into hyperdrive with the rapid adoption of cloud computing, which challenges organizations to collect and analyze complete data from multiple sources, including new cloud data as well as data from legacy, on-premises infrastructures. This has resulted in limited visibility that ultimately compromises overall security.

What we’re not hearing enough is that the long-held “art” of hunt-and-peck doesn’t scale to this challenge, and isn’t a reliable or repeatable process that can come close to meeting the needs of modern enterprise environments.

Managing haystacks of needles

We all know the saying “finding a needle in a haystack.” But in today’s threat landscape, given the data volumes with which analysts are burdened, it’s more like finding the sharpest needle in a haystack of needles. Following the decades-old mantra of “assume breach,” we need to turn our focus to the threats that matter most — the sharpest needles. This requires operationalizing the hunt, triage, investigation and response by removing humans from being “artistic” speed bumps and instead empowering them with the science of protection embedded in security analytics.

Adopting the science of security analytics that leverages automation built on machine learning and AI enables repeatable, reliable streaming investigations of threats across all the data, at all times. Applying this method will reveal orders of magnitude more threats and incidents — across a broad spectrum of risk — occurring continuously within the enterprise. We’ve reached the tipping point where threat volumes have far exceeded what any number of human analysts could reasonably hunt/triage, let alone respond to. This means enterprise security teams must increasingly apply AI and ML to the management of the threats they find (i.e., managing those stacks of needles) as well as the mitigations and responses.

Reprieve begins with automation

Building processes that are autonomous is the critical element to embracing a scientific approach to protection. While past security solutions focused on automation, they were largely unsuccessful due to inflexibility and a reliance upon humans to choose, in advance and for every exception, the right automation steps to apply. This is not the role people should be playing when it comes to successfully implementing autonomous solutions, and it doesn’t do anything to lighten their load. Instead, autonomous solutions should deploy system “smartness” to fill in the blanks and know to ask for human guidance when it’s actually needed.

If we continue with the mantra of “assume breach,” and operationalize security as described above, we also must completely rethink the human-focused SOC solution of filtering alerts. With people having been swamped to the point of (and beyond) alert fatigue, the solution has been to drastically manage the funnel of events and alerts, thus reducing the aperture of enterprise threat visibility and response — none of which sounds like a solution to me.

It begs the question: Why bother collecting alerts and events in the first place if you’re only going to do something with 1% of the top 1% most critical alerts? My response: Filtering is the worst way to manage security.

Instead, let’s do this:

With modern AI and autonomous hunting and triaging solutions, the system can look at every event and alert as it streams by and correlate, question and enrich them all in real time — all the time. The more data collected the more accurate and useful the autonomous system becomes, improving its ability to identify the collective stories and present them to the business and the analysts. To take it a step further, the autonomous system can then, in most cases, perform autonomous responses to the threats being found.
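
The following toy loop illustrates that streaming correlate, question, and enrich pattern; the event fields and threat-intel entries are invented, and a real deployment would sit on a streaming platform rather than a Python list:

    # Toy streaming triage: enrich and correlate every event as it arrives.
    THREAT_INTEL = {"203.0.113.7": "known C2 infrastructure"}

    incidents = {}  # entity -> list of correlated events


    def triage(event):
        entity = event.get("host", "unknown")
        # Enrich: attach any threat-intel context for the remote address.
        event["intel"] = THREAT_INTEL.get(event.get("remote_ip", ""))
        # Correlate: group events by the entity they concern, building a "story".
        related = incidents.setdefault(entity, [])
        related.append(event)
        if event["intel"] or len(related) >= 3:
            print(f"[INCIDENT] {entity}: {len(related)} related events, intel={event['intel']}")


    event_stream = [
        {"host": "db01", "remote_ip": "203.0.113.7", "action": "outbound-connect"},
        {"host": "db01", "remote_ip": "198.51.100.9", "action": "bulk-read"},
    ]
    for e in event_stream:
        triage(e)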

Human and machine harmony

Anytime automation in security is discussed it brings up the fear of automating away the analyst. But with a science-first approach, they aren’t going anywhere. The human analyst role is transforming, which will be a huge benefit to the people who work in SOCs. By adopting a scientific method for security analytics, the analyst will influence and guide the autonomous system to ensure it delivers business impact and value:

  • For exceptions when the AI doesn’t have enough information or confidence to provide an autonomous response, it watches and learns how the human analyst handles the case, thus building and establishing a scientific methodology.
  • At the cloud-SaaS level, those learnings may come from hundreds of enterprise SOC teams and thousands of expert security analysts, from which the AI systems can take collective intelligence and apply those learnings and methodology refinements back into the hands of the individual analyst.

The final result? The loop gets closed. The analyst is augmented.

The autonomous system deals with the daily grind, identifies the gaps that require human expertise, learns by watching how humans fill in the methodology gaps, and reapplies those learnings collectively. For instance, assume that a security team is capable of performing 100 manual investigations per day. An autonomous system could ask millions of forensic questions in a day. Time to resolution is shortened by augmenting the work the analyst does. The autonomous system performs the repetitive, data-intensive work, it can quickly go back in time to ask a practically unlimited number of questions, and the efficiency benefits compound from there.

Leading with science will equip security analysts with actionable data across use cases ranging from threat detection, threat investigation, and threat hunting to ransomware investigation and incident response. It helps security teams work smarter and respond faster while boosting productivity and strengthening security.

-- Gunter Ollmann

First Published: Medium - September 8, 2022

Tuesday, March 23, 2021

The Cusp of a Virtual Analyst Revolution

Security Analytics and Threat Investigation Are in the Midst of a Sea Change

Once live stomping around vendor-packed expo halls at security conferences returns, it is highly probable that “Virtual Analyst” will play a starring role in buzzword bingo. Today, the loosely defined term represents an aspiration for security vendors and managed service providers but may be perceived as a threat by internal day-to-day security operations and threat hunting teams.

For context, security analytics and threat investigation are in the midst of a sea change. Cloud log analytics platforms now enable efficient and timely analysis of ever-increasing swathes of enterprise logs, events, and alerts dating back years. Threat intelligence platforms are deeply integrated into cloud SIEM solutions—enabling both reactive and proactive threat hunting and automated incident investigation—and are entwined with a growing stack of sophisticated AI and ML capabilities. Meanwhile, smart event correlation and alert fusion engines automatically triage the daily deluge of suspiciousness down to a manageable stack of high-priority incidents—replete with kill-chain reassembly and data enrichment.


In many environments the traditional tier-one security analyst responsibilities for triaging events (removing false positives and “don’t care” noise) and maintaining operational health of scale-limiting SOC systems (e.g., device connectors, log retention and storage parameters, ticket response management) have already been subsumed by modern SIEM solutions. Meanwhile, platform-native no-code/low-code-powered orchestration and automation capabilities, along with growing libraries of community-sourced investigation and response playbooks, have greatly accelerated incident response and efficacy for tier-two analysts—alleviating time-consuming repetitive tasks and increasing focus on new and novel incidents.

Arguably, the Virtual Analyst is already here—captured within the intelligent automation and efficiencies of modern cloud SIEM— and I believe the journey has just begun.

The near-future evolution of the Virtual Analyst is being driven by two competing and entwined motions—the growing need for real-time threat response, and the inaccessibility of deep security knowledge and expertise.

Real-time threat response has long been thought an achievable target for in-house security operations teams and has underpinned many historic CISO security purchasing decisions. As the enterprise attack surface has grown, adversaries (external and internal) have increased the breadth and pace of attack, and in response businesses continue to invest heavily in instrumenting their environments with an “assume breach” mindset—widening the visibility aperture and exponentially increasing the volume and timeliness of threat-relatable data. Advanced log analytics capabilities and AI-powered event fusion processes are identifying more incidents earlier along the kill-chain and consequently providing more opportunities to conditionally mitigate a budding threat or disrupt a sequence of suspicious events. 

To successfully capitalize on that shrinking window of opportunity, responses need to occur at super-human speeds. The speed bump introduced by requiring a human in that response loop will increasingly materialize as the difference between having been attacked versus being breached. In this context, the Virtual Analyst represents the super-human capabilities AND responsibilities for real-time threat identification AND trusted automated mitigation of a live incident.
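
A minimal sketch of that kind of trusted, confidence-gated automation is shown below; the threshold and the containment call are hypothetical stand-ins for whatever a real product exposes:

    # Confidence-gated automated response; threshold and actions are illustrative only.
    AUTO_RESPONSE_CONFIDENCE = 0.90


    def isolate_host(host):
        print(f"containing {host}")  # placeholder for a real EDR network-containment call


    def respond(incident):
        """Mitigate automatically when confidence is high enough; otherwise escalate to a human."""
        if incident["confidence"] >= AUTO_RESPONSE_CONFIDENCE:
            isolate_host(incident["host"])
            return "auto-mitigated"
        return "escalated-to-analyst"


    print(respond({"host": "ws-042", "confidence": 0.97}))  # auto-mitigated
    print(respond({"host": "ws-017", "confidence": 0.55}))  # escalated-to-analyst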

Although that Virtual Analyst capability will be tightly bound to a product (e.g., Cloud SIEM, SOC-as-a-Service), the second Virtual Analyst motion centers around access to deep security expertise.

If a product-bound Virtual Analyst can be considered a quick-learning high-speed generalist, the second motion can be thought of as a flexible “on-call” specialist—augmenting the security operations team’s investigative and response capabilities as needed—and may be conceptually akin to the on-demand specialist services provided by traditional managed security service and incident response providers. 

The differentiated value of cloud-based Virtual Analyst solutions will lie in leveraging broader internet-spanning datasets for threat detection and attribution, and in powerful, rapid, ad hoc forensic-level investigation of incidents and response. For example, the in-house SOC team may engage the Virtual Analyst to augment an ongoing investigation by temporarily connecting it to their on-premises SIEM, receiving targeted direction for capturing and collecting incident-relevant non-SIEM data (e.g., PCAPs, VM images, storage snapshots, configuration files). That data is then uploaded and automatically investigated by the Virtual Analyst, and incorporated into real-time instruction on system recovery and attack mitigation.

It’s tempting to think that on-premises security analysts’ days are numbered. Virtual analyst advancements will indeed increase the speed, fidelity, and efficacy of threat detection and incident response within the enterprise—replacing almost all repeated and repeatable analyst tasks. But AI-powered virtual analyst solutions will do so with little knowledge or context about the business and its priorities. 

With the day-to-day noise and incident investigation drudgery removed, security operations teams may evolve into specialist business advisors—partnering with business teams, articulating technology risks, and providing contextual security guidance.

-- Gunter Ollmann

First Published: SecurityWeek - March 23, 2021

Tuesday, May 5, 2020

Tackling the SDLC With Machine Learning

Businesses’ digital transformations continue to show that being relevant and competitive is directly tied to the ability to develop and harness software. As the CEO of Microsoft, Satya Nadella, often says—“every company is now a software company.”

Software flaws that lead to unintentional data leakage, cause breaches, or jeopardize public health or the environment are not only costly but may be terminal to a company’s future. Integrity and security of the software and the development processes behind them have therefore become a critical component of every organization’s success. It is a core reason CISOs are increasingly partnering with DevOps leaders and vigilantly modernizing secure development lifecycle (SDLC) processes to embrace new machine learning (ML) approaches. 

Automated application security testing is a key component of modern SDLC practices and can economically uncover many bugs and potential security flaws with relative ease. Application security testing embraces a broad range of complementary techniques and tooling—such as static application security testing (SAST), dynamic application security testing (DAST), interactive application security testing (IAST), and runtime application self-protection (RASP). Current best practice security advice recommends a mix of tools from this alphabet soup to mechanically flag bugs and vulnerabilities to mitigate the consequences of unresolved bugs that make it to production systems.

A troublesome consequence of this approach lies with the volume of identified software flaws and the development team’s ability to corroborate the flaw’s risk (and subsequent prioritization). It’s also a problem manifest in organizations that operate bug bounty programs and need to triage bug researchers’ voluminous submissions. Even mature, well-oiled SDLC businesses battle automated triage and prioritization of bugs that flow from application security testing workflows—for example, Microsoft’s 47,000 developers generate nearly 30,000 bugs a month.


To better label and prioritize bugs at scale, new ML approaches are being applied and the results have been very promising. In Microsoft’s case, data scientists developed a process and ML model that correctly distinguishes between security and non-security bugs 99 percent of the time and accurately identifies critical, high-priority security bugs 97 percent of the time.

For bugs and vulnerabilities outside automated application security testing apparatus and SDLC processes—such as customer- or researcher-reported bugs—additional difficulties in using content-rich submissions for training ML classifier systems can include reports with passwords, personally identifiable information (PII), or other types of sensitive data. A recent publication “Identifying Security Bug Reports Based Solely on Report Titles and Noisy Data” highlights that appropriately trained ML classifiers can be highly accurate even when preserving confidential information and restricted to using only the title of the bug report.
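
In the same spirit, a title-only classifier can be sketched in a few lines of scikit-learn. The training titles and labels below are invented for illustration and bear no relation to Microsoft’s actual model or data:

    # Minimal title-only security-bug classifier, in the spirit of the paper cited above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    titles = [
        "Buffer overflow when parsing crafted PNG header",
        "SQL injection in search endpoint",
        "XSS via unescaped comment field",
        "Button misaligned on settings page",
        "Typo in onboarding email template",
        "Dashboard chart renders slowly with 10k rows",
    ]
    labels = [1, 1, 1, 0, 0, 0]  # 1 = security bug, 0 = non-security bug

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(titles, labels)

    print(model.predict(["Use-after-free in session handler"]))   # expected: security
    print(model.predict(["Dark mode colours look washed out"]))   # expected: non-security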

CISOs should stay informed of innovations in this area. According to Coralogix, an average developer creates 70 bugs per 1,000 lines of code and fixing a bug takes 30 times longer than writing a line of code. 

By correctly identifying security bugs from what is increasingly an overwhelming pile of bugs generated by automated application testing tools and customer-reported flaws, businesses can properly prioritize their development teams’ fix workflow and further reduce application risks to their organization, customers, and partners.

Although much research and innovation are underway in training ML classifier systems to triage security bugs and improve processes encapsulated in modern SDLC, it will be a while before organizations can purchase off-the-shelf, integrated solutions. 

CISOs and DevOps security leaders should be alert to new research publications and what “state of the art” is, and press their automated application software testing tool suppliers to advance their solutions to intelligently and correctly label security bugs apart from the daily chaff.

-- Gunter Ollmann

First Published: SecurityWeek - May 5, 2020

Tuesday, March 3, 2020

Advancing DevSecOps Into the Future

If DevOps represents the union of people, process, and technology to continually provide value to customers, then DevSecOps represents the fusion of value and security provided to those same customers. The philosophy of integrating security practices within DevOps is obviously sensible (and necessary), but by attaching a different label we are perhaps admitting that, despite best efforts, this “fusion” is more of an emulsification.

DevSecOps incorporates discrete security elements and capabilities throughout the development process; “security as code” is the hymn recited by development and security operations teams alike. But when you look closer, the security elements of DevSecOps are discrete, like the tiny immiscible spheres of oil suspended within a tasty vinaigrette — incorporated rather than invisibly entwined within the fabric of DevOps.

Today’s DevSecOps can largely be divided into two core functions: the automated checking and gated prevention of known and potential security flaws throughout the continuous integration and continuous deployment (CI/CD) workflow, and the operational monitoring and response to security-imbued telemetry generated by the deployment and surrounding protection technologies.

Rightly, we cocoon the applications that flow from our CI/CD workflows with further layers of discrete security tooling to monitor, alert, and ideally protect against broad categories of threats — threats that may be more economically and reliably prevented from outside than within the workflows. Those layers of security almost always operate independently from the application they are defending. This needs to change if we’re to “level up” security and roll DevSecOps back into DevOps.

Although security operations (SecOps) teams are becoming vastly more efficient at managing and responding to the alerts generated by their perimeter, server, and behavioral defense systems, there is a need to incorporate this same telemetry, response workflows, and decision-making into both the CI/CD workflow and the application itself if businesses are to successfully battle advancing threats such as Adversarial AI, data lake tainting, and behavioral poisoning. 

Too many DevSecOps workflows depend upon humans being in them. They’re the “bump in the wire,” and when adversaries switch to newer automated or AI-enabled attack and exploitation modes, system compromise and data breaches will (repeatedly) occur before fixes can be created, defenses tweaked, and patches applied.


The future lies in moving beyond the independent operations of “secure the code” and “protect the app,” and into the realm of self-defending applications.

It sounds grandiose, but there are some core elements and opportunities to progress toward applications that can defend themselves.

  • Telemetry from the security technologies that cocoon the application needs to be available and consumable by the application and the CI/CD workflow.
  • Applications must know when external security tools and monitors suspect an attack or raise an alert, and be capable of responding if advantageous to do so. For example, an application may be capable of natively and securely parsing a fund transfer request, but by knowing that a WAF had identified and blocked the previous 12 HTTP POST submissions for the same session in the past 500 milliseconds due to malicious SQL injection payloads, it could leverage that information in handling this 13th transfer and user session — perhaps by deceiving the attacker with a fake, evidentially traceable response (see the sketch after this list).
  • Security technologies need to standardize on nomenclatures, severity, and impact for both threats and behaviors. The new generation of cloud-based SIEM, through normalization of data connectors and telemetry, is capable of providing a degree of (vendor-specific) standardization and is primed for being the source of real-time security telemetry for CI/CD and application consumption. Application development frameworks need to understand this nomenclature and, ideally, come pre-armed with libraries and functions to respond with best practices.
  • Increased AI adoption and fusion within the CI/CD workflow can accelerate the pace at which workflows can respond to security telemetry. For example, a server-based security agent identifies a memory overflow and subsequent unwanted process startup, while the SIEM is able to reconstruct the session sequence to highlight the transaction string (0-day exploit). An intelligent and automated CI/CD process should be able to use that information to identify the vulnerable code and correct the logic flaw or bug, and proceed with an update to the live application with a fix — without developer involvement.
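
To make the fund-transfer example in the second bullet concrete, here is a toy sketch in which the application consults recent WAF telemetry for the session before acting; the telemetry feed and function names are hypothetical:

    # Hypothetical in-app consumption of WAF telemetry before processing a sensitive action.
    RECENT_WAF_BLOCKS = {"session-8f31": 12}  # session -> malicious submissions blocked by the WAF


    def handle_transfer(session_id, transfer):
        """Process a fund transfer, but change behaviour if the WAF has been fighting this session."""
        blocked = RECENT_WAF_BLOCKS.get(session_id, 0)
        if blocked > 0:
            # The request itself may parse safely, but the session is clearly under attack,
            # so return a deceptive, traceable response instead of executing the transfer.
            return f"decoy response (session blocked {blocked} times by WAF)"
        return f"transfer of {transfer['amount']} executed"


    print(handle_transfer("session-8f31", {"amount": 25_000}))
    print(handle_transfer("session-1a02", {"amount": 120}))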

Security responsibility must, and will continue to, “shift left.” To enable that, security telemetry needs to be both accessible and incorporated into the application and the DevOps workflow, and the developers themselves must be comfortable and knowledgeable in integrating the information. Better developer tooling — such as secure coding languages and frameworks, accessible best-practice libraries and functions, and smart in-line developer guidance and correctors — will help close the gap.

Rapid advancement of AI and ML technologies and incorporation into the CI/CD workstream will be able to increase the pace of security integration and secure deployment. There is still much work to be done, and subsequently there are great opportunities for innovative companies to add significant value to the process. 

In the meantime, CISOs and DevOps leaders should press hard on technologies and processes that remove the human speed bumps from the CI/CD workflow. Adversaries are advancing at a fast pace in their development of fully automated and autonomous attack engines. Soon, defense and response will be measured in milliseconds, not in days and weeks as it is now.

-- Gunter Ollmann

First Published: SecurityWeek - March 3, 2020

Tuesday, April 9, 2019

Get Ready for the First Wave of AI Malware

While viruses and malware have stubbornly stayed as a top-10 “things I lose sleep over as a CISO,” the overall threat has been steadily declining for a decade. Unfortunately, WannaCry, NotPetya, and an entourage of related self-propagating ransomware abruptly propelled malware back up the list and highlighted the risks brought by modern inter-networked business systems and the explosive growth of unmanaged devices.

The damage wrought by these autonomous (not yet AI-powered) threats should compel CISOs to contemplate the defenses to counter such a sophisticated adversary.


The threat of a HAL-9000 intelligence directing malware from afar is still the realm of fiction; so too is the prospect of an uber-elite hacker collective that has been digitized and shrunk down to an email-sized AI package filled with evil and rage. However, over the next two to three years, I see six economically viable, “low hanging fruit” uses for AI-infused malware – all focused on optimizing efficiency in harvesting valuable data, targeting specific users, and bypassing detection technologies.

  • Removing the reliance upon frequent C&C communications – Smart automation and basic logic processing could be employed to automatically navigate a compromised network, undertake non-repetitive and selective exploitation of desired target types and, upon identification and collection of desired data types, perform a one-off data push to a remote service controlled by the malware owner. While not terribly magical, such AI-powered capabilities would not only undermine all perimeter blacklist and enforcement technologies, but also sandboxing and behavioral analysis detection.
  • Use of data labeling and classification capabilities to dynamically identify and capture the most interesting or valuable data –  Organizations use these types of data classifiers and machine learning (ML) to label and protect valuable data assets. But attackers can exploit the same search efficiencies to find the most valuable business data being touched by real users and systems and to reduce the size of data files for stealthy exfiltration. This enables attackers to sidestep traffic anomaly detection technologies as well as common deception and honeypot solutions.
  • Use of cognitive and conversational AI to monitor local host email and chat traffic and to dynamically impersonate the user – The malware’s AI could insert new conversational content into email threads and ongoing chats with the objective of socially engineering other employees into disclosing secrets or prompting them to access malicious content. Since most email and chat security solutions focus on in-bound and egress content, internal communication inspection is rare. Additionally, conversational AI is advancing quickly enough to make socially engineering IT helpdesk and support staff into disclosing secrets or making temporary configuration changes a high probability.
  • Use of speech-to-text translation AI to capture user and work environment secrets – Through a physical microphone, the AI component could convert all discussions within range of the compromised device to text. In addition, some environments may enable the AI to capture the sound of keystrokes on nearby systems and deduce which keys are being pressed. Such an approach also allows hackers to be more selective about which secrets to capture, further minimizing the volume of data that must be egressed from the business, which in turn reduces the odds of triggering network-based detection technologies.
  • Use embedded cognitive AI in applications to selectively trigger malicious payloads – Since it is possible for cognitive AI systems to not only recognize a specific face or voice, but also determine their race, sex, and age, it is therefore possible for a malware author to be very specific in who they choose to target. Such malware may only be malicious for the CFO of the company or may only manifest itself if the interactive user is a pre-teen female. Because the trigger mechanism is embedded within complex AI, it becomes almost impossible for automated or manual investigation processes to determine the criteria for initiating the malicious behaviors.
  • Capture the behavioral characteristics and traits of system users – AI learning systems could observe the unique cadence, timbre, and characteristics of the user’s typing, mouse movements, vocabulary, misspellings, etc., and create a portable “bio-profile” of the user. Such “bio-profiles” could then be reused by attackers to bypass the current generation of advanced behavioral monitoring systems that are increasingly deployed in high-security zones.

These AI capabilities are commercially available today. Collectively or singularly, each AI capability can be embedded as code within malicious payloads.

Because deep neural networks, cognitive AI, and trained machine learning classifiers are incredibly complex to decipher, the trigger mechanism for malicious behaviors may be deeply buried and impossible to uncover through reverse engineering practices.

The baseline for defending against these attacks will lie in ensuring all parts of the organization are visible and continually monitored. In addition, CISOs need to invest in tooling that brings speed and automation to threat discovery through AI-powered detection and response.

As malware writers harness AI for cybercrime, the security industry must push forward with a new generation of dissection and detonation technologies to prepare for this coming wave. A couple promising areas for implementing defensive AI include threat intelligence mining and autonomous response (more on this later).

-- Gunter Ollmann

First published: SecurityWeek - April 9, 2019

Friday, September 21, 2018

The Security Talent Gap is Misunderstood and AI Changes it All

Despite headlines now at least a couple of years old, the InfoSec world is still (largely) paying lip-service to the lack of security talent and the growing skills gap.

The community is apt to quote and brandish the dire figures, but unless you're actually a hiring manager striving to fill low- to mid-level security positions, you're not feeling the pain - in fact, there's a high probability many see the problem as a net positive in terms of their own employment potential and compensation.

I see today's Artificial Intelligence (AI) and the AI-based technologies that'll be commercialized over the next 2-3 years as exacerbating the problem - but also offering up a silver-lining.

I've been vocal for decades that much of the professional security industry is, and should be, methodology based. And, by being methodology based, it should be reliably repeatable; whether that be bug hunting, vulnerability assessment, threat hunting, or even incident response. If a reliable methodology exists, and the results can be consistently verified as correct, then the process can be reliably automated. Nowadays, that automation lies firmly in the realm of AI - and the capabilities of these newly emerged AI security platforms are already reliably out-performing tier-one (e.g. 0-2 years' experience) security professionals.

In some security professions (such as auditing & compliance, penetration testing, and threat hunting) AI-based systems are already capable of performing at tier-two (i.e. 2-8 years experience) levels for 80%+ of the daily tasks.


On one hand, these AI systems alleviate much of the problem related to the shortage and global availability of security skills at the lower end of the security professional ladder. So perhaps the much-touted and repeated shortage numbers don't matter - and the extrapolation of current shortages into future open positions is overestimated.

However, if AI solutions consume the security roles and daily tasks equivalent to those of 8-year industry veterans, have we also created an insurmountable chasm for recent graduates and those who wish to transition into and join the InfoSec professional ladder?

While AI is advancing the boundaries of defense and, frankly, an organization's ability to detect and mitigate threats has never been better (and will be even better tomorrow), there are still large swathes of the security landscape that AI has yet to solve. In fact, many of these new swathes have only opened up to security professionals because AI has made them available.

What I see in our AI Security future is more of a symbiotic relationship.

AIs will continue to speed up the discovery and mitigation of threats, and get better and more accurate along the way. It is inevitable that tier-two security roles will succumb and eventually be replaced by AI. What will also happen is that security professional roles will change from the application of tools and techniques into business risk advisers and supervisors. Understanding the business, communicating with colleagues in other operational facets, and prioritizing risk response are the intangibles that AI systems will struggle with.

In a symbiotic relationship, security professionals will guide and communicate these operations in terms of business needs and risk. Just as Internet search engines have replaced the voluminous Encyclopedia Britannica and Encarta, and the Dewey Decimal system, Security AI is evolving to answer any question a business may raise about defending their organization - assuming you ask the right question, and know how to interpret the answer.

With regards to the skills shortage of today - I truly believe that AI will be the vehicle to close that gap. But I also think we're in for a paradigm change in who we'll be welcoming in to our organizations and employing in the future because of it.

I think that the primary beneficiaries of these next generation AI-powered security professional roles will not be recent graduates. With a newly level playing field, I anticipate that more weathered and "life experienced" people will assume more of these roles.

For example, given the choice between a 19-year-old freshly minted graduate in computer science and a 47-year-old woman with 25 years of applied mechanical engineering experience in the "rust belt" of the US, those life skills will inevitably be more applicable to making risk calls and communicating them to the business.

In some ways, the silver lining may be for middle America, which has suffered and languished as technology has moved on from coal mining and phone-book printing. It's quite probable that it will become the hot-spot for newly minted security professionals - leveraging their past (non-security) professional experiences, along with decades of people or business management and communication skills - and closing the missing security skills gap using AI.

-- Gunter

Thursday, March 8, 2018

NextGen SIEM Isn’t SIEM


Security Information and Event Management (SIEM) is feeling its age. Harkening back to a time in which businesses were prepping for the dreaded Y2K and the cutting edge of security technology was bound to DMZs, bastion hosts, and network vulnerability scanning – SIEM has been along for the ride as both defenses and attackers have advanced over the intervening years. Nowadays, though, it feels less like a ride with SIEM and more like towing an anchor.

Despite the deepening trench gouged by the SIEM anchor slowing down threat response, most organizations persist in throwing more money and resources at it. I’m not sure whether it’s because of a sunk cost fallacy or the lack of a viable technological alternative, but they continue to diligently trudge on with their SIEM – complaining with every step. I’ve yet to encounter an organization that feels like their SIEM is anywhere close to scratching their security itch.



The SIEM of Today
The SIEM of today hasn’t changed much over the last couple of decades, its foundation being the real-time collection and normalization of events from a broad scope of security event log sources and threat alerting tools. Its primary objective has been to manage and overcome the cacophony of alerts generated by the hundreds, thousands, or millions of sensors and logging devices scattered throughout an enterprise network – automatically generating higher-fidelity alerts using a variety of analytical approaches – and displaying a more manageable volume of information via dashboards and reports.

As the variety and scope of devices providing alerts and logs continue to increase (often exponentially), consolidated SIEM reporting has had to focus upon statistical analytics and trend displays to keep pace with the streaming data – increasingly focused on the overall health of the enterprise, rather than threat detection and event risk classification.

Whilst the collection of alerts and logs is conducted in real-time, the ability to aggregate disparate intelligence and alerts to identify attacks and breaches has fallen to offline historical analysis via searches and queries – giving birth to the Threat Hunter occupation in recent years.

Along the way, SIEM has become the beating heart of Security Operations Centers (SOC) – particularly over the last decade – and it is often difficult for organizations to disambiguate SIEM from SOC. Not unlike Frankenstein’s monster, additional capabilities have been grafted onto today’s operationalized SIEMs; advanced forensics and threat hunting capabilities now dovetail into SIEM event archive databases, a new generation of automation and orchestration tools has instantiated playbooks that process aggregated logs, and ticketing systems track responders’ efforts to resolve and mitigate threats.

SIEM Weakness
There is however a fundamental weakness in SIEM and it has become increasingly apparent over the last half-decade as more advanced threat detection tools and methodologies have evolved; facilitated by the widespread adoption of machine learning (ML) technologies and machine intelligence (MI).

Legacy threat detection systems such as firewalls, intrusion detection systems (IDS), network anomaly detection systems, anti-virus agents, network vulnerability scanners, etc. have traditionally had a high propensity towards false positive and false negative detections. Compounding this, for many decades (and still a large cause for concern today) these technologies have been sold and marketed on their ability to alert in volume – i.e. an IDS that can identify and alert upon 10,000 malicious activities is too often positioned as “better” than one that only alerts upon 8,000 (regardless of alert fidelity). Alert aggregation and normalization is of course the bread and butter of SIEM.

In response, a newer generation of vendors have brought forth new detection products that improve and replace most legacy alerting technologies – focused upon not only finally resolving the false positive and false negative alert problem, but to move beyond alerting and into mitigation – using ML and MI to facilitate behavioral analytics, big data analytics, deep learning, expert system recognition, and automated response orchestration.

The growing problem is that these new threat detection and mitigation products don’t output alerts compatible with traditional SIEM processing architectures. Instead, they provide output such as evidence packages, logs of what was done to automatically mitigate or remediate a detected threat, and talk in terms of statistical risk probabilities and confidence values – having resolved a threat to a much higher fidelity than a SIEM could. In turn, “integration” with SIEM is difficult and all too often meaningless for these more advanced technologies.

A compounding failure with the new ML/MI powered threat detection and mitigation technologies lies with the fact that they are optimized for solving a particular class of threats – for example, insider threats, host-based malicious software, web application attacks, etc. – and have optimized their management and reporting facilities for that category. Without a strong SIEM integration hook there is no single pane of glass for SOC management; rather a half-dozen panes of glass, each with their own unique scoring equations and operational nuances.

Next Generation SIEM
If traditional SIEM has failed and is becoming more of a bugbear than ever, and the latest generation of ML and MI-based threat detection and mitigation systems aren’t on a trajectory to coalesce by themselves into a manageable enterprise suite (let alone a single pane of glass), what does the next generation (i.e. NextGen) SIEM look like?

Looking forward, next generation SIEM isn’t SIEM, it’s an evolution of SOC – or, to coin a more prescriptive turn of phrase, “SOC-in-a-box” (and inevitably “Cloud SOC”).

The NextGen SIEM lies in the natural evolution of today’s best hybrid-SOC solutions. The Frankenstein add-ins and bolt-ons that have extended the life of SIEM for a decade are the very fabric of what must ascend and replace it.

For the NextGen SIEM, SOC-in-a-box, Cloud SOC, or whatever buzzword the professional marketers eventually pronounce – to be successful, the core tenets of operation will necessarily include:
  • Real-time threat detection, classification, escalation, and response. Alerts, log entries, threat intelligence, device telemetry, and indicators of compromise (IOC), will be treated as evidence for ML-based classification engines that automatically categorize and label their discoveries, and optimize responses to both threats and system misconfigurations in real-time.
  • Automation is the beating heart of SOC-in-a-box. With no signs of data volumes falling, networks becoming less congested, or attackers slackening off, automation is the key to scaling to the business’s needs. Every aspect of SOC must be designed to be fully autonomous, self-learning, and elastic.
  • The vocabulary of security will move from “alerted” to “responded”. Alerts are merely one form of telemetry that, when combined with overlapping sources of evidence, lay the foundation for action. Businesses need to know which threats have been automatically responded to, and which are awaiting a remedy or response.
  • The tier-one human analyst role ceases to exist, and playbooks will be self-generated. The process of removing false positives and gathering corroborating evidence for true positive alerts can be done much more efficiently and reliably using MI. In turn, threat responses by tier-two or tier-three analysts will be learned by the system – automatically constructing and improving playbooks with each repeated response.
  • Threats will be represented and managed in terms of business risk (see the sketch after this list). As alerts become events, “criticality” will be influenced by age, duration, and threat level, and will sit adjacent to “confidence” scores that take into account the reliability of sources. Device auto-classification and responder monitoring will provide the framework for determining the relative value of business assets, and consequently the foundation for risk-based prioritization and management.
  • Threat hunting will transition to evidence review and preservation. Threat hunting grew from the failures of SIEM to correctly and automatically identify threats in real-time. The methodologies and analysis playbooks used by threat hunters will simply be part of what the MI-based system incorporates in real-time. Threat hunting experts will in turn focus on preservation of evidence in cases where attribution and prosecution become probable or desirable.
  • Hybrid networks become native. The business network – whether it exists in the cloud, on premise, at the edge, or in the hands of employees and customers – must be monitored, managed, and have threats responded to as a single entity. Hybrid networks are the norm and attackers will continue to test and evolve hybrid attacks to leverage any mitigation omission.
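
As a small sketch of the risk-based prioritization described in the business-risk bullet above, the weights, scales, and decay window below are invented for illustration:

    # Illustrative risk-based prioritization of events; all weights and scales are invented.
    def risk_score(threat_level, confidence, asset_value, age_hours):
        """Combine threat severity, source confidence, asset value, and age into a single score."""
        recency = max(0.0, 1.0 - age_hours / 72.0)  # older events decay over three days
        return threat_level * confidence * asset_value * (0.5 + 0.5 * recency)


    events = [
        {"id": "evt-1", "threat_level": 0.9, "confidence": 0.60, "asset_value": 1.0, "age_hours": 2},
        {"id": "evt-2", "threat_level": 0.7, "confidence": 0.95, "asset_value": 0.4, "age_hours": 30},
    ]
    ranked = sorted(events, key=lambda e: risk_score(e["threat_level"], e["confidence"],
                                                     e["asset_value"], e["age_hours"]), reverse=True)
    for e in ranked:
        print(e["id"])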

Luckily, the NextGen SIEM is closer than we think. As SOC operations have increasingly adopted the cloud to leverage elastic compute and storage capabilities, hard-learned lessons in automation and system reliability from the growing DevOps movement have further defined the blueprint for SOC-in-a-box. Meanwhile, the current generation of ML-based and MI-defined threat detection products, combined with rapid evolution of intelligence graphing platforms, have helped prove most of the remaining building blocks.

These are not wholly additions to SIEM, and SIEM isn’t the skeleton of what will replace it.

The NextGen SIEM starts with the encapsulation of the best and most advanced SOC capabilities of today, incorporates its own behavioral and threat detection capabilities, and dynamically learns to defend the organization – finally reporting on what it has successfully resolved or mitigated.

-- Gunter Ollmann

Sunday, January 15, 2017

Allowing Vendors VPN access during Product Evaluation

For many prospective buyers of the latest generation of network threat detection technologies it may appear ironic that these AI-driven learning systems require so much manual tuning and external monitoring by vendors during a technical “proof of concept” (PoC) evaluation.

Practically all vendors of the latest breed of network-based threat detection technology require varying levels of network accessibility to the appliances or virtual installations of their product within a prospect’s (and future customer’s) network. Typical types of remote access include:

  • Core software updates (typically a pushed out-to-in update)
  • Detection model and signature updates (typically a scheduled in-to-out download process)
  • Threat intelligence and labeled data extraction (typically an ad hoc per-detection in-to-out connection)
  • Cloud contribution of abstracted detection details or meta-data (often a high frequency in-to-out push of collected data)
  • Customer support interface (ad hoc out-to-in human-initiated supervisory control)
  • Command-line technical support and maintenance (ad hoc out-to-in human-initiated supervisory control)

Depending upon the product, the vendor, and the network environment, some or all of these types of remote access will be required for the solution to function correctly. But which are truly necessary and which could be used to unfairly manually manipulate the product during this important evaluation phase?

To be flexible, most vendors provide configuration options that control the type, direction, frequency, and initialization processes for remote access.

When evaluating network detection products of this ilk, the prospective buyer needs to very carefully review each remote access option and fully understand the product’s reliance upon, and efficacy associated with, each one. Every remote access option eventually allowed is (unfortunately) an additional hole being introduced into the buyer’s defenses. Knowing this, it is unfortunate that some vendors will seek to downplay their reliance upon certain remote access requirements – especially during a PoC.

Prior to conducting a technical evaluation of the network detection system, buyers should ask the following types of questions to their prospective vendor(s):

  • What is the maximum period needed for the product to have learned the network and host behaviors of the environment it will be tested within?
  • During this learning period and throughout the PoC evaluation, how frequently will the product’s core software and detection models typically be updated?
  • If no remote access is allowed to the product, how long can the product operate before losing detection capabilities and which detection types will degrade to what extent over the PoC period?
  • If remote interactive (e.g. VPN) control of the product is required, precisely what activities does the vendor anticipate to conduct during the PoC, and will all these manipulations be comprehensively logged and available for post-PoC review?
  • What controls and data segregation are in place to secure any meta-data or performance analytics sent by the product to the vendor’s cloud or remote processing location? At the end of the PoC, how does the vendor propose to irrevocably delete all meta-data from their systems associated with the deployed product?
  • If testing is conducted during a vital learning period, what attack behaviors are likely to be missed and may negatively influence other detection types or alerting thresholds for the network and devices hosted within it?
  • Assuming VPN access during the PoC, what manual tuning, triage, or data clean-up processes are envisaged by the vendor – and how representative will it be of the support necessary for a real deployment?

It is important that prospective buyers understand not only the number and types of remote access necessary for the product to correctly function, but also how much “special treatment” the PoC deployment will receive during the evaluation period – and whether this will carry-over to a production deployment.

As vendors strive to battle their way through security buzzword bingo, in this early age of AI-powered detection technology, remote control and manual intervention into the detection process (especially during the PoC period) may be akin to temporarily subscribing to a Mechanical Turk solution; something to be very careful of indeed.

-- Gunter Ollmann, Founder/Principal @ Ablative Security

Friday, January 13, 2017

Machine Learning Approaches to Anomaly and Behavioral Threat Detection

Anomaly detection approaches to threat detection have traditionally struggled to make good on the efficacy claims of vendors once deployed in real environments. Rarely have the vendors lied about their product’s capability – rather, the examples and stats they provide are typically for contrived and isolated attack instances, not representative of a deployment in a noisy and unsanitary environment.

Where anomaly detection approaches have fallen flat, and what has cast them in a negative value context, is primarily alert overload and “false positives”. False Positive deserves to be in quotations because (in almost every real-network deployment) the anomaly detection capability is working and alerting correctly – however, the anomalies being reported often have no security context and are unactionable.

Tuning is a critical component to extracting value from anomaly detection systems. While “base-lining” sounds rather dated, it is a rather important operational component to success. Most false positives and nuisance alerts are directly attributable to missing or poor base-lining procedures that would have tuned the system to the environment it had been tasked to spot anomalies in.

Assuming an anomaly detection system has been successfully tuned to an environment, there is still a gap in actionability that needs to be closed. An anomaly is just an anomaly, after all. Closure of that gap is typically achieved by grouping, clustering, or associating multiple anomalies together into a labeled behavior. These behaviors in turn can then be classified in terms of risk.
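
A toy example of that grouping step, assuming scikit-learn and two invented anomaly features, clusters related anomalies into a candidate behavior and leaves one-off noise aside:

    # Group individual anomalies into a candidate "behavior" via clustering.
    # The two features (seconds since start of day, destination port) are invented for illustration.
    import numpy as np
    from sklearn.cluster import DBSCAN

    anomalies = np.array([
        [10.0, 4444.0],   # beaconing-like outbound connections from the same host
        [12.0, 4445.0],
        [11.0, 4446.0],
        [9000.0, 22.0],   # an unrelated, one-off SSH anomaly
    ])

    clusters = DBSCAN(eps=50.0, min_samples=2).fit_predict(anomalies)
    print(clusters)  # e.g. [0, 0, 0, -1]: three anomalies form one behavior, the last is noise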

While anomaly detection systems dissect network traffic or application hooks and memory calls using statistical feature identification methods, the advance to behavioral anomaly detection systems requires the use of a broader mix of statistical features, meta-data extraction, event correlation, and even more base-line tuning.

Because behavioral threat detection systems require training and labeled detection categories (i.e. threat alert types), they too suffer many of the same operational ill effects as anomaly detection systems. Tuned too tightly, they are less capable of detecting threats than an off-the-shelf intrusion detection system (network NIDS or host HIDS). Tuned too loosely, they generate unactionable alerts more consistent with a classic anomaly detection system.

The middle ground has historically been difficult to achieve. Which anomalies are the meaningful ones from a threat detection perspective?

Inclusion of machine learning tooling into the anomaly and behavioral detection space appears to be highly successful in closing the gap.

What machine learning brings to the table is the ability to observe and collect all anomalies in real time, to make associations to both known (i.e. trained and labeled) and unknown or unclassified behaviors, and to provide “guesses” at actions based upon how an organization’s threat response or helpdesk (or DevOps, or incident response, or network operations) team has responded in the past.

Such systems still require baselining, but are expected to dynamically reconstruct their baselines as they learn over time how the human operators respond to the “threats” they detect and alert upon.
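
For a rough sense of what “dynamically reconstructing a baseline” might look like, here is a toy Python sketch; the thresholds, smoothing factor, and sample numbers are my own assumptions, not any vendor's algorithm.

# An exponentially weighted moving average baseline that keeps re-learning
# what "normal" looks like, so drift stops alerting without manual re-tuning.
class DynamicBaseline:
    def __init__(self, alpha=0.05, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts
        self.threshold = threshold  # deviation (in std-devs) that raises an alert
        self.warmup = warmup        # observations to learn from before alerting
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def observe(self, value):
        self.count += 1
        if self.count == 1:         # first observation seeds the baseline
            self.mean = value
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        is_anomalous = (self.count > self.warmup and std > 0
                        and abs(deviation) > self.threshold * std)
        # Update the baseline regardless, so the system keeps learning.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_anomalous

baseline = DynamicBaseline()
for dns_queries_per_minute in [20, 22, 19, 21, 23, 20, 400]:
    if baseline.observe(dns_queries_per_minute):
        print("anomalous spike:", dns_queries_per_minute)   # flags only the 400
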
Machine learning approaches to anomaly and behavioral threat detection (ABTD) provide a number of benefits over older, purely statistical approaches:

  • A dynamic baseline ensures that as new systems, applications, or operators are added to the environment they are “learned” without manual intervention or superfluous alerting.
  • More complex relationships between anomalies and behaviors can be observed and eventually classified, thereby extending the range of labeled threats that can be correctly identified, assigned risk scores, and prioritized for remediation by the appropriate human operator.
  • Observations of how humans respond to generated alerts can be harnessed to automatically reevaluate the risk and prioritization of detections and events. For example, three behavioral alerts are generated relating to different aspects of an observed threat (e.g. external C&C activity, lateral SQL port probing, and high-speed data exfiltration). The human operator associates and remediates them together and uses the label “malware-based database hack”. The system then learns that clusters of similar behaviors and sequencing are likely to be classified and remediated the same way – so in future alerts it can assign a risk and probability to the newly labeled threat (a small sketch of this feedback loop follows this list).
  • Outlier events can be understood in the context of typical network or host operations – even if no “threat” has been detected. Such capabilities prove valuable in monitoring the overall “health” of the environment. As helpdesk and operational (non-security) staff leverage the ABTD system, it also learns to classify and prioritize more complex sanitation events and issues (which may be impeding the performance of the observed systems or indicate a pending failure).
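
Here is the promised sketch of that feedback loop, again in Python and again purely illustrative: the behavior names come from the example above, but the Jaccard-similarity shortcut and the data structures are assumptions of mine, not a description of how any shipping product works.

# Remember which combinations of behaviors a human remediated under one label,
# then pre-label similar clusters of behaviors when they appear in future.
class ResponseLearner:
    def __init__(self):
        self.known = {}   # frozenset of behaviors -> (label, risk)

    def record_remediation(self, behaviors, label, risk):
        self.known[frozenset(behaviors)] = (label, risk)

    def suggest(self, behaviors):
        behaviors = frozenset(behaviors)
        best, best_overlap = None, 0.0
        for combo, labeled in self.known.items():
            overlap = len(behaviors & combo) / len(behaviors | combo)  # Jaccard similarity
            if overlap > best_overlap:
                best, best_overlap = labeled, overlap
        return best, best_overlap

learner = ResponseLearner()
learner.record_remediation(
    {"external-c2", "sql-port-probing", "high-speed-exfiltration"},
    "malware-based database hack", "high")
print(learner.suggest({"external-c2", "high-speed-exfiltration"}))
# (('malware-based database hack', 'high'), 0.666...) – a guess, with a confidence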

It is anticipated that this newest generation of machine learning approaches to anomaly and behavioral threat detection will not only reduce the noise associated with real-time observations of complex enterprise systems and networks, but also cause security to be further embedded and operationalized as part of standard support tasks – down to the helpdesk level.

-- Gunter Ollmann, Founder/Principal @ Ablative Security

(first published January 13th - "From Anomaly, to Behavior, and on to Learning Systems")

Wednesday, May 29, 2013

Security 101: Machine Learning and Big Data

The other week I was invited to keynote at the ISSA CISO Forum on Incident Response in Dallas, and in the weeks prior to it I struggled to decide what angle I should take. Should I be funny, irreverent, diplomatic, or analytical? Should I plaster slides with the last quarter’s worth of threat statistics, breach metrics, and headline news? Should I quip some anecdote and hope the attending CISOs would have an epiphany that’ll fundamentally change the way they secure their organizations?

In the end I did none of that… instead I decided to pull apart the latest batch of security buzzwords – “Big Data” and “Machine Learning”.

If you attended RSA USA (or any major security vendor/booth conference) this year you can’t have missed the fact that everything from antivirus through to USB memory sticks now comes with a dab of big data, a sprinkling of machine learning, and a dollop of cloud for good measure. Thankfully I’m a cynic, or else I’d have been thrashing around on the ground in an epileptic fit from all the flashy marketing claims and trademarked nonsense phrases.

I guess I’m lucky to be in the position of having had several years of exposure to some of the greatest minds at Georgia Tech as they drummed into me on a daily basis the “what and how” of machine learning in the context of solving many of today’s toughest security problems.

So, it was with that in mind that I thought “If I’m a CISO and everything I know about machine learning and big data came from carefully rehearsed vendor sound bites and glossy pamphlets, would I be able to tell the difference between Chanel #5 and cow manure?” The obvious answer would result in some very happy farmers.

What was the net result of this self-reflection and impending deadline? I crafted a short presentation for CISOs… a 101 course on machine learning and big data… and it included ducks.

If you’re in the upper tiers of your organization and you’ve had sales folks pimping you their latest cloud-infused, big data-munching, machine learning, and world-hunger-solving security solution, please carry on reading as I attempt to explain the basics of the latest and greatest in buzzwords…

First of all – some context! In the world of breach detection and incident response there’s a common idiom: “If it walks like a duck, flies like a duck, and quacks like a duck… it must be a duck.”


Now I could easily spend another 5,000 words explaining why such an idiom doesn’t apply to modern security threats, targeted attacks and advanced persistent threats, but you’ll have to wait for a different blog post. Rather, for this 101 lesson, it’s important to understand the concept of “Feature Selection” – which in the case of this idiom includes: walking, flying and quacking.

If you’ve been tasked with dealing with a duck problem, ideally you’d be an aficionado of the feet, wings and sounds of ducks. You’d be able to apply this knowledge to each bird you have the time to focus your attention on and make a determination: Duck, or Not a Duck. As a security professional, you’d be versed in the various attributes of certain threats – and able to derive a conclusion as to the nature of the security problem.

The problem though is that at scale things break down.


What do you do when there are too many to analyze, when time is too short, and when you can’t make out all the duck features from afar? This is typical of the big data problem (and your everyday network traffic). Storing the data is the easy part. Retrieving the data is mechanically complicated, but solvable.

Meanwhile, making decisions and taking action upon the data is typically the most difficult part. With every doubling of data, your problem grows exponentially.


The traditional method of dealing with the situation has been to employ signature matching systems. In essence, we build rules based upon the features we’ve previously identified as significant and capable of bounding the problem (or duck selection). We then compare these rules against the sample animal and receive a binary answer – Duck, or Not a Duck.

Signature systems can be very efficient at classification. Just look at your average Intrusion Prevention System (IPS). A problem, though, lies in the scope of the features that have been defined.

If the features (or parameters) used for classification are too narrow (or too broad) then evasion is not only probable, but guaranteed. In essence, for a threat (or duck) to be classified, it must have been observed in the past or carefully predicted in advance (which is rare).

From an attacker’s perspective, knowledge of those features and triggering parameters makes it a trivial task to evade or to conduct false flag operations. Just think – hunters do this all the time with their floating duck decoys. Even fellow duck hunters have been known to mistakenly take pot-shots at them too.
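
If it helps to picture it, here is about the smallest rule-matching sketch I can write in Python; the duck “features” are invented for this example and bear no resemblance to real IPS signatures.

# Hand-written signatures over a fixed set of features give a binary verdict.
SIGNATURES = [
    {"walks": "waddle", "flies": True, "sound": "quack"},   # "it's a duck"
]

def matches_signature(sample, signatures=SIGNATURES):
    return any(all(sample.get(k) == v for k, v in sig.items()) for sig in signatures)

print(matches_signature({"walks": "waddle", "flies": True, "sound": "quack"}))   # True
print(matches_signature({"walks": "waddle", "flies": False, "sound": "quack"}))  # False – one tweaked feature evades the rule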

Switching pace a little, let’s look at the network itself.

The big green blob is all the network traffic for an organization for a week. The red blob right-of-center is traffic associated with an active breach, and the lighter red blob with the dotted lines is just general malicious traffic observed within the network. In this two-dimensional view (if I hadn’t color-coded it previously) you’d have a near-impossible task differentiating between them. As it is, the malicious traffic is mixed with both the “safe” and “breach” traffic.

The trick in differentiating between the network traffic types lies in increasing the dimensionality of the problem. What was a two-dimensional blob suddenly becomes much clearer when an appropriate view or perspective on the data is added. In the context of the above diagram, the addition of a z-axis and an extension into the third dimension allows the observer (i.e. the analyst) to easily differentiate between the traffic types – for example, the axis could represent “country code of destination IP address”. In this context, appropriate feature selection can greatly simplify the detection problem. Choosing appropriate features is important – nay, it’s critical!
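
A trivially small Python sketch of the same idea follows; the flow records and the country-code feature are invented purely to illustrate how one extra dimension can separate otherwise identical-looking traffic.

flows = [
    {"bytes": 1200, "duration": 3.1, "dst_country": "US", "label": "safe"},
    {"bytes": 1200, "duration": 3.0, "dst_country": "KP", "label": "breach"},
]

def partition_by(feature, records):
    # Group the flow labels by the value of a single feature.
    buckets = {}
    for record in records:
        buckets.setdefault(record[feature], []).append(record["label"])
    return buckets

print(partition_by("bytes", flows))        # {1200: ['safe', 'breach']} – still mixed together
print(partition_by("dst_country", flows))  # {'US': ['safe'], 'KP': ['breach']} – cleanly separated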

This is where advances in machine learning over the last half-decade have really come to the fore in computer science and more recently in information security.

Without getting into any of the math behind the various machine learning algorithms or techniques, the key concept you need to understand is “training”. It can mean many things to many a mathematician, but since we’re likely not one of those, what training means in our context is that we already have samples of what we’re going to be looking for, and samples of things we know we’re definitely not interested in. The better we define and populate these training sets, the more precise the machine learning system we’re employing will be in differentiating between them – and potentially classifying other contenders.

So, in this example we’ve taken a bunch of ducks and grouped them together. They become our “+ve class” – which basically means these are the things we’re interested in. But equally important is our “-ve class” – our collection of things we know not to be ducks. In many cases our -ve class may be more important than our +ve class because it contains all those false positives and “nearly” ducks – the things that may have caught us out once before.
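
To make the +ve/-ve idea concrete, here is a minimal sketch using scikit-learn (any comparable library would do); the feature vectors – walk, fly, and quack scores – and the sample values are entirely made up.

from sklearn.linear_model import LogisticRegression

# Each row is [walk_score, fly_score, quack_score].
X = [
    [0.90, 0.80, 0.95],   # real duck                 -> +ve class
    [0.85, 0.90, 0.90],   # real duck                 -> +ve class
    [0.90, 0.00, 0.00],   # floating decoy            -> -ve class
    [0.10, 0.90, 0.05],   # goose-like "nearly duck"  -> -ve class
]
y = [1, 1, 0, 0]

model = LogisticRegression().fit(X, y)
print(model.predict([[0.88, 0.82, 0.90]]))   # most likely labeled a duck (class 1)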

One function of a good machine learning system is to automatically determine which attributes make the most sense in differentiating between your +ve and -ve classes.

While our poor old hunter (or analyst) was working with three features – walks, flies, and quacks – the computer-based system may have reviewed all the attributes that were available and determined which ones are the most useful in differentiating between “ducks” and “not ducks”. In many cases the system will have weighted the various features (or attributes) to indicate which features are more deterministic of the classes.

For example, texture may be a good indicator of “not a duck” – since none of the +ve class were made from plastic or wood. Meanwhile, features such as “wing length” may not be such a good criterion and will be weighted so that they have little influence on determining whether a duck is a duck or not – or may be dropped by the system entirely.
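
As a hedged illustration of that automatic weighting, a small decision tree trained on more invented duck data will happily tell us which attributes it actually found useful:

from sklearn.tree import DecisionTreeClassifier

features = ["texture_is_organic", "wing_length_cm", "quack_score"]
X = [
    [1, 24.0, 0.95],   # mallard                          -> duck
    [1, 22.0, 0.90],   # teal                             -> duck
    [0, 23.0, 0.92],   # decoy with an electronic quacker -> not a duck
    [0, 25.0, 0.00],   # wooden decoy                     -> not a duck
]
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
for name, importance in zip(features, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")   # texture dominates; wing length contributes nothing here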

The number of features reviewed, assessed and weighted by the machine learning system will eventually be determined by the type of data being used and how “rich” it is. For example, in the network security realm we may be feeding the system with collated samples of firewall logs, DNS traffic samples, IP blacklists/whitelists, IPS alerts, etc. It’s important to note though that the “richer” the data set (i.e. the more possible features there could be), the more complex the problem is for the computer to solve and the longer it’ll take to train the system.

Now let’s switch back to that “big data” we originally talked about. In the duck realm we’re talking about all the birds within a national park (say). Meanwhile, in the network security realm, we may be talking about all the traffic observed in real-time across a corporate network and all the alerting instrumentation (e.g. firewalls, IPS, etc.)

I’m going to do some hand-waving here because it can get rather complex and there are a lot of proprietary tweaks that can be undertaken… but in one representation we can get our trained system to automatically group and cluster events on our network.

Using our original training data, or some other previously labeled datasets, it’s possible to automatically label the clusters output by a machine learning system.

For example, in the graphic above we see a number of unique clusters (or blobs if you insist). Through automatic labeling we know that the green blobs are types of ducks, the red blobs are various groupings of not-ducks, and the gray blobs are previously unknown or unlabeled clusters – each one mathematically distinct from the others – based upon the features the system chose.

What the system can also do is assign a probability that the unknown clusters are associated with our +ve or -ve training sets. For example, in this particular graphical representation the proximity of the unlabeled clusters to labeled (and classified) clusters allows the system to assign a probability of whether each cluster is a duck or not a duck – even though the system has never encountered these particular “birds” before.
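
In lieu of the math, here is a hedged Python sketch of that step using k-means clustering and a simple nearest-labeled-centroid comparison; the event coordinates and the two labeled centroids are invented, and real systems use far richer features and more sophisticated distance measures.

import numpy as np
from sklearn.cluster import KMeans

events = np.array([
    [0.10, 0.20], [0.15, 0.25],   # events we already know are "ducks" (+ve)
    [0.90, 0.80], [0.85, 0.90],   # events we already know are "not ducks" (-ve)
    [0.20, 0.30], [0.80, 0.75],   # never-seen-before "birds"
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(events)

# Centroids we have previously labeled from the training data.
labeled_centroids = {"duck": np.array([0.12, 0.22]), "not-duck": np.array([0.88, 0.85])}

for centre in km.cluster_centers_:
    dists = {name: np.linalg.norm(centre - c) for name, c in labeled_centroids.items()}
    nearest = min(dists, key=dists.get)
    print(f"cluster at {centre.round(2)} looks like '{nearest}' (distance {dists[nearest]:.2f})")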

The next (and near final) stage is to manually label these new clusters. For example, we ask an ornithologist to look at each cluster of “ducks” and “not ducks” in turn and to label them… “rubber duckies”, “robot duckies”, and “Madagascar mallard ducks”.

Then, to improve our machine learning system further, we add these newly labeled clusters to our +ve and -ve training sets… and the system continues to learn and become more precise over time.

In addition, since we’ve now labeled these clusters, in the future we’re able to automatically flag new additions to these clusters and correctly label the duck (or threat).
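
A final, equally hypothetical sketch of that feedback loop: fold the newly labeled clusters back into the training sets and re-fit, so the next pass classifies them (and their look-alikes) automatically. The feature vectors and labels are invented, as before.

from sklearn.linear_model import LogisticRegression

X_train = [[0.90, 0.80, 0.95], [0.10, 0.90, 0.05]]   # existing +ve / -ve examples
y_train = [1, 0]

newly_labeled = [
    ([0.00, 0.00, 0.90], 0),   # "rubber ducky" cluster       -> not a duck
    ([0.92, 0.85, 0.93], 1),   # "Madagascar mallard" cluster -> duck
]
for features, label in newly_labeled:
    X_train.append(features)
    y_train.append(label)

model = LogisticRegression().fit(X_train, y_train)   # re-train on the enlarged sets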

And, if we’re a really smart CISO, we can use this clustering system (and labeled clusters) to automatically alert us to new threats or to initiate automatic network security actions – e.g. enable blocking of a new malicious URL, stop blocking a new cloud service delivering official updates to applications, etc.

The application of machine learning techniques to the toughest security problems facing business today has come along in leaps and bounds recently. However, as with any buzzword that falls into the hands of marketers and gets manipulated until it squeaks and glitters, or oozes onto every product in this year’s price list, senior technical staff need to take added care not to be bamboozled by well-practiced but meaningless word salad.

A little understanding of the concepts behind big data and machine learning can not only cut through the latest batch of sales promises, but can also form the basis of constructing a new generation of meaningful breach detection and incident response solutions.

-- Gunter Ollmann

Original Publication: IOActive Blog - May 29, 2013