Thursday, September 8, 2022

It’s Time for Security Analytics to Embrace the Age of Science Over Art

Security analytics has traditionally been approached with a “hunt and peck” mentality, which has made the process of uncovering and responding to cyberthreats more art than science. A human analyst has an idea of what they are looking for when they begin to hunt across the available data, performing that task based on their own experience. They’ve been taught to celebrate when they find something, and that the trickier and more obscure the discovery, the greater the celebration of their skills.


This situation is, I believe, an “art” because the results will always differ between analysts: the day of the week, what they had for breakfast, or how their weekend went are just a few of the outside factors that can affect the individual doing the hunting. The situation has only been perpetuated by an industry that has for too long touted the value of this “art.”

We’re no longer working with a simple canvas

We’ve all heard it before and will continue to hear it: data volumes and the enterprise landscape have been growing exponentially, and that’s not going to stop. This was put into hyperdrive with the rapid adoption of cloud computing, which challenges organizations to collect and analyze complete data from multiple sources, spanning new cloud services as well as legacy, on-premises infrastructure. The result is limited visibility that ultimately compromises overall security.

What we’re not hearing enough is that applying the long-held “art” of hunt-and-peck to this challenge doesn’t scale; it isn’t a reliable or repeatable process, and it can’t come close to meeting the needs of modern enterprise environments.

Managing haystacks of needles

We all know the saying “finding a needle in a haystack.” But in today’s threat landscape, given the data volumes with which analysts are burdened, it’s more like finding the sharpest needle in a haystack of needles. Following the decades-old mantra of “assume breach,” we need to turn our focus to the threats that matter most: the sharpest needles. This requires operationalizing the hunt, triage, investigation and response, removing humans as “artistic” speed bumps and instead empowering them with the science of protection embedded in security analytics.

Adopting the science of security analytics, leveraging automation built on machine learning and AI, enables repeatable, reliable streaming investigations of threats across all the data, at all times. Applying this method will reveal orders of magnitude more threats and incidents, across a broad spectrum of risk, occurring continuously within the enterprise. We’ve reached the tipping point where threat volumes have far exceeded what any number of human analysts could reasonably hunt and triage, let alone respond to. This means enterprise security teams must increasingly apply AI and ML to the management of the threats they find (i.e., managing those haystacks of needles) as well as the mitigations and responses.

Reprieve begins with automation

Building processes that are autonomous is the critical element in embracing a scientific approach to protection. While past security solutions focused on automation, they were largely unsuccessful due to inflexibility and a reliance on humans to choose the right automation steps in advance and to handle every exception. This is not the role people should be playing when it comes to successfully implementing autonomous solutions, and it doesn’t do anything to lighten their load. Instead, autonomous solutions should deploy system “smartness” to fill in the blanks and know to ask for human guidance when it’s actually needed.
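
To make that handoff concrete, here is a minimal sketch of what a confidence-gated decision could look like. The finding structure, the 0.85 threshold, and the response and queue functions are all assumptions for illustration, not names from any particular product:

```python
from dataclasses import dataclass

# Assumed threshold for illustration: below it, the system asks for human guidance.
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class Finding:
    incident_id: str
    summary: str
    confidence: float         # the system's confidence in its own assessment
    recommended_action: str   # e.g., "isolate-host" or "disable-account"

def route_finding(finding: Finding) -> str:
    """Act autonomously when confident; ask a human only when it's actually needed."""
    if finding.confidence >= CONFIDENCE_THRESHOLD:
        execute_response(finding.recommended_action, finding.incident_id)
        return "auto-resolved"
    queue_for_analyst(finding)
    return "escalated"

def execute_response(action: str, incident_id: str) -> None:
    # Stand-in for whatever response tooling the environment provides.
    print(f"[auto] applying '{action}' to incident {incident_id}")

def queue_for_analyst(finding: Finding) -> None:
    # Stand-in for the analyst's exception queue.
    print(f"[human] guidance needed for {finding.incident_id}: {finding.summary}")
```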

If we continue with the mantra of “assume breach” and operationalize security as described above, we also must completely rethink the human-focused SOC solution of filtering alerts. With people swamped to the point of (and beyond) alert fatigue, the solution has been to drastically narrow the funnel of events and alerts, thus reducing the aperture of enterprise threat visibility and response, none of which sounds like a solution to me.

This raises the question: Why bother collecting alerts and events in the first place if you’re only going to do something with 1% of the top 1% most critical alerts? My response: Filtering is the worst way to manage security.

Instead, let’s do this:

With modern AI and autonomous hunting and triaging solutions, the system can look at every event and alert as it streams by and correlate, question and enrich them all in real time, all the time. The more data collected, the more accurate and useful the autonomous system becomes, improving its ability to identify the collective stories and present them to the business and the analysts. To take it a step further, the autonomous system can then, in most cases, perform autonomous responses to the threats being found.
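
As a rough illustration of that streaming approach, here is a small sketch: every alert is enriched and correlated into a growing “story” as it arrives, with nothing filtered out up front. The field names, the lookups, and the three-signal heuristic are assumptions for the example, not anyone’s real schema:

```python
import datetime as dt
from collections import defaultdict

# Correlate alerts that share an entity (user, host, etc.) into a growing "story".
# All field names (entity, kind) and the lookups are assumptions for illustration.
stories = defaultdict(list)

def enrich(alert: dict) -> dict:
    """Attach context the analyst would otherwise have to look up by hand."""
    alert["enriched_at"] = dt.datetime.now(dt.timezone.utc).isoformat()
    alert["asset_owner"] = lookup_asset_owner(alert["entity"])
    alert["recent_auth_failures"] = count_recent_auth_failures(alert["entity"])
    return alert

def correlate(alert: dict) -> list[dict]:
    """Group every alert by the entity it involves; return the growing story."""
    stories[alert["entity"]].append(alert)
    return stories[alert["entity"]]

def on_stream(alert: dict) -> None:
    """Called for every alert as it streams by; nothing is filtered out up front."""
    story = correlate(enrich(alert))
    if len(story) >= 3:  # assumed heuristic: three related signals make a story worth telling
        present_story(story)

def lookup_asset_owner(entity: str) -> str:
    return "unknown"  # stand-in for an asset-inventory query

def count_recent_auth_failures(entity: str) -> int:
    return 0          # stand-in for a log-store query

def present_story(story: list[dict]) -> None:
    print(f"story for {story[0]['entity']}: {len(story)} related signals")
```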

Human and machine harmony

Anytime automation in security is discussed, it raises the fear of automating away the analyst. But with a science-first approach, analysts aren’t going anywhere. The human analyst role is transforming, which will be a huge benefit to the people who work in SOCs. By adopting a scientific method for security analytics, the analyst will influence and guide the autonomous system to ensure it delivers business impact and value:

  • For exceptions where the AI doesn’t have enough information or confidence to provide an autonomous response, it watches and learns how the human analyst handles the case, thus building and establishing a scientific methodology.
  • At the cloud-SaaS level, those learnings may come from hundreds of enterprise SOC teams and thousands of expert security analysts, whose collective intelligence the AI systems can distill and apply, as methodology refinements, back into the hands of the individual analyst (a minimal sketch of this feedback loop follows the list).
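
The sketch below shows the smallest possible version of that loop: analyst decisions on escalated exceptions are recorded and reused when a later case looks the same. The JSON-lines “learned playbook” and the naive evidence match are assumptions; a real system would use a model and share learnings across many SOC teams rather than a local file:

```python
import json
from pathlib import Path

# Hypothetical store of analyst decisions on escalated exceptions. Each record
# pairs the evidence the system had with the action the human ultimately took,
# so later cases that look the same can be handled without another escalation.
PLAYBOOK_PATH = Path("learned_playbook.jsonl")  # assumed location

def record_analyst_decision(incident_id: str, evidence: dict, action_taken: str) -> None:
    """Capture how the analyst resolved an exception the system couldn't."""
    record = {"incident_id": incident_id, "evidence": evidence, "action": action_taken}
    with PLAYBOOK_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")

def suggest_action(evidence: dict) -> str | None:
    """Reuse a prior analyst decision when the recorded evidence matches."""
    if not PLAYBOOK_PATH.exists():
        return None
    for line in PLAYBOOK_PATH.read_text().splitlines():
        record = json.loads(line)
        # Naive match: every recorded evidence field must match the new case.
        if all(evidence.get(k) == v for k, v in record["evidence"].items()):
            return record["action"]
    return None
```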

The final result? The loop gets closed. The analyst is augmented.

The autonomous system deals with the daily grind, identifies the gaps that require human expertise, learns by watching how humans fill in the methodology gaps, and reapplies those learnings collectively. For instance, assume that a security team is capable of performing 100 manual investigations per day; an autonomous system could ask millions of forensic questions in the same day. Time to resolution is shortened by augmenting the work the analyst does. The autonomous system performs the repetitive, data-intensive work, can quickly go back in time to ask a practically unlimited number of questions, and the efficiency benefits go on and on.
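
The arithmetic behind that comparison is simple but worth making explicit; the autonomous figure here is an assumed illustrative number, not a measurement:

```python
# Illustrative throughput comparison using the figures from the paragraph above.
manual_investigations_per_day = 100        # assumed analyst-team capacity
autonomous_questions_per_day = 2_000_000   # "millions" -- an assumed illustrative figure

ratio = autonomous_questions_per_day / manual_investigations_per_day
print(f"Roughly {ratio:,.0f}x more forensic questions asked per day.")
# -> Roughly 20,000x more forensic questions asked per day.
```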

Leading with science will equip security analysts with actionable data across use cases ranging from threat detection, threat investigation, and threat hunting to ransomware investigation and incident response. It helps security teams work smarter and respond faster while boosting productivity and strengthening security.

-- Gunter Ollmann

First Published: Medium - September 8, 2022

Tuesday, March 29, 2022

Why the SOC Needs to Learn from the Aviation Industry

The cybersecurity industry has spent a lot of time talking about improving the analyst experience without making significant improvements, as much of the effort has been too focused on finding a silver-bullet solution. Combine that with a global pandemic, and things are only getting worse. A recent study published by Devo, the 2021 SOC Performance Report, found that on a 10-point scale, where 10 indicates SOC staff have a “very painful” experience performing their jobs, 72% of respondents rated the pain of SOC analysts at a 7 or above.


Instead of thinking about the aforementioned silver bullet for alleviating SOC pain, I want to focus on one of the top sources, alert fatigue, and how the cybersecurity industry might take a page out of another field’s book to find a solution.

In the SOC Performance Report, a whopping 61% said a cause of SOC pain was that there are too many alerts to chase. I think it’s safe to draw the connection that “alert fatigue” will expand into “posture fatigue” and “policy fatigue,” as it adversely affects both recruitment and the all-too-critical retention of experienced SOC professionals.

Alert fatigue may exit the aircraft

So, if we can’t figure it out within the security industry, let’s learn from others. There are many non-cyber industries and professions that suffer similarly from alert fatigue, and perhaps the cybersecurity industry can reapply some of their learnings. Across these compatriots in alert fatigue, if we ask the question “how do alarms, warnings, and alerts differ?” I think we’ll find much similarity and overlap in the answers, in both the theory and practice of how human operators are supposed to respond and how they do so in reality.

For the purpose of this article, I want to take a look at the aviation industry as our example for the SOC. Aviation has navigated many of the problems SOC operators face today and has made the most progress in governing and managing the ergonomics of sensory overload and automation. Picture this: the inside of an airplane cockpit, with all its knobs, buttons, lights, and alerts, isn’t too dissimilar to the combined dashboards SOC analysts have to navigate when triaging, investigating, and responding to threats.

In 1988, The Washington Post reported on a “glass cockpit” syndrome in the aviation industry that reads eerily like what many say or think about the SOC today. Researchers from the American Psychological Association noted that pilots would “fall victim to information overload and ignore the many bits of data pouring from myriad technical systems,” and that in the airline crashes they studied, “black box recordings showed that the crews talked about ‘how the systems sure were screwed up’ but did not verify what was wrong. In both cases, the systems worked but crews failed to check the information and crashed.”

Similarly, research published in 2001 by the Royal Institute of Technology examined “the alarm problem” in aviation, meaning, “in the most critical situations with the highest cognitive load for the pilots, the technology lets you down.” The report noted that “the warning system of the modern cockpits are not always easy to use and understand. The tendency is to overload the display with warnings, cautions and inoperative system information accompanied by various audio warnings.” It went on to identify one of the main problems resulting from this overload as “a cognitive problem of understanding and evaluating from the displayed information which is the original fault and which are the consecutive faults.” Sound familiar? You would likely hear something extremely similar from someone working in today’s SOC.

In the decades that followed, aircraft cockpit design has progressively applied new learnings and automation to dynamically manage alert volume and direct the pilot’s attention to priorities. In the Royal Institute of Technology’s report, researchers identified accident simulation as an effective tool for improving cockpit alert systems: finding more associable ways to present alerts, such as differentiating sounds and introducing context, which would allow pilots to “immediately understand what part or function of the aircraft is suffering a malfunction.” More context would also include guidance on what to do next. In its conclusion, the study noted:

Such simulations would hopefully result in less cognitive stress on behalf of the pilots: they would know that they have started to solve the right problem. They would not have to worry that they have entered the checklist at the wrong place. With a less stressful situation even during malfunctions there is greater hope for correct actions being taken, leading to increased flight safety.

SOC systems need to embrace and apply many of these same learnings that aviation has accumulated over decades. The majority of the cybersecurity industry seems to have gotten only as far as color-coding alert and warning significance, leaving analysts faced with a hundred flashing red priorities even after triage. It’s no surprise that they’re both overwhelmed and unable to respond to complex threats across a broadening attack surface.

Beware of Autopilot

When it comes to solving the issue of alert fatigue, automation is typically one of the first things to come to mind. The same went for aviation in 1988, where the previously mentioned Washington Post report quoted researchers saying what could have been taken right from a security trade publication in 2022:

Research is badly needed to understand just how much automation to introduce — and when to introduce it — in situations where the ultimate control and responsibility must rest with human operators, said psychologist Richard Pew, manager of the experimental psychology department at BBN Systems and Technologies Corp. in Cambridge, Mass.

“Everywhere we look we see the increasing use of technology,” Pew said. “In those situations where the operator has to remain in control, I think that we have to be very careful about how much automation we add.”

The growing use of high-tech devices in the cockpit or on ships can have two seemingly contradictory effects. One response is to lull crew members into a false sense of security. They “regard the computer’s recommendation as more authoritative than is warranted,” Pew said. “They tend to rely on the system and take a less active role in control.” Sometimes crews are so mesmerized by technological hardware that they are lulled into what University of Texas psychologist Robert Helmreich calls “automation complacency.”

And while automation of course has an important part to play in incident response and investigation — just as it does in modern aircraft cockpit design — it comes with some key warnings:

  1. Situational awareness is lost. Automation is often brittle, unable to operate outside of the situations it is programmed for, and subject to inappropriate performance due to faulty sensors or limited knowledge about a situation.
  2. Automation creates high workload spikes (such as when routine changes or a problem occurs) and long periods of boredom (in which attention wavers and response to exceptions may be missed). If you’re staffing for automation-level activities, how do you manage capacity for spikes?

The SOC Earns its Wings

As an industry, we have to take a page from the aviation handbook: avoid increasing cognitive demands, workload and distractions, and make tasks easier to perform. But we must also understand how to better manage automation failures and exceptions.

  • Embrace AI and autocomplete: Just as the more advanced sentence-autocomplete functions appearing in email and word processing applications leave the writer in control, SOC analysts remain in charge of managing an incident, but there is an opportunity to further guide and preemptively enrich a threat investigation, thereby increasing the speed and robustness of response.
  • Distill and prioritize at the incident level, not the alert level: It’s not about filtering/correlating/aggregating alerts; it’s about contextualizing both events and alerts in the background and articulating only the incident, in plain single-sentence language. Analysts can double-click down from there (see the sketch after this list).
  • Leverage a community of experts: As attack surfaces increase and vertical technology specialization becomes tougher for in-house SOCs to cover (particularly in times of competing incident prioritization), it becomes increasingly important to be able to “phone-a-friend” and access an on-demand global pool of expert talent. It’s like having several Boeing engineers sitting in the cockpit with the pilot to troubleshoot a problem with the plane.
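
To illustrate the second point above, here is a minimal sketch of distilling many alerts into one plain-language incident headline. The field names and the summary template are assumptions for the example:

```python
from dataclasses import dataclass, field

# A minimal illustration of distilling many alerts into one plain-language
# incident line. The field names and summary template are assumptions.
@dataclass
class Incident:
    entity: str
    alerts: list = field(default_factory=list)

    def add(self, alert: dict) -> None:
        self.alerts.append(alert)

    def headline(self) -> str:
        """One sentence the analyst reads first; the detail stays a double-click away."""
        kinds = sorted({a["kind"] for a in self.alerts})
        return (f"{self.entity}: {len(self.alerts)} related signals "
                f"({', '.join(kinds)}) point to a single incident worth investigating.")

# Usage: contextualize in the background, surface only the headline.
incident = Incident(entity="host-42")
for alert in ({"kind": "phishing-click"}, {"kind": "credential-use"}, {"kind": "lateral-movement"}):
    incident.add(alert)
print(incident.headline())
```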

-- Gunter Ollmann

First Published: Medium - March 29, 2022