Tuesday, May 5, 2020

Tackling the SDLC With Machine Learning

Businesses’ digital transformations continue to show that staying relevant and competitive is directly tied to the ability to develop and harness software. As Microsoft CEO Satya Nadella often says, “every company is now a software company.”

Software flaws that lead to unintentional data leakage, cause breaches, or jeopardize public health or the environment are not only costly but may be terminal to a company’s future. The integrity and security of software, and of the development processes behind it, have therefore become a critical component of every organization’s success. This is a core reason CISOs are increasingly partnering with DevOps leaders and vigilantly modernizing secure development lifecycle (SDLC) processes to embrace new machine learning (ML) approaches.

Automated application security testing is a key component of modern SDLC practices and can economically uncover many bugs and potential security flaws with relative ease. Application security testing embraces a broad range of complementary techniques and tooling, such as static application security testing (SAST), dynamic application security testing (DAST), interactive application security testing (IAST), and runtime application self-protection (RASP). Current best-practice security advice recommends a mix of tools from this alphabet soup to mechanically flag bugs and vulnerabilities, mitigating the consequences of unresolved flaws that make it to production systems.

A troublesome consequence of this approach lies in the volume of identified software flaws and the development team’s ability to corroborate each flaw’s risk (and its subsequent prioritization). The same problem is manifest in organizations that operate bug bounty programs and need to triage researchers’ voluminous submissions. Even mature, well-oiled SDLC businesses struggle to automate the triage and prioritization of bugs that flow from application security testing workflows; for example, Microsoft’s 47,000 developers generate nearly 30,000 bugs a month.

To better label and prioritize bugs at scale, new ML approaches are being applied, and the results have been very promising. In Microsoft’s case, data scientists developed a process and ML model that correctly distinguishes between security and non-security bugs 99 percent of the time and accurately identifies critical, high-priority security bugs 97 percent of the time.

For bugs and vulnerabilities that originate outside the automated application security testing apparatus and SDLC processes, such as customer- or researcher-reported bugs, using the content-rich submissions to train ML classifier systems brings additional difficulties: reports may contain passwords, personally identifiable information (PII), or other types of sensitive data. A recent publication, “Identifying Security Bug Reports Based Solely on Report Titles and Noisy Data,” highlights that appropriately trained ML classifiers can be highly accurate even when confidential information is withheld and the classifier is restricted to using only the title of the bug report.
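The title-only idea can be illustrated with a minimal sketch: a multinomial Naive Bayes classifier trained on nothing but bug-report titles, so no sensitive report bodies are ever processed. Everything below, including the class name `TitleClassifier` and the toy dataset, is a hypothetical illustration, not the model described in the publication; production systems train on far larger, curated corpora and more sophisticated models.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(title):
    """Lowercase a bug title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())

class TitleClassifier:
    """Multinomial Naive Bayes trained on bug-report titles only."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of titles
        self.vocab = set()

    def train(self, titles, labels):
        for title, label in zip(titles, labels):
            self.label_counts[label] += 1
            for tok in tokenize(title):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, title):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(self.label_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(title):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data -- real training sets contain many thousands of titles.
titles = [
    "SQL injection in login form",
    "Buffer overflow parsing crafted PNG",
    "XSS in comment field allows script execution",
    "Hardcoded credentials in config file",
    "Button misaligned on settings page",
    "Typo in welcome email template",
    "Dark mode toggle does not persist",
    "Crash when sorting empty list",
]
labels = ["security"] * 4 + ["non-security"] * 4

clf = TitleClassifier()
clf.train(titles, labels)
print(clf.predict("Reflected XSS in search box"))   # classified as security
print(clf.predict("Spelling error on about page"))  # classified as non-security
```

The design point mirrors the paper’s constraint: because only titles flow into training and inference, the classifier never sees the passwords, PII, or stack traces that may appear in report bodies.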

CISOs should stay informed of innovations in this area. According to Coralogix, an average developer creates 70 bugs per 1,000 lines of code and fixing a bug takes 30 times longer than writing a line of code. 

By correctly identifying security bugs from what is increasingly an overwhelming pile of bugs generated by automated application testing tools and customer-reported flaws, businesses can properly prioritize their development teams’ fix workflow and further reduce application risks to their organization, customers, and partners.

Although much research and innovation are underway in training ML classifier systems to triage security bugs and improve processes encapsulated in modern SDLC, it will be a while before organizations can purchase off-the-shelf, integrated solutions. 

CISOs and DevOps security leaders should keep abreast of new research publications and the current state of the art, and press their automated application security testing tool suppliers to advance their solutions to intelligently and correctly separate security bugs from the daily chaff.

-- Gunter Ollmann

First Published: SecurityWeek - May 5, 2020