How many attacks are we missing? Let's calculate it

A system must create something of value, in other words, results.
— Deming, W. Edwards

Often, in the context of deciding how well a security operations team is doing or evaluating an engineering pipeline, the topic of false negatives will come up. How many attacks are we missing? This is typically followed by the agreement that it's hard to determine, and some other, less robust approach is taken.

Instead, we're going to walk through calculating false negatives, the attacks we have missed, in under 1,500 words.

What is really being said with the outcome of a classification? It depends on what we are trying to classify. As security professionals, it would also be nice if the classification process itself could be valuable. Two common views are: one, to ask whether this stamp looks like some stamp we have seen before, a specific procedure perhaps written up in ATT&CK or on an Alerting Detection Strategy Framework page; and two, to ask whether the asset in question is compromised. Both make similar demands of the analyst: check for unusual activity associated with the asset.

In the first case, a true positive lets us remove the rare stamp from the envelope and begin working out how the asset was compromised, while a false positive tells us nothing about the level of compromise but does assure us that the specific thing we were looking for wasn't found.

With the second approach to classification, we draw from a known pool of indicators that would suggest a compromise, without the need to understand precisely what the original detection was looking for or the need to pull up the documentation each time one of potentially hundreds of alerts fires. This also leverages the full experience and engineering capability of the team across each alert classified, rather than just the noisy ones. A true positive here means we have a compromise, and a false positive means we can be confident the attacker is on another asset.

To get to the attacks we’re missing, we need a rough cost for the triage step going right (a true or false positive) and the triage step going wrong (a false negative). True negatives come for free — all the assets that were not compromised and that we didn't decide to triage.

Relative numbers are one way to start here. Beginning with the foundational work behind any security program, we consider how valuable an attacker perceives compromising our assets to be and how expensive it is for us to be compromised. To round out the analysis, we can also consider how expensive it is for an attacker to be caught, which speaks to the effectiveness of the security program. With their TTPs collected and disseminated, vulnerabilities patched, and so on, we build trust with our customers while our competitors file a notice of breach with the SEC.

Scenario                             Payoff (Relative)
One of our Assets is Compromised     -1000 (Us)
Attacker's Objective Achieved          100 (Them)
We Triage an Uncompromised Asset        -1 (Us)
Attacker Disrupted                     100 (Us)

The table above suggests that it is 1,000 times worse to be compromised than it is to discover a false positive alert. Feel free to use different orders of magnitude if they make sense for your organization. By keeping it to orders of magnitude, we are using Fermi estimation. It might sound hand-wavy, but Fermi's approach was more than enough for atomic physics, and our stakes are much lower. If you'd like to dig in further, we at Triangle Wave would be more than happy to have a chat or come onsite and facilitate a workshop with your organization.

These numbers bring us very close to the normal form of a two-player, two-strategy game. One party decides which assets to compromise and, at the same time, we decide which assets to triage. This puts us in a cell of the grid below, where each party receives their expected payoff.

(Them, Us)       Triage        Don't Triage
Compromise       -100, 100     100, -1000
Ignore Asset        0, -1        0, 0

What we are looking for now is a split between triaging and not triaging, where the attacker cannot do any better regardless of how many assets they decide to compromise. At the same time, the attacker wants to compromise or not compromise a percentage of assets so that we cannot achieve a higher payoff, even if we triage everything. This situation describes a Nash Equilibrium, and in this case, it is solvable. Turning the handle on the math[1], we find the optimal rate of compromise at equilibrium to be about 1 in 1,000 assets[2].
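If you'd rather check the result in code than on paper, here is a minimal Python sketch of the indifference conditions for this 2x2 game, using the payoffs from the grid above. The variable names and dictionary layout are ours, not any standard library's.

```python
# Payoffs as (Them, Us). Rows are the attacker's choices, columns are ours.
payoffs = {
    ("compromise", "triage"):      (-100,   100),
    ("compromise", "dont_triage"): ( 100, -1000),
    ("ignore",     "triage"):      (   0,    -1),
    ("ignore",     "dont_triage"): (   0,     0),
}

# Our payoffs under each outcome.
d_ct = payoffs[("compromise", "triage")][1]
d_cd = payoffs[("compromise", "dont_triage")][1]
d_it = payoffs[("ignore", "triage")][1]
d_id = payoffs[("ignore", "dont_triage")][1]

# The attacker compromises with probability p that leaves us indifferent
# between triaging and not triaging:
#   p*d_ct + (1-p)*d_it = p*d_cd + (1-p)*d_id
p = (d_id - d_it) / (d_ct - d_it - d_cd + d_id)

# The attacker's payoffs, used to find our triage probability q the same way.
a_ct = payoffs[("compromise", "triage")][0]
a_cd = payoffs[("compromise", "dont_triage")][0]
a_it = payoffs[("ignore", "triage")][0]
a_id = payoffs[("ignore", "dont_triage")][0]
q = (a_id - a_cd) / (a_ct - a_cd - a_it + a_id)

print(f"Equilibrium compromise rate: about 1 in {round(1 / p):,} assets")  # ~1 in 1,101
print(f"Equilibrium triage rate: {q:.0%} of assets")                       # 50%
```

Swapping in your own orders of magnitude from the payoff table changes these two numbers and nothing else.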

The first step with this new baseline is to check how we're doing at the aggregate level. Say, over a month, there were 1,000 alerts and 10,000 assets in our organization. Of these alerts, 2 uncovered attackers, i.e., 2 true positives. The rest were false positives, or benign if you prefer.

Immediately, we can compare against our expected rate: 1 in 1,000 of 10,000 assets means roughly 10 compromised assets, yet we only discovered 2, so we are missing visibility of 8 threat actors expected to be active in our environment. This first finding alone indicates something needs to change.
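The arithmetic, spelled out with the example numbers above:

```python
assets = 10_000              # assets in the hypothetical organization
compromise_rate = 1 / 1_000  # the equilibrium rate from above
discovered = 2               # true positives this month

expected_active = assets * compromise_rate  # 10 attackers expected
missed = expected_active - discovered       # 8 we have no visibility of
print(expected_active, missed)
```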

With some fruit from our analysis, we can go further and understand how recent changes might have impacted this month, by how much, and what changes might do to future months. To do this, we can put together a confusion matrix and, using the payoff matrix we already have, work out the expected value[3].

Expected Value = (True Positive Payoff) x (True Positive Count) + (False Positive Payoff) x (False Positive Count) + (True Negative Payoff) x (True Negative Count) + (False Negative Payoff) x (False Negative Count)

In our case:

Expected Value =  (100) x (2) + (-1) x (1000 - 2) + (0) x (10,000 - 1000 - (10 - 2)) + (-1000) x (10 - 2)
= -8798
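As a sketch in Python, with the same counts and payoffs. The dictionary structure is just one convenient way to hold the confusion matrix, not part of the method itself.

```python
# Payoff per triage outcome, taken from the payoff matrix above.
payoffs = {"tp": 100, "fp": -1, "tn": 0, "fn": -1000}

# Confusion matrix for the hypothetical month: 10,000 assets, 1,000 alerts,
# 2 attackers found, roughly 10 expected at the 1-in-1,000 rate.
counts = {
    "tp": 2,
    "fp": 1000 - 2,
    "fn": 10 - 2,
    "tn": 10_000 - 1000 - (10 - 2),
}

expected_value = sum(payoffs[k] * counts[k] for k in payoffs)
print(expected_value)  # -8798
```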

A negative expected value means the business suffers a net loss from running the security operations team for the month. A positive value means the team is generating a positive return on investment — they are a profit center. The number on its own might not have a concrete unit, but it is clear that we should make some changes to our hypothetical organization. Next month we might try automating a few key triage questions, then verify the change by checking whether this number moves a little or a lot relative to past fluctuations.

The same approach can be applied to individual detections or hunts over a period of time. By taking the expected value generated by each hunt or detection, you can rank them in a tornado chart to highlight the most successful ones to learn from and identify the ones that are crying out for tuning. Not bad for an approach that doesn't require calculating billions of gradients.
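A sketch of that ranking, with made-up per-detection counts (the detection names and numbers here are purely illustrative); the sorted values are exactly what the bars of a tornado chart would show:

```python
# Hypothetical per-detection confusion-matrix counts for the month.
detections = {
    "impossible_travel":   {"tp": 1, "fp": 40,  "fn": 0},
    "new_scheduled_task":  {"tp": 0, "fp": 600, "fn": 2},
    "rare_parent_process": {"tp": 1, "fp": 12,  "fn": 1},
}

payoffs = {"tp": 100, "fp": -1, "fn": -1000}  # true negatives contribute 0

def expected_value(counts):
    return sum(payoffs[k] * counts.get(k, 0) for k in payoffs)

# Best to worst: the top is worth learning from, the bottom is crying out for tuning.
for name, counts in sorted(detections.items(), key=lambda kv: expected_value(kv[1]), reverse=True):
    print(f"{name:>22}  {expected_value(counts):>6}")
```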

Whether it's working out the embedding method for your slick AI machine or a killer piece of regex, or you're looking to compare their relative value given the time and effort involved in keeping those systems running, the same approach works.

Here, we have started by cataloging detections, like collecting stamps, and moved to delivering value with each alert triaged, even if it's a false positive result. By aggregating the numbers together, we can estimate how well the security program is doing and track how things change over time. At the individual detection level, we have a clear signal for what to tune and what to keep.

A tornado chart showing expected value from different detections

[1] I'd encourage you to try this with some different values yourself and get a feel for the variability in triage outcomes and the associated scores. If you like linear algebra, you can build your own solver or grab a ready-made one from the web.

[2] Interestingly, we also discover there are gains to triaging up to half of our assets over a given period of time, before it starts to cost us more than it's worth.

[3] Provost, Foster, and Tom Fawcett. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O'Reilly Media, Inc., 2013. Chapter 7.
