DFIR and GRC are the Business: Systems Modeling and Metrics for the Real World

Is our Pepsi more trustworthy than their Coke? What are all these security metrics and audits really saying? It's one thing to know what they are individually; the relationships between them, however, will change as fast as your org changes. It's easy to assume the org chart maps to processes effectively, but what was once useful has no guarantee of staying that way. We'll see Event Storming as a tool that lets you understand these interacting parts from the ground up, cutting through assumptions and drifted concepts.

Event Storming, a collaborative systems modeling process, radiates information. Over two sessions it projects a coherent business narrative: the engine that drives the value of your security operations, where and how the metrics relate to each other, and, importantly, how GRC framework requirements tie together in a way that makes sense for your business - not the cookie-cutter organization one might imagine is happening behind the scenes or that is projected by outside forces.

Session 1:

Initial Timeline of Events (20 - 40 mins)

So, what is a system? A system is a set of interconnected parts that produce their own pattern of behavior over time. It is buffeted, constricted, triggered and driven by external events, but the system has its own way of responding. Discovering what that is, at least at this moment in time, starts with the individuals doing the work.

We start by collecting these people in a room, real or virtual, and asking them to put down as many orange sticky notes as they can on a timeline, describing the events relevant to them. Event here has a special meaning: a past-tense, domain-level activity. After 20 - 40 minutes we have a fragmented timeline, something like figure 1.

Figure 1: Chaotic timeline of events

Classic properties at this stage are duplicated events, similar events with different terms, and non-domain-level events. Non-domain-level events, things like 'JIRA ticket assigned from queue', 'Timesheet completed', '1-on-1 scheduled' and so on, can safely be placed in a corner of the workspace labeled 'Graveyard'. Duplicated events and diverse vocabulary are the raw material that drives this exercise, and they start to be addressed in the defrag step.

Defrag the timeline (20 mins)

At this point groups of events are positioned chaotically on the timeline. The next step is to begin defragging them by selecting some pivotal events. Events that look similar are a good place to start; these can be placed on top of each other. Pivotal events make it easier to sort the events in between them and are often good markers of domain boundaries, as we'll see later. Usually they are associated with the clusters of duplicated events and changes in vocabulary: the vulnerability management team handing things off to the patching team, an analyst escalating an asset they have evidence of a compromise on, and so on.

Swim-lanes are often a good idea for parallel events, and some kind of start marker and GOTO can be useful for repeated or triggered events. At some stage it can be worth introducing external systems, things like hosted email or a retained IR firm; traditionally these go on light pink rectangular notes. They represent whole sections of events that we have little control over, and they don't need to be external to the organization. To borrow a definition from Alberto Brandolini, "An External System is whatever we can put the blame on" [1]. These can highlight the degree of influence you have over your security operations processes, and the balance between expertise, cost, and flexibility for future decisions.

Temporal milestones can help too. Collecting evidence to triage an alert might take about 20 minutes; containing an incident might take a few hours to deploy additional monitoring and more hours to understand the breadth of TTPs involved; eradication could take weeks or months, waiting for the right time to bring down business-critical systems. These can be marked at the top of the timeline as they come up.

Finding missing events

So far we've been discussing actually existing events. At this point it would be unusual not to have ideas for other events that should happen, or for events that are possibly no longer needed. It's a good idea to use a different color to add these in or mark them for removal. A great trick for discovering missing events is to work backwards along the timeline. Each event should have everything it needs from the preceding events or triggers. If you've triaged an alert, there had better have been some collection of evidence to support a compromise, and so on. Events shouldn't appear out of thin air. There will also be uncertainties and concerns around events; add these in with magenta notes from here on.
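
Once the board has been transcribed, the "nothing appears out of thin air" check can also be run mechanically. The sketch below is one possible way to do it, not part of the Event Storming method itself: the Event class, the needs/produces fields and the example event names are all assumptions made for illustration. It expresses the same check as a forward pass over the timeline, flagging any need that nothing upstream supplies.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One orange note. 'needs' and 'produces' are hypothetical annotations."""
    name: str
    needs: set = field(default_factory=set)      # what must already exist
    produces: set = field(default_factory=set)   # what this event leaves behind


def find_gaps(timeline, external_inputs=frozenset()):
    """Walk the timeline in order and list needs that nothing upstream supplies."""
    available = set(external_inputs)
    gaps = []
    for event in timeline:
        for need in sorted(event.needs - available):
            gaps.append((event.name, need))
        available |= event.produces
    return gaps


# Hypothetical fragment of a timeline: nobody wrote down evidence collection.
timeline = [
    Event("Alert raised", needs={"telemetry"}, produces={"alert"}),
    Event("Alert triaged", needs={"alert", "evidence"}, produces={"triage verdict"}),
    Event("Asset escalated", needs={"triage verdict"}, produces={"case"}),
]

gaps = find_gaps(timeline, external_inputs={"telemetry"})
for event_name, need in gaps:
    print(f"'{event_name}' needs '{need}' but nothing upstream produces it")
```

Each gap is a candidate for a new note in the "missing event" color, or for a magenta note if the group can't agree on where the input really comes from.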

Add Value Markers and Existing Metrics (20 mins)

By now we have a coherent narrative, but without value it's not business relevant. Security operations can produce value in a variety of ways. Disrupting a threat actor happens at the point where all their persistence is first identified across the org and then removed. IR analysts seeing recognition as their work leads to intelligence products is another good one. There are as many types of 'currencies' as there are costs, like stress or interrupted sleep from a page. Often one side of an event will be a loss for one party but generate value for another. Closing a false positive alert might be a loss for the analyst, but it lets the threat hunter know where the compromise isn't and lets engineers know their pipelines are serving data - or at least it could. This can lead to fruitful conversations and is an opportunity to look at how to generate more value overall.

Now is also a good time to start mapping out where indicators and metrics sit. When things are going well, what does that look like? As is classic in systems thinking, the levers will become clear but the direction to pull them will be unintuitive. Much of this comes from lags in feedback, which you also want to try to mark in your now coherent business narrative (figure 2). What does a rising average number of sources of evidence used for triage mean for case spin up time? Are the lawyers keeping our median time to eradicate flat or is it fed by some other input?
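
Placing a metric on the timeline amounts to saying which two events it is measured between. As a rough illustration, median time to eradicate could be computed as below. The record layout and the event names incident_declared and threat_eradicated are assumptions invented for this sketch; your own start and end events are whatever the group agreed on at the board.

```python
from datetime import datetime
from statistics import median

# Hypothetical case records: when each timeline event happened for each case.
cases = [
    {"id": "case-1",
     "incident_declared": datetime(2024, 3, 1, 9, 0),
     "threat_eradicated": datetime(2024, 3, 15, 9, 0)},
    {"id": "case-2",
     "incident_declared": datetime(2024, 3, 4, 9, 0),
     "threat_eradicated": datetime(2024, 4, 3, 9, 0)},
    {"id": "case-3",
     "incident_declared": datetime(2024, 3, 10, 9, 0),
     "threat_eradicated": datetime(2024, 3, 31, 9, 0)},
]


def median_days_between(records, start_event, end_event):
    """Median elapsed days between two events, skipping cases missing either."""
    durations = [
        (r[end_event] - r[start_event]).total_seconds() / 86400
        for r in records
        if start_event in r and end_event in r
    ]
    return median(durations) if durations else None


median_days = median_days_between(cases, "incident_declared", "threat_eradicated")
print(f"Median time to eradicate: {median_days:.1f} days")
```

Naming the two events explicitly makes it easier to ask which inputs feed the metric, and where the feedback lag sits between them.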

If the group has questions about what is producing value, what is consuming inputs, or what is really producing a metric, now is a great time to jump into the discussion. Make sure to record each question on a magenta note.

These notes and the external systems become great targets for full and frank discussion. You'll want to leave as much space for these as possible. It's a discovery workshop, and in some cases these delicate conversations can produce fantastic outcomes.

Figure 2: Coherent Business Narrative

Session 2:

Clarify Domain Boundaries and Tidy Up (20 - 40 minutes)

Having slept on it, in all likelihood everyone will want to shuffle some events around; maybe whole sections are missing or not as relevant as first thought. Encourage this and add in some extra time to account for it.

During the process you may have started to identify areas of the business flow that fit together, can naturally be implemented independently, and have a clear purpose. Start labeling these as you see them; there's no need to wait until this step. The technical term for these is bounded contexts. It's at this step that you want to consider them specifically and have the group start tidying them up.

Bounded contexts will often have their own language, seeing similar things from different perspectives. People in teams will also naturally group around a section of the whiteboard. It's not unusual to find them not too different from Richard Bejtlich's 5 groups [2] - Infrastructure and Development, Applied Threat Intelligence, Incident Detection and Response, Constituent Relations and so on. These can be broken down further into things like vulnerability identification, intelligence product creation and so on. They become neat units that you can form parcels of work around: the terrain of digital forensics and incident response, which may or may not line up with the map laid out by your current organizational structure.

At this point the system dynamics are starting to emerge. Often very few events will produce value directly, and often entire teams will not produce value directly. They will, however, set up downstream events for success. For this they are critical, and questions about where these contributions show up can now start to form if they are not captured already. We also have a number of magenta notes around. These highlight the most critical areas to focus on; generally, the redder the area, the more critical it is. Together they form a powerful systems-level backlog: the biggest constraint at the top and everything else together below, ready to shuffle around after the big system constraint is dealt with.
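
If the board is transcribed afterwards, a first cut of that backlog can be produced by counting magenta notes per bounded context. This is only a sketch under a crude assumption, namely that note density stands in for how "red" an area is. The notes and their wording are invented for illustration, and the group's judgment should override the count.

```python
from collections import Counter

# Hypothetical transcription of magenta notes: (bounded context, concern).
magenta_notes = [
    ("Incident Detection and Response", "No agreed evidence threshold for escalation"),
    ("Vulnerability Identification", "Handoff to patching has no owner"),
    ("Incident Detection and Response", "Retained IR firm response time unknown"),
    ("Incident Detection and Response", "Eradication waits on change windows"),
    ("Applied Threat Intelligence", "Intel products not fed back to triage"),
    ("Vulnerability Identification", "Asset inventory out of date"),
]


def build_backlog(notes):
    """Rank bounded contexts by number of magenta notes, busiest first."""
    counts = Counter(context for context, _concern in notes)
    return counts.most_common()


backlog = build_backlog(magenta_notes)
constraint, everything_else = backlog[0], backlog[1:]

print(f"Biggest constraint: {constraint[0]} ({constraint[1]} notes)")
for context, count in everything_else:
    print(f"  then: {context} ({count} notes)")
```

Only the top entry is treated as ordered. The rest are kept together, matching the idea that they get reshuffled once the main constraint has been dealt with.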

Add Compliance Tokens (25 - 40 mins)

Now the GRC team members you invited to the session really get to shine. To this point they may have been gaining an increasing sense of confidence and ease, or they might have been busy with the magenta notes. Both are great outcomes. Day 2 is a great time to bring compliance to the fore. We have actually existing processes visible and can show how they naturally fit compliance requirements. The dream is within reach: a well-designed system that is compliant as a matter of course, rather than compliance steps bolted on to whatever was going to happen anyway.

One approach that puts everyone at ease is to take the list of compliance requirements and, as with the domain boundaries, circle and mark the events critical to compliance. Taking NIST [3] as an example, where and when does RC.CO-03 'Communicate recovery activities and progress' enter the picture? Are DE.AE-02 'Potentially adverse events analyzed' and DE.AE-03 'Information is correlated from multiple sources' related in a meaningful way? How does the correlation happen today? And so on. We jumped ahead and included a few of these in figure 2 already.
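
Once the tokens are on the board, the mapping can be captured as plain data so that gaps are easy to spot. A minimal sketch follows, using the three subcategories quoted above. The event names and the token placement are invented for illustration and do not describe how any particular tool stores this.

```python
# Subcategory IDs and short titles as quoted in the text above.
requirements = {
    "RC.CO-03": "Communicate recovery activities and progress",
    "DE.AE-02": "Potentially adverse events analyzed",
    "DE.AE-03": "Information is correlated from multiple sources",
}

# Hypothetical compliance tokens placed on events during the session.
event_tokens = {
    "Alert triaged": ["DE.AE-02"],
    "Evidence collected": ["DE.AE-02", "DE.AE-03"],
    "Asset escalated": [],
}


def map_requirements(requirements, event_tokens):
    """Invert event -> tokens into requirement -> events that satisfy it."""
    by_requirement = {req_id: [] for req_id in requirements}
    for event, tokens in event_tokens.items():
        for req_id in tokens:
            by_requirement.setdefault(req_id, []).append(event)
    return by_requirement


mapping = map_requirements(requirements, event_tokens)
uncovered = [req_id for req_id, events in mapping.items() if not events]

for req_id in uncovered:
    print(f"No event on the board covers {req_id}: {requirements[req_id]}")
```

An uncovered requirement is not necessarily a compliance failure. It may simply point at a missing event, or at an external system the group has not yet put on the board.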

At Triangle Wave we can help you set these up and place you on the front foot at audit time, linking compliance requirements to events and dynamically mapping them to reports.

Lastly, with a shared context right in front of key players, it's a great opportunity to vote on the areas that look like the most critical to work on. Give everyone a couple of anonymous tokens and have them put them down on the events they believe warrant the most attention.

The Return on 3 Hours of Investment

After two sessions with key parties there are all sorts of soft deliverables: closer relationships, and a sense of being on the same team rather than rivals. More concretely, expertise becomes visible, we have key people in the room with a shared model, bottlenecks are identified and ranked, and with a shared story on the page inconsistencies are quickly identified.

Altogether we're removing silos, decreasing the cost of consensus, exposing the invisible things that could break and identifying clear shared goals. The whole group is able to work and think at the system level.


[1] Brandolini, Alberto. Introducing EventStorming. Leanpub, 2021.

[2] Bejtlich, Richard. The Practice of Network Security Monitoring: Understanding Incident Detection and Response. No Starch Press, 2013.

[3] NIST. NIST Releases Version 2.0 of Landmark Cybersecurity Framework. 2024. https://www.nist.gov/news-events/news/2024/02/nist-releases-version-20-landmark-cybersecurity-framework
