Wargaming: running incident response plan tests - part 2
The time has come to run your wargame. Let's look at what happens on the day itself.
In part one of this series I discussed the importance of running wargames to test your plans, and looked at some of the tools and considerations. For part two I'll be reviewing what happens on the day, and highlighting some issues you might have.
On the day preparations
Firstly it's important to test your tool again - as I said before, if you've ever done a live demo you know how easily things can go wrong. Make sure your tool works, and that you can detect its traffic (assuming you have access to do so)[1].
Depending on the scenario, you'll probably want to start the tool running shortly before the exercise is due to kick off.
Assemble the troops!
Pre-pandemic we'd likely have assembled everyone at one office for the wargame. Since lockdown and social distancing make that problematic I've had to find other ways to get together (who hasn't?!) and have run a couple of successful exercises via Google Meet. However you gather, make sure everyone is together for the kick off briefing.
Once everyone's arrived and been welcomed remind them that this is not a test of their ability. Anxiety is possibly running high at this point so put the team at ease. I also find it's best not to run the exercise as a competition - the aim is to see how the team works together, so highlight that intention. Mention that part of the exercise is to help teach people the skills they need and to find out what training is needed. Reiterate that no question is a bad question.
Also reiterate, again, that this is not a test (yep - really drive that point home).
Once that's out of the way, check everyone is comfortable and kick off with the briefing. It's at this stage that we explain everybody's roles and outline the rules (example rules below). Roles can include the facilitator (responsible for running the scenario), technicians / engineers / analysts (who will investigate the incident), senior management (responsible for decisions) and communications (who will communicate with the organisation / public).
Example rules
- In the event we find a real live incident we will stop this exercise and immediately work on the live incident
- If anyone needs to take a break then we can do that (but try to keep this as real as possible)
- You must tell the facilitator before you make any system changes. They will stop you if your actions would cause an (unnecessary) outage
- Discussion is encouraged
- The expected timescale for this exercise is 2 - 4 hours, but we will stop at 1 day if the threat has not been neutralised
Set the scene
Now everyone knows how this will work we can set the scene. Recently I've run simulations designed to mock up an infected device on the network that's communicating with a command and control (C2 or CnC) server, so that's what I'll base this series on.
Throughout the simulation the facilitator will issue injects, which give the responders new information. At kick off the information is likely to be quite vague - the team is only just learning of the issue. For example:
National computer emergency response teams have identified an infection on networks that connect out to a malicious host every minute. Further information will be provided as it becomes available.
It was with an incredibly vague starter like the above that I kicked off our recent simulations.
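The one concrete detail in that starter is the "every minute" interval, and it's the sort of thing responders can go hunting for in connection logs. Here's a minimal sketch of that idea in Python. It assumes the logs have already been reduced to (timestamp, source, destination) records, which won't match any particular firewall or proxy format. The function name, the thresholds and the sample addresses are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def find_beacons(records, expected_interval=60.0, tolerance=5.0, min_connections=5):
    """Return (source, destination) pairs that connect at a regular interval.

    records: iterable of (iso_timestamp, source, destination) tuples.
    The interval, tolerance and minimum count are illustrative assumptions.
    """
    seen = defaultdict(list)
    for timestamp, source, destination in records:
        seen[(source, destination)].append(datetime.fromisoformat(timestamp))

    suspects = []
    for pair, times in seen.items():
        if len(times) < min_connections:
            continue  # too few connections to judge regularity
        times.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        # Regular beaconing: the average gap is close to the expected
        # interval and the gaps barely vary.
        if abs(mean(gaps) - expected_interval) <= tolerance and pstdev(gaps) <= tolerance:
            suspects.append(pair)
    return sorted(suspects)


# Made-up sample data: one host connecting every 60 seconds, one browsing irregularly.
start = datetime(2021, 3, 1, 9, 0, 0)
sample = [
    ((start + timedelta(seconds=60 * i)).isoformat(), "10.0.0.5", "198.51.100.7")
    for i in range(10)
]
sample += [
    ((start + timedelta(seconds=s)).isoformat(), "10.0.0.9", "203.0.113.20")
    for s in (0, 3, 203, 248, 1148, 1160)
]

print(find_beacons(sample))  # [('10.0.0.5', '198.51.100.7')]
```

Real traffic is noisier than this, so treat anything it flags as a lead to investigate rather than proof of infection.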
Let them dig around
Because the starter is so vague, it's likely the team will look in the wrong places, or have so much information to review that they cannot find the "infected" device you've planted somewhere on the network. There's nothing wrong with letting the team look around - at the very least this allows them to get familiar with the logging and networked environments. In the interests of not wasting anyone's time, I'd recommend not letting people follow red herrings[2] for too long.
Another advantage of an exercise like this is that you can identify who lacks access to what systems. Perhaps someone has recently joined the team but not been given access to the firewall logs yet. If you're able to grant this access during the exercise it's worth doing so (make sure you follow any change control processes!).
The team might feel paralysed at this early stage because there's so little information to work on. Don't hesitate to provide hints and discuss what people are thinking about. Again, the intention of this exercise is for people to learn and gain confidence.
Provide the next injects
After the team has had a look around it's time to provide the next inject, which will give them further information. Exactly how long you wait between injects is entirely up to you, as the facilitator, so it's important to "read the room" to gauge how the team is feeling. For that reason it's really important to maintain good communication with the team. Expect to spend more time sitting quietly in the background and listening than talking.
With the information from subsequent injects the team should start heading in the right direction. Depending on the scenario, the facilitator may have to make additional changes. Perhaps the "infection" spreads or escalates in some way. Make sure you're familiar with the scenario so you know what to do and when.
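One way to keep track of what to do and when is to write the scenario down as a simple timeline before the day. The sketch below is one possible shape for that, not a record of how my exercises were scripted. Only the first inject comes from this post. The later injects, the timings and the facilitator action are hypothetical, and the times are the earliest point at which to release an inject rather than a fixed schedule, because reading the room still comes first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inject:
    earliest_minute: int           # do not release before this point
    message: str                   # what the responders are told
    facilitator_action: str = ""   # any change the facilitator makes


SCHEDULE = [
    Inject(
        0,
        "National computer emergency response teams have identified an "
        "infection on networks that connect out to a malicious host every "
        "minute. Further information will be provided as it becomes available.",
    ),
    # Everything below is a hypothetical example, not from a real exercise.
    Inject(30, "HYPOTHETICAL: the malicious host has been identified as 198.51.100.7."),
    Inject(
        60,
        "HYPOTHETICAL: the infection has been seen on a second device.",
        facilitator_action="Start the tool on a second device.",
    ),
]


def next_inject(schedule, elapsed_minutes, released):
    """Return the next unreleased inject if its earliest time has passed, else None.

    released is the number of injects already given to the team.
    """
    if released >= len(schedule):
        return None  # scenario has no injects left
    candidate = schedule[released]
    return candidate if candidate.earliest_minute <= elapsed_minutes else None
```

Calling `next_inject(SCHEDULE, 45, 1)` returns the second inject, while `next_inject(SCHEDULE, 10, 1)` returns `None` because it's too early. The facilitator still decides whether the team is actually ready for it.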
I'll end this post with our incident response team headed in the right direction. Join me in part three where we'll locate the infection, I'll discuss what to do when things go wrong, and explain how we finish up.
Banner image: Screenshot from Wargames (c) MGM / United Artists Entertainment 1983.
[1] If you don't have the access to check that the tool is working as intended, it'd be a really good idea to enlist the help of someone who can. Preferably someone who isn't involved in the simulation.
[2] A red herring is a plausible but incorrect hint or avenue of enquiry. Read more on Wikipedia.