Incident Response - Lessons learnt from mountaineers
When caught in an avalanche, survival chances are strongly dependent on time. After 18 minutes, the survival rate is still 90%. After half an hour, it drops to 30%. 18 minutes can feel like an eternity when you're "buried" under several meters of snow. However, for the companions left on the surface, it's a minimal timeframe during which they need to recover from the shock, regroup, organize, determine the burial point and depth, dig, and extract the victim from the white hell.
Similarly, in 18 minutes, a cybersecurity threat that has successfully infiltrated the Information System has had time to spread and propagate to almost irreversible proportions. Thus, it is within this tiny window that the detection team must identify, assess, and define immediate actions.
There's no safe mountain.
Avalanches, crevasses, and rockfalls aren't limited to high peaks. Any significant snowfall can trigger a slide of several meters' thickness when the slope exceeds 30°, every glacier creates a crevasse at its base, and permafrost melt as well as erosion lead to rockfalls regardless of altitude.
Just as you must navigate crevasses, seracs, and exposed paths when climbing, the Information System must also expose itself to maintain its efficiency and role. The key, in both contexts, is accurately assessing and preparing for the risk.
Reinhold Messner said, "The mountain is neither fair nor unfair, it's dangerous." Similarly, the Information System isn't uninteresting or safe; it's exposed. It holds resources and important data for the company or administration. Data that holds value. Therefore, there's no uninteresting Information System.
Tooling
Climbing a peak without proper equipment is inconceivable, just as intrusion detection requires suitable tooling. In either case, there's no magic tool. Effectiveness comes from the right combination of functions and technologies that fit the environment's specific needs.
Encrypted flows that can't be properly analyzed by a network probe might need to be examined on the target. False positives generated by machine learning engines - which add confusion when time is crucial - can be mostly filtered upstream by deterministic engines. On the other hand, machine learning can effectively identify attack scenarios, even when evasion techniques (like command padding, delay insertion between operations, or the use of distinct sources) are leveraged.
However, certain essential rules need to be kept in mind. First, we're talking about functions and technologies, not tools. And the proliferation of tools comes with numerous downsides.
The first is the learning curve. Mastering a larger number of tools naturally takes longer. More tools lead to less expertise in their use.
The second is tied to data consistency. Each tool uses its own format, taxonomy, or language. Integrating them cohesively with centralization tools like a SIEM becomes more complex as the number of tools grows, and exponentially more so when integrating them for event correlation. Correlation efficiency is reduced as a result: pivot points can only be common information, limited to the lowest common denominator of the elements that all the tools provide in a consistent form.
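The pivot-point limitation can be illustrated with a minimal sketch. The tool names and field names below are hypothetical and chosen only for illustration; the idea is simply that the fields usable for correlation are the intersection of what every tool provides.

```python
# Hypothetical field sets exported by three detection tools
# (tool and field names are illustrative assumptions, not real products).
TOOL_FIELDS = {
    "network_probe": {"timestamp", "src_ip", "dst_ip", "dst_port", "protocol"},
    "endpoint_agent": {"timestamp", "hostname", "user", "process", "src_ip"},
    "mail_gateway": {"timestamp", "sender", "recipient", "src_ip", "subject"},
}


def common_pivot_fields(tool_fields):
    """Return the fields that every tool provides, i.e. the usable pivot points."""
    field_sets = list(tool_fields.values())
    if not field_sets:
        return set()
    return set.intersection(*field_sets)


print(sorted(common_pivot_fields(TOOL_FIELDS)))  # ['src_ip', 'timestamp']
```

In this sketch, adding a fourth tool that does not report `src_ip` would shrink the pivot set to `timestamp` alone, which is the effect described above: each additional tool can only keep or reduce the common ground.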
Lastly, multiplying tools makes securing them challenging. An ice axe falling from a backpack can be deadly for the second climber. PsTools or even a Python interpreter left available to an attacker can be turned against defenders. Securing tools is essential, and complexity increases with the number of tools involved.
Organization and Communication
Within the 18 minutes following an avalanche, one must regroup in a safe place (to avoid the risk of secondary avalanches), designate a leader, set the Avalanche Rescue Beacons (ARB) to receive mode, establish a first point of contact, grid search the area with depth probes, locate the burial point, evaluate the snow depth, and then extract the victim.
In such a short timeframe, an effective and well-practiced organization is essential.
When initial signs of intrusion are identified, it's possible that the entire Information System is already compromised, including communication systems. Gathering in a "safe zone" then involves using third-party channels with non-corporate tools and user accounts, entirely independent from the Information System. In other words, making good use of shadow IT is necessary.
An intrusion situation is a state of emergency that requires mobilizing all available resources for specific actions. This is why naming a leader is essential, and their instructions must be followed without debate, negotiation, or political or career considerations. This leader should be experienced and technically skilled. Making quick decisions in this domain requires knowing what you're doing. Stress resistance is also crucial, typically gained through experience.
Lastly, the main foreseeable incident scenarios need to be known. In mountaineering, it's avalanche, falling into a crevasse, or rockfall; in cybersecurity, it's exploiting a vulnerability, compromising an account, or opening a malicious email.
For each scenario, immediate actions and subsequent steps to track the intrusion's propagation need to be identified in advance. This isn't a time for improvisation: without consistent and proven organization, the chances of success (i.e., survival in the mountains) are significantly reduced.
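One possible way to capture such predefined responses is a simple lookup table, sketched below. The scenario names follow the three cybersecurity cases mentioned above; every action and tracking step listed is an illustrative assumption, not a recommended procedure, and the `Playbook` structure itself is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    """Actions prepared in advance for one foreseeable incident scenario."""
    immediate_actions: tuple
    tracking_steps: tuple


# Scenario names follow the text; all listed actions are illustrative assumptions.
PLAYBOOKS = {
    "vulnerability_exploited": Playbook(
        immediate_actions=("isolate the affected host", "preserve volatile evidence"),
        tracking_steps=("search other hosts for the same exploit traces",),
    ),
    "account_compromised": Playbook(
        immediate_actions=("disable the account", "revoke active sessions"),
        tracking_steps=("review recent logins and actions of the account",),
    ),
    "malicious_email_opened": Playbook(
        immediate_actions=("isolate the workstation", "quarantine the message"),
        tracking_steps=("list other recipients of the same message",),
    ),
}


def get_playbook(scenario):
    """Return the prepared playbook, or fail loudly if none was defined."""
    playbook = PLAYBOOKS.get(scenario)
    if playbook is None:
        raise LookupError(f"No prepared playbook for scenario {scenario!r}")
    return playbook


def unprepared_scenarios(expected):
    """List the expected scenarios that have no playbook yet."""
    return [name for name in expected if name not in PLAYBOOKS]
```

Calling `unprepared_scenarios` with the list of scenarios the team considers foreseeable gives a quick view of where improvisation would still be required.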
Preparation
All previously discussed subjects must be meticulously documented. Compile an up-to-date list of colleagues who can act in an intrusion context and their contact information on a secure channel. Catalog available tools, their operation modes, and access methods. Establish detection procedures and immediate actions for different scenarios.
The importance of documentation and updates can be illustrated with a simple example. A group of mountaineers has a satellite phone (an alternative communication method for areas without network coverage) secured with a PIN code (tool security), noted in a notebook (documentation). The owner falls into a crevasse and becomes unconscious. If the PIN code isn't noted or has been changed, the phone becomes unusable, and rescue efforts are hindered.
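Keeping such documentation current can be partly automated. The sketch below flags entries that have not been verified recently; the `DocEntry` structure, the entry names, and the 90-day threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DocEntry:
    """One documented item: a contact, a tool, or a procedure."""
    name: str
    kind: str
    last_verified: date


def stale_entries(entries, today, max_age_days=90):
    """Return entries whose last verification is older than max_age_days.

    The 90-day default is an arbitrary assumption; pick what fits your context.
    """
    limit = timedelta(days=max_age_days)
    return [entry for entry in entries if today - entry.last_verified > limit]


# Hypothetical inventory, echoing the satellite phone example.
inventory = [
    DocEntry("satellite phone PIN", "tool", date(2024, 1, 15)),
    DocEntry("out-of-band contact list", "contact", date(2024, 5, 20)),
]

for entry in stale_entries(inventory, today=date(2024, 6, 1)):
    print(f"Re-verify: {entry.name} (last checked {entry.last_verified})")
```

Passing `today` explicitly keeps the check reproducible; in practice it would typically be `date.today()`.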
Preparation also involves the necessary training, both to develop automatic responses and to reduce the impact of the stress inherent in such situations. Regular practice is essential to minimize the execution time of the various steps. While assembling the team on a secure channel might initially take over an hour, a dozen repetitions might reduce this to under 5 minutes.
Regular training also empirically (and effectively) verifies the documentation's accuracy, which is vital.
Conclusion
Whether dealing with a mountain incident or an ongoing intrusion in an Information System, time is the key factor, and in an incident context even the smallest error can lead to failure. In this hostile environment, success relies on well-practiced organization, reliable communication, and appropriate and mastered tooling.
There is no reason to improvise what can be defined in advance; however, the experience and real mastery of responders remain mandatory. Murphy's law is universal.