Operational Resilience: Prevent and mitigate critical incidents

30th May 2023

In January 2023, 23% of UK businesses reported that they were unable to operate fully due to Operational Resilience failures such as supply chain disruption, system failures, third party outages and cyber attacks.

Putting measures in place to prevent critical incidents, protect your organisation and mitigate any risks that arise is more important than ever if you are to maintain the ongoing delivery of important business services.

How are businesses dealing with business continuity today?

For many businesses, Operational Resilience still centres on traditional Business Continuity Plans. This traditional view of how to mitigate and prevent critical incidents often focuses on a small number of major incidents rather than on critical customer-facing services.

A more proactive Operational Resilience strategy focuses on preventing and mitigating material harm. While businesses are typically able to react to incidents well (they know the escalation path and who needs to be consulted and informed), prevention and mitigation mean there should be no surprises: every foreseeable severe disruption should already have been considered, with realistic workarounds and mitigations planned.

More mature organisations are developing an integrated set of Operational Resilience capabilities and “playbooks” that help them predict potential problems and respond to critical incidents faster and more efficiently. These capabilities provide a comprehensive approach that minimises the impact of incidents and maintains the continuity of their operations.

Four questions to consider when reviewing your Operational Resilience preparedness

Before you develop your Operational Resilience strategy and plan, you need to address four key questions to establish your organisation’s maturity:

  1. What is important to your clients?

Under the FCA’s new Operational Resilience rules, important business services are defined through the lens of the client. When disruption occurs to your important business services, clients can be harmed in several ways:

Being unable to provide essential services can cause financial losses and reputational damage for your clients through delayed or cancelled services. Your clients may also lose the trust of their own customers, who may take their business elsewhere.

The economic ripple effect of widespread disruption: as each client business begins to suffer delays, financial and reputational losses increase, ultimately affecting the wider economy.

Falling short of regulatory standards can result in penalties and fines, which can have a knock-on effect on your clients, causing a loss of trust, a reduction in the services provided and, ultimately, a loss of business.

An impact tolerance is the maximum level of disruption to an important business service that a firm can accept before it causes intolerable harm to its clients, typically expressed as a maximum duration of disruption. Knowing these tolerances means firms can take a more strategic, risk-based approach to managing disruptions and improving their Operational Resilience. You can then decide how to prioritise investments, focus on prevention, improve incident responses, and enhance communications with clients and stakeholders.

After defining important business services and impact tolerances, it is critical to map the resources (people, volumes and metrics, technology, facilities, third parties) used to deliver those services. Dependencies and gaps should also be mapped so you can take appropriate steps against potential risks or threats and create contingency plans that keep services running in the event of disruption.
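To make the mapping step more concrete, here is a minimal Python sketch, assuming a deliberately simple model in which each important business service just records the set of resources it relies on. The service names, resources and helper functions (`services_affected_by`, `shared_resources`) are hypothetical illustrations, not a regulatory template or the tooling mentioned later in this article. It shows two questions a resource map should be able to answer: which services are affected if one resource fails, and which resources are shared by several services and so concentrate risk.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessService:
    """Hypothetical model of an important business service and its mapped resources."""
    name: str
    # People, technology, facilities and third parties the service relies on.
    resources: set[str] = field(default_factory=set)


def services_affected_by(resource: str, services: list[BusinessService]) -> list[str]:
    """Return the names of the services that depend on the given resource."""
    return sorted(s.name for s in services if resource in s.resources)


def shared_resources(services: list[BusinessService], min_services: int = 2) -> dict[str, list[str]]:
    """Return each resource used by at least `min_services` services, with those services."""
    usage: dict[str, list[str]] = {}
    for service in services:
        for resource in service.resources:
            usage.setdefault(resource, []).append(service.name)
    return {r: sorted(names) for r, names in usage.items() if len(names) >= min_services}


# Illustrative data only: these services and resources are invented for the example.
services = [
    BusinessService("Claims payments", {"payments team", "core ledger", "cloud provider A"}),
    BusinessService("Policy renewals", {"renewals team", "core ledger", "cloud provider A"}),
    BusinessService("Customer helpline", {"contact centre", "telephony provider B"}),
]

print(services_affected_by("cloud provider A", services))
print(shared_resources(services))
```

In this example, a failure at "cloud provider A" would affect both claims payments and policy renewals, and both that provider and the core ledger appear as shared dependencies, which is where contingency planning would likely need to start.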

  2. Is your culture in the right place?

Operational Resilience should be an integral part of your workplace culture and be embedded within your Enterprise Governance, Risk and Compliance framework. To achieve this, everyone needs to understand that resilience isn’t a one-off exercise: it is a continuous cycle that feeds information to the relevant teams and individuals.

Risk is like a living organism that moves and shifts depending on internal and external factors. Operational leaders need to be horizon scanning, anticipating additional or emerging risks that could have a critical impact on their business. These should all be recorded in the enterprise risk register.

Another reason for embedding Operational Resilience into your culture is that it cannot be achieved by a small subset of employees. The ultimate owner of Operational Resilience will be the Chief Risk Officer (CRO); however, resilience needs to be owned by the entire organisation, feeding into a centralised vision and strategy for managing risk. Without that collaboration, the approach becomes disjointed.

Resilience must be part of the conversation throughout the employee lifecycle, from recruitment and onboarding through to training and professional development objectives.

For roles where Operational Resilience is a factor, you need to select people with the right understanding and skills. Use skills matrices to evaluate your current workforce’s expertise, identify any Operational Resilience or business services skills gaps, and set interview questions that test for the required skills.

Training on Operational Resilience should focus on raising the understanding and knowledge of every member of your organisation. Senior management should also have risk objectives, so that everyone takes resilience seriously and works towards one common goal. The most operationally resilient businesses will have all employees culturally aligned.

  3. Are you scenario testing?

A key aspect of Operational Resilience is robust scenario testing. This tool enables firms to assess their Operational Resilience, particularly in relation to disruption to their critical services. Scenario testing involves simulating a range of severe but plausible events that could affect a firm’s operations, and assessing its ability to respond and recover within its defined impact tolerances.

This type of testing can help by:

Identifying vulnerabilities and highlighting the areas that might be most exposed to risk.

Outlining the steps needed to mitigate the impact and prioritise investments accordingly.

Identifying areas that need to be improved and refined to provide a better, faster response.

Scenario testing is now mandated by the Financial Conduct Authority’s new Operational Resilience rules. Firms are required to demonstrate that they can continue to operate their important business services in the event of disruption. First and foremost, testing assesses your impact tolerances and your ability to remain within them.

When simulating plausible real-world scenarios of varying severity, you need to be confident that all your processes have been mapped against risk factors such as people, technology, third-party providers and location. Scenario testing and exercising can also be used to test the effectiveness of any workaround processes and plans.
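A simple tabletop scenario test can be sketched in code. The minimal Python example below assumes that each important business service has an impact tolerance expressed as a maximum number of hours of disruption, that a scenario takes a set of resources out of action for a fixed period, and that a tested workaround (where one exists) caps the disruption at the time it takes to invoke. All names and figures are hypothetical, and real testing would also consider partial degradation, volumes and recovery sequencing, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    """Hypothetical important business service with a time-based impact tolerance."""
    name: str
    impact_tolerance_hours: float
    resources: frozenset[str]
    workaround_hours: Optional[float] = None  # assumed time to bring a tested workaround online


@dataclass(frozen=True)
class Scenario:
    """A severe but plausible event that disables some resources for a period."""
    name: str
    disrupted_resources: frozenset[str]
    outage_hours: float


def run_scenario(scenario: Scenario, services: list[Service]) -> dict[str, tuple[float, bool]]:
    """For each affected service, return (expected disruption in hours, within tolerance?)."""
    results: dict[str, tuple[float, bool]] = {}
    for service in services:
        if service.resources.isdisjoint(scenario.disrupted_resources):
            continue  # the scenario does not touch any resource this service relies on
        disruption = scenario.outage_hours
        if service.workaround_hours is not None:
            # A workaround limits disruption to the time needed to invoke it.
            disruption = min(disruption, service.workaround_hours)
        results[service.name] = (disruption, disruption <= service.impact_tolerance_hours)
    return results


# Illustrative data only.
services = [
    Service("Claims payments", 24, frozenset({"core ledger", "cloud provider A"}), workaround_hours=12),
    Service("Policy renewals", 48, frozenset({"core ledger"})),
    Service("Customer helpline", 4, frozenset({"telephony provider B"})),
]
scenario = Scenario("Core ledger unavailable", frozenset({"core ledger"}), outage_hours=72)

results = run_scenario(scenario, services)
breaches = [name for name, (_, within) in results.items() if not within]
print(results)
print("Outside impact tolerance:", breaches)
```

Here the claims payments workaround keeps that service within its 24-hour tolerance, while policy renewals, with no workaround, would be disrupted for 72 hours against a 48-hour tolerance. A breach like this is the kind of gap that testing results should surface for prioritisation.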

Ongoing testing should be part of your overall Governance, Risk and Compliance strategy. Feeding the results of a regular, annual cycle of Operational Resilience scenario testing through these structures improves the effectiveness of your planning: it gives the board visibility of, and accountability for, any improvements that need to be made, and the ability to prioritise business funding to address any gaps.

A SecOps (Security Operations) team plays a critical role in assessing the threat landscape and keeping the business informed about it. It also helps deliver Operational Resilience by implementing and managing the security measures that ensure the continuous operation of critical business functions, systems and data. Some of the ways SecOps teams contribute include:

Identifying system and network vulnerabilities

Responding to security incidents

Maintaining compliance with laws and regulations

Implementing security controls

  4. Do you understand your Critical Third Parties (CTP)?

Third parties often play a significant part in the delivery of products and services, so disruption to their services can have a knock-on effect on the firm’s operations. This could include third parties that provide key infrastructure (such as cloud computing) and supporting services, such as IT systems, payment processing or entire data centres.

The financial services industry is becoming so reliant on critical third parties to supply key functions and services that the Government issued a specific policy statement in June 2022 to address these risks:

“If many firms rely on the same third party, the failure or disruption of this ‘critical’ third party could threaten the stability of, or confidence in, the financial system of the United Kingdom.”

Source: HM Treasury, Policy Statement: Critical third parties to the finance sector

CTPs will also fall under the FCA’s new guidelines for Operational Resilience. The Financial Services and Markets Bill proposes a statutory framework for overseeing the resilience of the services that third parties provide to many financial firms, and the FCA has set out its thinking on this framework in a discussion paper. Firms are expected to have appropriate processes and controls in place to manage disruptions, including alternative providers or contingency arrangements that can be quickly implemented.

Organisations need to treat third parties with the same diligence as their internal operations. Any risks associated with a CTP are your business’s risks and should be fully understood. Evaluating your organisation’s reliance on CTPs, and those third parties’ own levels of Operational Resilience, is an integral step in understanding how well you can prevent critical incidents and how you can mitigate the effects of such an incident.

Before entering into a contract with a third party, a comprehensive risk assessment should be undertaken to identify any potential risks, such as cyber security, data protection or business continuity. Service level agreements should be comprehensive and form part of the contract, as should provisions for monitoring and auditing third-party performance.

Third-party providers should be prepared to discuss their resilience with clients, particularly during contract negotiations. In our experience, clients are increasingly asking for SLAs (Service Level Agreements) to be written into contracts to ensure Operational Resilience thresholds are met.

Third parties also play a key part in providing alternative solutions when disruption occurs. It is important to understand how quickly these alternatives can be brought into use and how robust they are, so they can be triggered at the appropriate point in the operational playbooks.

Have you defined a proactive approach to critical incidents?

As experts in the field of Operational Resilience, we often help businesses detail their important business services and the impact those services have on their clients. We can help you understand your current scenario modelling capabilities, define a proactive approach to critical incident preparedness and identify areas for improvement.

Get the right tools and structures in place to enable your CRO to own a Governance, Risk and Compliance strategy that prioritises response efforts and mitigates potential risks. Through our Clarity powered by BusinessOptix tool, we can help you build the tools and structures required to improve your Operational Resilience maturity.

Interested in more?

This blog was written by Consulting Director David Ilett, Senior Consultant Jason Pillay, and Senior Consultant Mark Odlin.

If you’d like to read more on this topic, you can download our white paper, where we also look at:

Understanding the risks

Responding to and recovering from critical incidents

8 essential questions to ask yourself to prepare for a critical incident

Download our white paper here.
