How to design resilient infrastructure for modern banking

Why is resiliency so crucial in modern banking infrastructure? And how do you achieve resiliency by design? Lalitha Kagithapu, Principal Engineer at First American, and Caleb Cole, Enhanced Technology Program Manager at Wings Credit Union, shared their insights on this subject as part of the LEAP 2026 conference. Read on to learn from their real-world examples, or watch the conference session in full here.

Why is resiliency by design important? 

According to First American’s Lalitha Kagithapu, resiliency is a fundamental part of the design process, not an afterthought. It’s key to building robust API foundations and delivering trusted financial services at scale. Building it in by design means resilience becomes something application teams inherit, rather than having to configure, so they can focus on business logic. 

Wings Credit Union’s Caleb Cole adds that a resilient infrastructure is about having the people, processes, technologies, and governance in place to absorb disruptions, maintain critical member services, and recover quickly. In recent years, it has also come to mean managing cloud concentration, third-party dependencies, and AI-related risks.

Key components for designing a resilient infrastructure 

At First American, resilient infrastructure design is about:

  • Network resiliency, with cloud providers responsible for this at the control plane and the internal team responsible for it at the data plane.
  • Ensuring all configuration is offered to the application team as templates.
  • Using Tyk as an API gateway at the edge, to support scaling and high availability. 
  • Having opinionated defaults for circuit breaking, rate limiting, and retries.
  • Using both an API gateway and a service mesh for dual circuit breaking, allowing for blast radius containment in the event of faults.
  • Standardizing authentication and having predictable, uniform patterns, including across legacy systems. 

All of this is underpinned by an infrastructure-as-code ethos, and all deployments and rollbacks are backed by data ops and completely automated. If it’s not code or a GitOps commit, it doesn’t exist.
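
To make the circuit breaking idea concrete, here is a minimal sketch of the general pattern in Python. It is not how Tyk or any service mesh implements it, and the class name, threshold, and timeout values are all illustrative assumptions. The point is simply that after repeated failures the breaker fails fast instead of passing more traffic to an unhealthy upstream, which is what contains the blast radius.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, allows a trial call after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed default, tune per service
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("upstream marked unhealthy; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In a dual setup like the one described above, one breaker of this kind would sit at the gateway and another at the mesh, so a fault in one service is cut off at whichever layer sees it first.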

Resiliency metrics

At Wings Credit Union, members are the primary focus, meaning core digital information, systems, and resources must be reliably accessible to authorized users. Several metrics support this, including: 

  • Uptime of frontline systems that staff use to support members: core banking system, CRM, and support tools.
  • Recovery time for in-house critical systems. There are plans, strategies, and procedures in place to recover within two hours of an incident occurring. 
  • Vendor SLAs with performance requirements for third-party critical systems. 

Regular testing is also essential. At Wings Credit Union, that includes tabletop exercises, held at least annually, to ensure business and IT teams can execute recovery procedures under pressure. It’s about looking at everything through a lens of availability, recovery, performance, vendor reliability, and whether the business can execute during a disruption.
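
As a rough illustration of how uptime and recovery time could be tracked, the sketch below computes an uptime percentage and flags incidents that missed a recovery target. Only the two-hour target comes from the session; the `Incident` record, the function names, and the assumption that incidents don't overlap are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

RECOVERY_TARGET = timedelta(hours=2)  # two-hour target mentioned above


@dataclass(frozen=True)
class Incident:
    """Hypothetical outage record for one critical system."""
    system: str
    started: datetime
    recovered: datetime

    @property
    def duration(self) -> timedelta:
        return self.recovered - self.started


def uptime_percent(incidents: Iterable[Incident], period: timedelta) -> float:
    """Uptime over the period, assuming incidents do not overlap."""
    downtime = sum((i.duration for i in incidents), timedelta())
    return 100.0 * (1 - downtime / period)


def missed_recovery_target(incidents: Iterable[Incident],
                           target: timedelta = RECOVERY_TARGET) -> List[Incident]:
    """Incidents that took longer than the target to recover."""
    return [i for i in incidents if i.duration > target]
```

Feeding a month of incident records into `uptime_percent` and `missed_recovery_target` gives two numbers that can be reviewed alongside vendor SLA reports.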

Common points of failure 

Failure is something you accept and design for. Common issues, and ways to overcome them, include:

  • Limited visibility. Observability is critical to overcoming this. The platform team can instrument it for application teams to inherit, ensuring it’s consistent, predictable, and available from day one. 
  • Inconsistent authentication patterns (including across legacy systems) and not having the right access logging at application level. An API edge gateway can overcome this, baking in consistency for application teams to inherit. 
  • Uncontrolled dependency chains, with APIs calling APIs calling APIs, including third-party integrations. Observability is key to ensuring visibility and resilience here, as is circuit breaking to limit any blast radius, plus rate limiting and auto retries. 

Addressing these points in this way means application teams can move to production with confidence. 
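
For the automatic retries mentioned above, one common approach is exponential backoff with jitter, so that many clients don't retry a struggling dependency at the same moment. The following is a minimal sketch under assumed defaults; the function name, delays, and the choice of which exceptions count as transient are illustrative, not taken from the session.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,       # assumed default
    base_delay: float = 0.2,     # seconds; assumed default
    max_delay: float = 5.0,      # cap on any single wait
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the current ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Retries work best when capped and paired with circuit breaking and rate limiting; unbounded retries across a chain of APIs can amplify an outage rather than absorb it.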

How to design resilient systems 

For resilient design: 

  • Draw on the best frameworks. FFIEC and NCUA are relevant examples for financial institutions.
  • Work with the business lines and IT, risk, and information security teams to get a holistic understanding of the business functions and what’s important. Do this by inventorying business processes and performing business impact analyses on them, so you identify what’s critical. 
  • Map those critical processes to the infrastructure (applications, identities, data), so you understand where the pathways are. 
  • Separate vendor-managed services from internally managed services, as they require different control approaches.
  • Test at least annually or more frequently depending on criticality. Be honest in testing about threats, potential risks, and capabilities, so you can provide sound guidance to the rest of the business. 
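
The mapping step can be sketched as a simple lookup from critical processes to the components they depend on. Everything in this example is hypothetical (the process names, components, and ownership flags are made up for illustration); a real inventory would come out of the business impact analysis.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    vendor_managed: bool


# Hypothetical inventory of infrastructure components.
COMPONENTS: Dict[str, Component] = {
    c.name: c
    for c in [
        Component("core-banking", vendor_managed=True),
        Component("identity-provider", vendor_managed=False),
        Component("member-db", vendor_managed=False),
    ]
}

# Hypothetical map of critical business processes to the components they rely on.
PROCESS_MAP: Dict[str, Set[str]] = {
    "member-login": {"identity-provider"},
    "funds-transfer": {"core-banking", "identity-provider", "member-db"},
    "statement-download": {"member-db"},
}


def impacted_processes(failed_component: str) -> List[str]:
    """Critical processes that depend on the failed component."""
    return sorted(p for p, deps in PROCESS_MAP.items() if failed_component in deps)


def split_by_ownership(process: str) -> Tuple[List[str], List[str]]:
    """Return (vendor-managed, internally managed) dependencies of a process."""
    deps = PROCESS_MAP[process]
    vendor = sorted(d for d in deps if COMPONENTS[d].vendor_managed)
    internal = sorted(d for d in deps if not COMPONENTS[d].vendor_managed)
    return vendor, internal
```

Even a small map like this makes it easier to pick realistic test scenarios, since it shows which member-facing processes a single failure would take down and whether the fix sits with a vendor or an internal team.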

Finally, remember: Resilient infrastructure is about providing trust to end users, so that technical failures don’t become business failures. 

For more thoughts on performance and resilience, with cost benefits thrown in for good measure, check out this blog. You can also contact the Tyk team to discuss your particular requirements. 

 
