CanDIG revolutionizes national-scale genomic research with Tyk's secure, federated API management

1M+ genomic records

Tyk enables secure access and management of over a million sensitive genomic data points.

Federated across multiple institutions

APIs connect research centers nationwide while maintaining strict data governance.

Enterprise-grade security

Tyk ensures protected access with fine-grained authentication and authorization controls.

Region: Global

Sector: Healthcare

Product: Self-Managed

At a glance

Company

CanDIG – Canada’s Distributed Infrastructure for Genomics – is a national federated health research data platform. It connects distributed national datasets and gives researchers a platform where they can discover and explore data.

distributedgenomics.ca

Key features used

CanDIG is Canada’s solution to health data analysis. It facilitates national, distributed analysis of locally controlled private genomic data. The CanDIG platform enables clinical researchers from distributed sites to query and analyze quality data, with no need for a central infrastructure to trust, maintain, or secure. By supporting the sharing of research-quality, consented health research data, CanDIG facilitates the efficient and effective diagnosis, treatment and follow-up of health conditions, including COVID-19, cancer, and rare diseases.

The platform is a peer-to-peer federation, directly connecting the centres to each other, with no centralized infrastructure. Coordination happens at the level of software, policy, and data standards development. CanDIG ensures that the sites control their own data, which translates into distributed authentication and authorization decision-making, informed by platform-level information; the concern for local control of data extends to ongoing interest in privacy-preserving methods and privacy by design. All data access is API-based – even local data. This enables fine-grained logging and auditing, as well as the potential for fine-grained authorization. It also allows for the abstraction of back-end data stores.

Because CanDIG is developed by a small team, it relies wherever possible on standards for reusability and interoperability, best practices, and open-source software. “CanDIG is building an open-source, standards-based infrastructure to power truly national-scale Canadian genomics health research projects,” comments Amanjeev Sethi, Senior Application Developer at UHN. “The data supported by the CanDIG platform is part of multiple national projects, whose use is each governed by the sites involved in each of the projects. Our partnership with various sites allows for users across the country to analyse national-scale data while maximising privacy and keeping it under local control. This lets Canadian-scale research programs expand, and makes it easier for new projects to begin.”

The problem

CanDIG needed its platform to be fully distributed. That meant no central identity authority and no central authorization authority. Authorization decisions needed to be made locally, based on local policies and informed by platform-wide CanDIG services. Any site providing data had to be able to verify any such remotely provided information.

This made for a more complicated internal structure than that of similar sharing platforms. However, with the right tools in place (and yes, we mean Tyk!), the structure delivers far greater flexibility as a result of being designed in this way.

The platform also needed to address how much of the data a user could see, not just which datasets.

“Datasets that a user does not have row-level authorization for might still be queryable for aggregated results or for computations such as training models. In CanDIG, we have been building out infrastructure since the beginning of the project to authorize differentially-private aggregations to data to allow data custodians to make some datasets accessible for calculations without necessarily exposing the data directly to researchers”, explains CanDIG’s Amanjeev Sethi.
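To make the idea of a differentially-private aggregation concrete, here is a minimal sketch of a noisy count using the Laplace mechanism, a standard textbook technique. It is an illustration only and is not taken from CanDIG's code; the function names, the record shape, and the `epsilon` value are all assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the rows matching predicate (hypothetical helper)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for row in rows if predicate(row))
    # Adding or removing one row changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: 40 of the 100 rows carry the variant of interest.
rows = [{"has_variant": i < 40} for i in range(100)]
noisy = dp_count(rows, lambda r: r["has_variant"], epsilon=1.0, rng=random.Random(0))
print(f"noisy count: {noisy:.2f}")
```

A researcher without row-level authorization would only ever see the noisy value. A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy.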

The role of Tyk

As an API gateway, Tyk routes requests to the correct services, but that isn’t all it does for CanDIG. Tyk also acts as the relying party for the OAuth2/OIDC protocol, serving as an authenticating reverse proxy. It checks authentication and identity tokens for CanDIG’s controlled-access endpoints before beginning a session and passing on requests.
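As a rough illustration of what a relying party checks before accepting a session, the sketch below validates the issuer, audience, and expiry claims of an ID token that has already been decoded. It assumes signature verification happened in an earlier step and does not show how Tyk does this internally; the function name, issuer URL, and client ID are hypothetical.

```python
import time


def check_id_token_claims(claims, expected_issuer, expected_audience, now=None):
    """Return a list of problems found in decoded ID token claims.

    Assumption: the token signature was verified before this is called.
    """
    now = time.time() if now is None else now
    problems = []
    # The issuer must be the identity provider this site trusts.
    if claims.get("iss") != expected_issuer:
        problems.append("unexpected issuer")
    # "aud" may be a single string or a list; it must include this client.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        problems.append("audience mismatch")
    # The token must carry an expiry time that is still in the future.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or missing exp")
    return problems


# Hypothetical values for illustration only.
claims = {"iss": "https://idp.example.org/realms/site", "aud": ["gateway"], "exp": 2000}
print(check_id_token_claims(claims, "https://idp.example.org/realms/site", "gateway", now=1000))
```

An empty list means the claims pass these three checks; a gateway would reject the request otherwise.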

As CanDIG started implementing additional services, it needed a uniform way of authorizing requests being processed locally across an increasing number of APIs. The team developed an approach based on a rules-based policy engine, Open Policy Agent (OPA).

“As in many healthcare environments, when we consider data access and authorization decisions as they apply to research projects the level of access varies across users, or in our case, researchers. The benefit, but also complexity that is embedded within CanDIG is the autonomy granted to each site to make decisions on who is permitted to access what data. As a nation-wide platform, it is important for us to be able to uniformly enforce data access policies. Collecting entitlements and then evaluating specific requirements at policy engine allows us to do just that,” says Samantha Palmer, Health Data Policy Specialist for CanDIG at UHN.

But that policy engine required consistent delivery of local and platform-level authorization-relevant information with each request. Drawing on its experience with Tyk’s middleware, the CanDIG team further extended Tyk to perform claims marshalling when a session begins. The middleware looks up entitlement claims from Vault for local entitlement information and from REMS, the Data Access Committee portal tool, for platform-wide information. It then passes them along with the request, where OPA can use them to make authorization decisions.
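The marshalling step can be pictured with a short sketch. This is not CanDIG's middleware: the two lookup callables are stand-ins for the Vault and REMS queries, and the `X-Entitlements` header name and the request shape are assumptions made for illustration.

```python
import json


def marshal_entitlements(request, subject, local_lookup, platform_lookup):
    """Attach a subject's entitlements to a copy of the request.

    local_lookup and platform_lookup are hypothetical callables standing in
    for the local (Vault) and platform-wide (REMS) entitlement sources.
    """
    entitlements = {
        "local": sorted(local_lookup(subject)),
        "platform": sorted(platform_lookup(subject)),
    }
    # Copy the request and its headers so the caller's objects stay unchanged.
    enriched = dict(request)
    enriched["headers"] = dict(request.get("headers", {}))
    # Hypothetical header name; the policy engine would read it downstream.
    enriched["headers"]["X-Entitlements"] = json.dumps(entitlements, sort_keys=True)
    return enriched


# In-memory stand-ins for the two entitlement sources.
local_store = {"alice": {"site-dataset-1"}}
platform_store = {"alice": {"national-project-a"}}

request = {"path": "/datasets/site-dataset-1", "headers": {"Authorization": "Bearer abc"}}
enriched = marshal_entitlements(
    request,
    "alice",
    lambda s: local_store.get(s, set()),
    lambda s: platform_store.get(s, set()),
)
print(enriched["headers"]["X-Entitlements"])
```

Gathering both sources in one place means every downstream service sees the same entitlement document, whichever API the request targets.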

The solution

CanDIG – Canada’s Distributed Infrastructure for Genomics – is a decentralized federation of multiple healthcare and health research institutions across Canada. CanDIG connects health research genomics data from cross-Canada projects, and allows researchers to access and explore national, distributed, consented health data sets. Tyk’s open source gateway provides the entry point to the stack.

This diagram shows the CanDIGv2 AuthN/Z stack. CanDIG chose to rely on existing, well-tested open source packages (with commercial support available) to implement the stack, and implemented only the pieces it specifically needed.

The CanDIGv2 AuthN/Z stack relies on each site’s Keycloak instance to provide uniform OIDC/OAuth2 to the site’s existing identity management; Vault to securely store local entitlement information; Tyk to be the OAuth2 relying party and to marshal entitlements; and Open Policy Agent to evaluate a request against site-provided policies and the marshalled entitlements.
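The final step of that stack, evaluating a request against site policy and entitlements, can be sketched as a small decision function. OPA policies are written in Rego, so the Python below is only a stand-in to show the shape of the decision; the input document layout, the `aggregate_only` policy key, and the three outcome names are assumptions.

```python
def decide(input_doc, site_policy):
    """Return "row_level", "aggregate_only", or "deny" for one request.

    input_doc: hypothetical document holding the requested dataset and the
               entitlements delivered with the request.
    site_policy: hypothetical site-provided policy data.
    """
    dataset = input_doc["dataset"]
    entitlements = input_doc["entitlements"]
    granted = set(entitlements.get("local", [])) | set(entitlements.get("platform", []))
    # An explicit entitlement, local or platform-wide, allows row-level access.
    if dataset in granted:
        return "row_level"
    # The site may still open a dataset to aggregate-only queries.
    if dataset in site_policy.get("aggregate_only", []):
        return "aggregate_only"
    return "deny"


site_policy = {"aggregate_only": ["rare-disease-cohort"]}
input_doc = {
    "dataset": "rare-disease-cohort",
    "entitlements": {"local": ["site-dataset-1"], "platform": []},
}
print(decide(input_doc, site_policy))
```

Keeping the decision a pure function of the input document and the site's policy data is what lets each site change its rules without touching the services behind the gateway.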

The future

As the number of data services grows, and the queries performed across them become more sophisticated, CanDIG is planning to use GraphQL queries across services. “GraphQL is perfect for allowing complex queries, while not over-returning data unnecessarily,” said University of Toronto MScAC student Siyue Wang, who is prototyping machine language queries across multiple CanDIG data services. Tyk’s Universal Data Graph is a promising option that the team is examining for exposing those queries uniformly across APIs and sites.

In addition, the differentially-private federated learning methods that the team is examining will require remote procedure call support rather than REST or GraphQL APIs. “These more tightly coupled calculations will require a different approach than the ReSTful approach CanDIG has relied on in the past,” said Rishabh Sambare, University of Waterloo B.CS. co-op student and part of the UHN CanDIG team. “We’ll need to move to remote procedure calls to support analysis of the data in this way”. Tyk’s gRPC proxying support would allow a single API gateway to support all three of these methods – essential for a small team.

Finally, the CanDIG team has already extended Tyk using its polyglot middleware support. The team has written its middleware plugins in JavaScript but is now looking at gRPC, which would offer better performance and allow plugins to be written in any language.
