Tyk Gateway Observability Best Practices

Tyk Gateway exports metrics and traces via the OpenTelemetry Protocol (OTLP) and writes logs to stderr for external collection. This page covers production-ready configuration for each signal: which export topology to use, how to control metrics cardinality, how to tune trace sampling, and how to correlate logs with traces.

Exporting Data via OTLP

Use OTLP as the export protocol. It is vendor-neutral and supported by every modern observability backend. The Gateway supports both gRPC (default, more efficient for high throughput) and HTTP transports. Choose gRPC unless your network or backend requires HTTP.

Export Topology

Direct to backend: simpler setup, works well for managed cloud backends (Datadog, Dynatrace, New Relic, Elastic Cloud). Suitable for lower traffic volumes where buffering and retry are handled by the backend.

Via OTel Collector: recommended for production. Decouples Tyk from the backend, adds buffering and retry, enables tail-based sampling (see Trace Sampling), and fans out to multiple backends simultaneously.

Open Source Backends

Backend	Best for	Notes
Grafana LGTM (Loki, Grafana, Tempo, Mimir)	All signals	Tempo for traces, Mimir/Prometheus for metrics, Loki for logs
Jaeger	Tracing only	Accepts OTLP natively since v1.35; simple to self-host
Prometheus + Grafana	Metrics only	Pull model; use OTel Collector’s Prometheus exporter or `remote_write`
ELK / OpenSearch	Logs + APM	Elastic APM accepts OTLP; Logstash can ingest OTel Collector output

For vendor-specific configuration (Datadog, Dynatrace, New Relic, Elastic, Jaeger), see the Traces configuration guide.

Metrics: Cardinality and Performance

What Is the Cardinality Problem?

Each unique combination of dimension values for a metric creates a separate time series, so the series count can grow as the product of the number of distinct values in each dimension. Unbounded dimensions, such as one label per user, per IP address, or per request ID, make that product explode. This consumes memory in Tyk Gateway, increases storage costs in your backend, and slows query performance.
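The arithmetic behind this can be sketched in a few lines of Python. This is an illustrative upper-bound calculation only; the dimension names and value counts are hypothetical and not taken from any real deployment.

```python
from math import prod


def series_count(dimension_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series: the product of distinct values per dimension."""
    return prod(dimension_cardinalities.values())


# Hypothetical value counts for three bounded dimensions.
bounded = {"method": 5, "http.response.status_code": 10, "listen_path": 20}

# The same metric with one unbounded dimension added (hypothetical 50,000 users).
unbounded = {**bounded, "user_id": 50_000}

print(series_count(bounded))    # 1000
print(series_count(unbounded))  # 50000000
```

A single unbounded dimension turns a thousand possible series into fifty million, which is why the choice of dimensions matters more than the number of metrics.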

Tyk’s Built-In Cardinality Limit

By default, Tyk caps each metric instrument at 2,000 unique label-value combinations; a different limit can be set using the cardinality control. When the configured limit is exceeded, additional combinations are aggregated into an overflow bucket marked with the attribute otel.metric.overflow=true. The overflow bucket preserves aggregate counts, but you can no longer break the data down by the overflowing dimension combinations. Alert on overflow to catch cardinality issues early. The following example uses PromQL:

increase(<your_metric_name>{otel_metric_overflow="true"}[5m]) > 0

If you hit the cardinality cap, you can raise the limit, but the recommended action is to reduce cardinality in your custom metrics configuration.
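To make the overflow behaviour concrete, here is a minimal Python sketch of a counter with a cardinality cap. It is a simplified model, not the Gateway's or the OTel SDK's implementation: the `CappedCounter` class is hypothetical, and the limit of 3 is chosen only to keep the example small.

```python
class CappedCounter:
    """Toy counter that folds new attribute sets into an overflow bucket once full."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.series: dict[tuple, int] = {}  # one entry per attribute combination
        self.overflow = 0                   # stands in for otel.metric.overflow=true

    def add(self, attributes: dict[str, str], value: int = 1) -> None:
        key = tuple(sorted(attributes.items()))
        if key in self.series or len(self.series) < self.limit:
            # Known combination, or there is still room for a new one.
            self.series[key] = self.series.get(key, 0) + value
        else:
            # Cap reached: the count is kept, the breakdown is lost.
            self.overflow += value

    def total(self) -> int:
        return sum(self.series.values()) + self.overflow


counter = CappedCounter(limit=3)
for user in ["a", "b", "c", "d", "e", "a"]:
    counter.add({"user_id": user})

print(len(counter.series), counter.overflow, counter.total())  # 3 2 6
```

The total stays correct at 6, but the two requests from users `d` and `e` can no longer be told apart, which is the loss of detail the overflow alert is meant to surface.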

Custom Metrics: Stay at 10 Dimensions or Fewer

The OTel SDK processes metrics on a fast path when an instrument has 10 or fewer dimensions. Exceeding this threshold increases memory allocations and slows metric recording. Keep each custom metric instrument to 10 or fewer dimensions.

Dimension Safety Guide

When defining custom metrics, choose dimension sources based on their cardinality characteristics:

Cardinality	Sources	Examples
Safe	`metadata`, `config_data` bounded fields	`listen_path`, `endpoint`, `method`, `http.response.status_code`
Caution	`session` fields, bounded JWT claims	`api_key`, `oauth_id`, `alias` (bounded per tenant but can be large)
Avoid	`header` / `context` with unbounded values	`ip_address`, `user_id` JWT claim, `request_id`, raw `path`
Never	Token or bearer values	Unique per request; exhausts the cardinality limit immediately
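The guidance in this table and the 10-dimension threshold can be turned into a small pre-deployment check. The sketch below is an assumption-laden illustration: the `lint_dimensions` helper, the metric names, and the idea of passing dimensions as a plain list of label names are all hypothetical and do not reflect the Gateway's custom metrics configuration format.

```python
MAX_DIMENSIONS = 10  # fast-path threshold from the guidance above

# Labels listed under "Avoid" in the table; extend for your own deployment.
UNBOUNDED_LABELS = {"ip_address", "user_id", "request_id", "path"}


def lint_dimensions(metric_name: str, dimensions: list[str]) -> list[str]:
    """Return human-readable warnings for one custom metric definition."""
    problems = []
    if len(dimensions) > MAX_DIMENSIONS:
        problems.append(
            f"{metric_name}: {len(dimensions)} dimensions exceeds {MAX_DIMENSIONS}"
        )
    for label in dimensions:
        if label in UNBOUNDED_LABELS:
            problems.append(f"{metric_name}: '{label}' is likely unbounded")
    return problems


# Hypothetical metric definitions.
print(lint_dimensions("checkout_requests", ["method", "endpoint"]))
print(lint_dimensions("checkout_requests", ["method", "user_id"]))
```

A check like this could run in CI against whatever file holds your metric definitions, so risky dimensions are caught before they reach the cardinality limit in production.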

Traces: Sampling Strategy

Trace data is the most expensive observability signal to collect, store, and query. Sampling controls what fraction of traces you keep.

Head-Based Sampling

With head-based sampling, the decision whether to create a trace for a request is made before any span data exists. In other words, the sampling logic is applied in the Gateway. This is controlled using the TraceIDRatioBased approach as described in the trace sampling section.

{
  "opentelemetry": {
    "traces": {
      "sampling": {
        "type": "TraceIDRatioBased",
        "rate": 0.1
      }
    }
  }
}

Characteristics:

Fast: no buffering required, predictable overhead
Limitation: cannot guarantee capture of all errors or slow outliers at low sample rates
Recommended default: 10% (rate: 0.1) for most production deployments
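The following Python sketch illustrates why ratio-based head sampling is cheap and why it can miss rare events. It is a simplified model of a trace-ID ratio sampler, not Tyk's or the OTel SDK's actual code; the `should_sample` and `chance_of_missing_all` helpers are hypothetical names, and the exact bits compared are an assumption made for illustration.

```python
import random


def should_sample(trace_id: bytes, rate: float) -> bool:
    """Decide from the trace ID alone, so no span data or buffering is needed."""
    if len(trace_id) != 16:
        raise ValueError("trace IDs are 16 bytes")
    upper_bound = int(rate * (1 << 63))
    # Compare 63 bits taken from the second half of the ID with the bound.
    value = int.from_bytes(trace_id[8:], "big") >> 1
    return value < upper_bound


def chance_of_missing_all(rate: float, events: int) -> float:
    """Probability that none of `events` independent traces is sampled."""
    return (1.0 - rate) ** events


rng = random.Random(42)  # fixed seed so the run is repeatable
trace_ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(20_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)

print(f"kept {kept / len(trace_ids):.1%} of traces")
print(f"P(miss all of 10 errors) = {chance_of_missing_all(0.1, 10):.3f}")
```

Because the decision depends only on the trace ID, every service that sees the same ID reaches the same verdict. The second figure shows the limitation: at a 10% rate, there is roughly a 35% chance that none of ten failing requests is traced.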

Tail-Based Sampling

With tail-based sampling, the sampling decision is made in the OTel Collector after the full trace has been collected. The Collector buffers spans in memory, then applies policies to decide what to keep. This lets you retain 100% of error traces and slow requests regardless of the overall sample rate. To use tail-based sampling, send all traces from Tyk to the Collector using the AlwaysOn approach as described in the trace sampling section, then apply policies in the Collector. AlwaysOn is the default setting.

# OTel Collector processors section
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 1000}
      - name: sample-rest
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
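The decision logic expressed by these three policies can be sketched in Python as follows. This is an illustration of the idea under stated assumptions, not the Collector's implementation: the `Span` class and `keep_trace` function are hypothetical, the random draw is passed in as `sample_roll` to keep the example deterministic, and the handling of a trace exactly at the latency threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    start_ms: int
    end_ms: int
    is_error: bool = False


def keep_trace(
    spans: list[Span],
    sample_roll: float,            # uniform draw in [0, 1), supplied by the caller
    threshold_ms: int = 1000,
    sampling_percentage: float = 5.0,
) -> Optional[str]:
    """Return the name of the policy that keeps the trace, or None to drop it."""
    if any(span.is_error for span in spans):
        return "keep-errors"
    # Trace duration: earliest span start to latest span end.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration >= threshold_ms:
        return "keep-slow"
    if sample_roll < sampling_percentage / 100:
        return "sample-rest"
    return None


print(keep_trace([Span(0, 40), Span(5, 30, is_error=True)], sample_roll=0.9))
print(keep_trace([Span(0, 1500)], sample_roll=0.9))
print(keep_trace([Span(0, 40)], sample_roll=0.9))
```

The key point is that the error and latency checks need the whole trace, which is why the Collector has to wait for `decision_wait` before deciding and why this cannot be done at the head.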

Sampling Trade-Offs

Strategy	Cost	Accuracy	Best for
`AlwaysOn` (100%)	High	Perfect	Dev/staging, low-traffic APIs
`TraceIDRatioBased`, `rate: 0.1` (10%)	Low	Good average	Most production deployments
`TraceIDRatioBased`, `rate: 0.01` (1%)	Very low	Poor for rare events	Very high traffic, budget-constrained
Tail-based via OTel Collector	Medium (Collector infra)	Best	When you need all errors and outliers
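As a rough way to compare the cost column, the number of traces exported per day can be estimated from the request rate and the sample rate. The figure of 1,000 requests per second below is a hypothetical example, and the calculation ignores span count and span size, which also affect storage cost.

```python
SECONDS_PER_DAY = 86_400


def traces_kept_per_day(requests_per_second: int, rate: float) -> int:
    """Estimated traces exported per day at a steady request rate."""
    return round(requests_per_second * SECONDS_PER_DAY * rate)


# Hypothetical Gateway handling a steady 1,000 requests per second.
for label, rate in [("AlwaysOn", 1.0), ("rate 0.1", 0.1), ("rate 0.01", 0.01)]:
    print(f"{label:>10}: {traces_kept_per_day(1_000, rate):,} traces/day")
```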

Logs: Collection and Correlation

Tyk Gateway writes logs to stderr, not via OTLP. They must be collected by an external agent.

Enable Access Logs

Access logs record one line per request: HTTP method, path, upstream latency, status code, response size, and client IP. They are the fastest way to observe Gateway traffic without a full tracing setup and are essential for log-based alerting such as 5xx spike detection. Access logs are enabled in the Gateway configuration or the equivalent environment variable:

{
  "access_logs": {
    "enabled": true
  }
}

See Access Logs for the full list of configurable fields and custom template options.

Use JSON Log Format

JSON format has lower parsing overhead than the default text format and is directly indexable by log backends (Loki, Elasticsearch, CloudWatch). Enable this mode in the Gateway configuration or the equivalent environment variable:

{
  "log_format": "json"
}

JSON log format is recommended for all production deployments.

Trace Correlation

When OpenTelemetry is enabled, Tyk injects trace_id and span_id into request-scoped log entries. This lets you pivot from a log line directly to the corresponding trace in Tempo or Jaeger.

trace_id and span_id are only present on log entries for sampled requests. At 10% head-based sampling, roughly 90% of request-scoped log lines will not carry trace IDs.
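A short Python sketch shows how JSON-formatted log lines might be grouped by trace once they have been collected. The `trace_id` and `span_id` field names come from the text above; the other fields, the sample values, and the `group_by_trace` helper are hypothetical and may not match the Gateway's actual log schema.

```python
import json
from collections import defaultdict

# Hypothetical log lines; real Gateway output may use different fields.
SAMPLE_LINES = [
    '{"level": "info", "msg": "request received", "trace_id": "abc123", "span_id": "s1"}',
    '{"level": "error", "msg": "upstream timeout", "trace_id": "abc123", "span_id": "s2"}',
    '{"level": "info", "msg": "request received"}',
    "not json",
]


def group_by_trace(lines: list[str]) -> tuple[dict[str, list[dict]], list[dict]]:
    """Split parsed entries into per-trace groups and entries without a trace ID."""
    by_trace: dict[str, list[dict]] = defaultdict(list)
    uncorrelated: list[dict] = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        trace_id = entry.get("trace_id")
        if trace_id:
            by_trace[trace_id].append(entry)
        else:
            uncorrelated.append(entry)  # unsampled or not request-scoped
    return dict(by_trace), uncorrelated


traces, uncorrelated = group_by_trace(SAMPLE_LINES)
print({trace_id: len(entries) for trace_id, entries in traces.items()})
print(len(uncorrelated))
```

In practice a log backend does this grouping for you through a query on `trace_id`; the sketch only shows that entries without the field have to be handled as a normal case rather than an error.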

Log Collection Options

Stack	Recommended Approach
Grafana LGTM	Promtail or Grafana Alloy → Loki
ELK / OpenSearch	Filebeat or Logstash
Kubernetes	OTel Collector with `filelog` receiver; see Collecting Gateway Logs with OTel on Kubernetes
Cloud (AWS/GCP/Azure)	CloudWatch agent, GCP Logging agent, or Azure Monitor agent

Performance Impact Summary

Signal	Baseline Cost	Main Risk	Mitigation
Metrics	Low (batched OTLP export)	Cardinality explosion in custom metrics	≤10 dimensions per instrument; monitor `otel.metric.overflow`
Traces	Medium (per-request spans)	High sampling rate at scale	Head-based 10% default; tail-based for error capture
Logs	Low (stderr write)	Log volume at debug verbosity	Use `info` level in production; enable JSON format
Analytics (Tyk Pump)	Medium–High (Redis writes)	~13% RPS reduction when tracking all requests	Use Do-Not-Track middleware for non-critical endpoints
