How to reduce API latency and optimize your API

As an e-commerce shopper on the hunt for a product, when you search for anything, your request is sent over to the application’s API gateway. This gateway is responsible for connecting with numerous backend services, including product catalog, user authentication, and inventory. And here’s the amazing thing—all that connection happens seamlessly and within a fraction of a second to a few seconds. The time it takes to bounce between these services and return a response to you is called latency.

To ensure a positive user experience, minimizing API gateway latency is key, and that starts with being able to see where time is spent. OpenTelemetry (OTel) is an observability framework that offers standard instrumentation libraries, APIs, and tools to help developers track and understand application performance, especially in setups like microservices. For instance, in an e-commerce app connected to multiple backend services via an API gateway, OpenTelemetry provides insights into how the various services interact, their latencies, processing steps, and any bottlenecks within the application.

In this tutorial, you’ll learn how to reduce API latency and optimize your APIs using OpenTelemetry and the Tyk API Gateway. You’ll also learn about some best practices that can help you conquer your API gateway latency issues.

What is API gateway latency?

API gateway latency refers to the amount of time it takes for an API gateway to move a request through its various processing stages, aggregate or compose the final response, and send it back to the client. Latency is the result of various factors, including network communication, request processing, authentication, authorization, the volume of data being transmitted, response times from backend servers, and communication times between services in a microservices setup.

Consider an e-commerce platform again. In this context, the various services not only boost API functionality and security but also add extra processing steps that affect response time. The wait time you experience determines if your application is high latency or low latency. Low-latency applications provide quick responses with minimal communication and processing time. In contrast, high latency means longer response times, often resulting in a poor user experience.

High API gateway latency can be attributed to three main factors: network issues, inadequate server resources, and unoptimized code:

1. Network issues: This encompasses slow or unreliable internet connections, considerable geographical distance between the API gateway and various microservices, network congestion during peak usage, and delays in DNS resolution. These factors collectively extend the time it takes for the API gateway to traverse the network and prepare a response.

2. Inadequate server resources: When an API gateway server or backend service lacks necessary resources, like RAM, storage, CPU, and network speed, issues arise. This can lead to delays in tasks like authentication, authorization, request validation, data retrieval, and processing—all resulting in higher latency.

3. Unoptimized code: This introduces problems such as excessive processing time due to redundant computations (e.g., unnecessary loops or inefficient algorithms), memory leaks, blocking operations without asynchronous handling, lack of data caching, inefficient data formats, and poor resource usage. It can also hinder horizontal scaling efforts, affecting performance as traffic increases (see the sketch after this list for an example of asynchronous fan-out).

These factors collectively contribute to higher API latency and an inferior user experience.
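
To make the code-level factors more concrete, here’s a minimal, hypothetical Go sketch (not taken from the sample application used later in this tutorial) showing the kind of asynchronous fan-out that avoids blocking on backend calls one after another; the service URLs are placeholders:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// fetch calls one backend service and reports how long the call took.
func fetch(url string, wg *sync.WaitGroup) {
	defer wg.Done()
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		fmt.Println(url, "error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(url, resp.Status, "in", time.Since(start))
}

func main() {
	// Calling the catalog and inventory services concurrently means the total
	// wait is roughly the slowest single call rather than the sum of all calls.
	// The URLs below are placeholders for illustration only.
	var wg sync.WaitGroup
	for _, url := range []string{
		"http://localhost:8081/catalog/42",
		"http://localhost:8082/inventory/42",
	} {
		wg.Add(1)
		go fetch(url, &wg)
	}
	wg.Wait()
}
```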

Observe and measure API gateway latency

Now that you understand what API gateway latency is, it’s time to look at how you can set up your application or microservices to gather telemetry data. With this data, you can observe and identify what’s causing latency in your API services. Observability refers to the ability to use data produced by a system, application, or infrastructure to indicate its performance or current state. This data is usually captured in the form of logs, metrics, and traces:

* Logs are records of events in a software system, including errors, file access, and warnings.

* Metrics are numerical values used to measure system performance, such as CPU usage, memory, and bandwidth.

* Traces record the journey of a request through an application, showing processing times and interactions with services. Traces help you pinpoint errors and slow areas, especially in distributed systems. Individual operations within a trace are called spans (e.g., database operations or API calls).
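
To make spans a little more concrete, here’s a minimal Go sketch of creating one with the OpenTelemetry API; the tracer and span names are illustrative, and the sample app later in this tutorial wraps this pattern in helper functions:

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel"
)

// queryInventory wraps a single operation in its own span. Any nested spans
// created with the returned ctx (e.g., a database call) attach to this one.
func queryInventory(ctx context.Context, productID string) {
	tracer := otel.Tracer("checkout-service") // illustrative tracer name
	ctx, span := tracer.Start(ctx, "query-inventory")
	defer span.End()

	// ... look up stock for productID using ctx ...
	_ = ctx
	_ = productID
}
```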

Solution architecture and prerequisites

In this tutorial, all traces are captured from the application and sent to the OpenTelemetry Collector. Once the data reaches the Collector, you can push it to the observability backend of your choice.

Before you begin, you need to familiarize yourself with the following:

* Instrumented microservice/app: This refers to the various microservices you may be running in your environment. These can be written in any language of your choice. This guide uses a simple REST API service for a parcel delivery application, written in Go and intentionally made to simulate a high-latency application. It will be instrumented to send traces to the OpenTelemetry Collector, first with the high-latency simulation and later without it.

* OpenTelemetry Collector: You need an installed and running instance of the contrib version of the OpenTelemetry Collector. If you’re familiar with Docker, you can easily get started with the Docker image; otherwise, you can download the package built for your operating system from the project’s Releases page. Later, you’ll provide your instance with a YAML configuration file that tells the Collector how to receive, process, and export data to the observability backend of your choice.

* Observability backends: This will be the observability platform of your choice, where you’ll visualize the incoming data from your instrumented application. This tutorial will use Logz.io, TelemetryHub, and SigNoz Cloud.

Set up the sample app

To begin, grab the sample application from this GitHub repository and set up a Go development environment. This repository has three branches:

* The `basic-app` branch holds the application with high latency and is not instrumented.

* The `instrumented` branch is an instrumented version of the basic app with high latency introduced using random wait times between operations.

* The `main` branch is an instrumented version of the API without the high-latency simulation.

This repository also contains a small program written to generate enough POST, PUT, and GET traffic to the API service so that you can have enough trace data to visualize on the respective backends. You can also find all the YAML configuration files for the respective backends in the repository.
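
For a rough idea of what such a traffic generator does, here’s a minimal, hypothetical sketch; the base URL, endpoint path, and payload are assumptions for illustration and may not match the program in the repository:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Repeatedly hit the API so there's enough trace data to visualize.
	base := "http://localhost:8080"
	payload := []byte(`{"sender":"Alice","recipient":"Bob","weight":2.5}`)

	for i := 0; i < 100; i++ {
		if resp, err := http.Post(base+"/parcels", "application/json", bytes.NewReader(payload)); err == nil {
			resp.Body.Close()
		}
		if resp, err := http.Get(base + "/parcels"); err == nil {
			resp.Body.Close()
		}
		fmt.Println("completed iteration", i)
		time.Sleep(200 * time.Millisecond)
	}
}
```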

Clone the repository using this Git command:

```bash
$ git clone https://github.com/rexfordnyrk/opentelemetry-instrumentation-example.git
```

Then check out the `instrumented` branch with this command:

```bash
$ git checkout instrumented
```

Now, you’re ready to follow along with the instrumentation process.

Instrument the app

To start collecting information from your API service, you need to instrument your service using the respective OpenTelemetry libraries for the programming language of your service. Here, the application is written in Go using the Gin web framework and the GORM ORM for database interactions. The otelgin and otelgorm packages will be used to instrument the application.

To find other available libraries, tracer implementations, and utilities for instrumenting your tech stack, check out the OpenTelemetry Registry. In this article, you’ll learn about the implementation of automatic, manual, and database instrumentation to help you better understand the performance of your database queries.
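
For reference, pulling in these packages typically involves imports along the following lines; the exact otelgorm implementation and versions used by the sample repository may differ, so treat this as an assumption rather than a copy of its import block:

```go
import (
	// Gin middleware instrumentation from the OpenTelemetry contrib repository
	"go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin"

	// A commonly used GORM instrumentation plugin
	"github.com/uptrace/opentelemetry-go-extra/otelgorm"

	// Core OpenTelemetry API, OTLP trace exporter, and trace SDK
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)
```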

Automatic instrumentation

Automatic instrumentation enables you to easily collect data from your app by importing the instrumentation library and adding a few lines of code to your application. This may or may not be available and supported depending on your tech stack. You can check the OpenTelemetry Registry to find out. In this case, it’s partially supported.

The `opentelemetry.go` file contains the code for the instrumentation of the app, and the following `initTracer()` function from that file shows you how the tracer is initiated:

 

```go
func initTracer() func(context.Context) error {
	//Setting the Service name from the environmental variable if exists
	if strings.TrimSpace(os.Getenv("SERVICE_NAME")) != "" {
		serviceName = os.Getenv("SERVICE_NAME")
	}
	//Setting the Collector endpoint from the environmental variable if exists
	if strings.TrimSpace(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")) != "" {
		collectorURL = os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
	}
	//Setting up the exporter for the tracer
	exporter, err := otlptrace.New(
		context.Background(),
		otlptracegrpc.NewClient(
			otlptracegrpc.WithInsecure(),
			otlptracegrpc.WithEndpoint(collectorURL),
		),
	)
	//Log a fatal error if exporter could not be setup
	if err != nil {
		log.Fatal(err)
	}
	// Setting up the resources for the tracer. This includes the context and other
	// attributes to identify the source of the traces
	resources, err := resource.New(
		context.Background(),
		resource.WithAttributes(
			attribute.String("service.name", serviceName),
			attribute.String("language", "go"),
		),
	)
	if err != nil {
		log.Println("Could not set resources: ", err)
	}
	//Using the resources and exporter to set up a trace provider
	otel.SetTracerProvider(
		sdktrace.NewTracerProvider(
			sdktrace.WithSampler(sdktrace.AlwaysSample()),
			sdktrace.WithBatcher(exporter),
			sdktrace.WithResource(resources),
		),
	)
	return exporter.Shutdown
}
```

 

In the `main.go` file, `initTracer()` is called to create a trace provider, which is then injected into the router as middleware with this line of code:

 

```go
//Adding otelgin as middleware to auto-instrument ALL API requests
router.Use(otelgin.Middleware(serviceName))
```

 

This automatically traces all the API requests made to the service.
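
Putting these pieces together, the wiring in `main.go` looks roughly like the following sketch (the port and route registration are simplified assumptions; the repository’s version also includes the database setup and handlers):

```go
func main() {
	// Initialize the tracer and flush any buffered spans on exit.
	cleanup := initTracer()
	defer cleanup(context.Background())

	router := gin.Default()

	// Every request handled by this router now produces a trace automatically.
	router.Use(otelgin.Middleware(serviceName))

	// ... register the parcel API routes here ...

	router.Run(":8080") // port assumed for illustration
}
```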

Manual instrumentation

In this use case, automatic instrumentation is limited: it only traces the request as a whole and doesn’t necessarily capture details about the individual operations or processes within your application (the level of support varies by language and framework). Manual instrumentation, by contrast, requires you to specify the sections of your application code that you want to instrument.

Navigate back to the `opentelemetry.go` file. You’ll notice another function there, one that accepts the Gin context, an action and a name string, and a task function, and returns an error:

 

```go
// ChildSpan A utility function to create child spans for specific operations
func ChildSpan(c *gin.Context, action, name string, task func() error) error {
	//Setting up a tracer either from the existing context or creating a new one
	var tracer trace.Tracer
	tracerInterface, ok := c.Get(tracerKey)
	if ok {
		tracer, ok = tracerInterface.(trace.Tracer)
	}
	if !ok {
		tracer = otel.GetTracerProvider().Tracer(
			tracerName,
			trace.WithInstrumentationVersion(otelgin.Version()),
		)
	}
	savedContext := c.Request.Context()
	defer func() {
		c.Request = c.Request.WithContext(savedContext)
	}()
	//Adding attributes to identify the operation captured in this span
	opt := trace.WithAttributes(attribute.String("service.action", action))
	_, span := tracer.Start(savedContext, name, opt)
	// Simulate delay in operation
	time.Sleep(time.Millisecond * time.Duration(randomDelay(400, 800)))
	//running function provided in span to add to trace
	if err := task(); err != nil {
		// recording an error into the span if there is any
		span.RecordError(err)
		span.SetStatus(codes.Error, fmt.Sprintf("action %s failure", action))
		span.End()
		return err
	}
	//ending the span
	span.End()
	return nil
}
```

 

This function creates a span under the current request context to trace the execution of the function provided. The action and name strings are used to create attributes to identify the operation that the span was created for. If there is an error during the execution of the provided function, the error information is added to the span, and the span is closed or stopped.

Inside the `router.go` file, you’ll notice that the POST and PUT request handlers wrap their calls to the `GenerateFee()` function in a child span:

 

```go
//running the generate fees call in a span to see how long this external service takes
if err := ChildSpan(c, "Generate Fee", "Generate Fee Call", parcel.GenerateFee); err != nil {
	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to generate fee"})
	return
}
```

 

This function makes an external API call to simulate the generation of the delivery fee based on distance and parcel size.

 

Database instrumentation

You can also track database calls using the otelgorm package. In the `main.go` file, you’ll see the following lines under the database setup, which add the instrumentation plugin to the database connection:

 

```go
//Adding the otelgorm plugin to GORM for db instrumentation
if err := db.Use(otelgorm.NewPlugin()); err != nil {
	panic(err)
}
```

 

After that, every call to the database is modified to use the request context to create spans in the current request trace. This is done by calling the `WithContext()` method of the `db` object and passing the request context. For example, see the following line:

 

```go
//Saving record to database
db.Model(&parcel).Updates(&updatedParcel)
```

 

The preceding line is now updated with this:

 

```go
//Saving record to database
db.WithContext(c.Request.Context()).Model(&parcel).Updates(&updatedParcel)
```

 

This gives you insight into your database operations, helping you identify nonperformant queries that increase latency.
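
As a consolidated (and simplified) illustration, a hypothetical GET handler combining the pieces above might look like this; the route, model, and field names are assumptions rather than code from the repository:

```go
// getParcel looks up a parcel by ID. Because the query runs through
// db.WithContext, the otelgorm plugin records it as a span on this
// request's trace, including the SQL text and execution time.
func getParcel(c *gin.Context) {
	var parcel Parcel // model name assumed for illustration
	result := db.WithContext(c.Request.Context()).First(&parcel, "id = ?", c.Param("id"))
	if result.Error != nil {
		c.JSON(http.StatusNotFound, gin.H{"error": "parcel not found"})
		return
	}
	c.JSON(http.StatusOK, parcel)
}
```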

Measure API gateway latency

Now that your app is instrumented, you need to configure your Collector to send the data to the respective observability platform of your choice. While it’s possible to send data to more than one backend at the same time, here, you’ll configure the Collector for each service separately. More information about the YAML configuration file for the OpenTelemetry Collector is available in the official docs.

Here, you’ll measure the latency of the API, first simulating a high-latency situation and, second, measuring low latency.

This tutorial assumes you’re running the sample app, your OpenTelemetry Collector, and the traffic generator on the same machine. You should change the endpoint URLs in the code if this is not the case.

Logz.io

Logz.io is an observability backend built on open source technologies, including OpenSearch for logs analytics, Jaeger for trace analytics, and Prometheus for metric analytics. In addition, Logz.io provides a cloud SIEM to help you effectively monitor, detect, isolate, and analyze security threats on your cloud infrastructure. This is a good tool to use if you’re already familiar with these technologies but would prefer to have everything in one place.

You can sign up for a free trial and go through the Get started wizard to obtain the right configuration for your Collector instance.

Once you have the right configuration for your instance, start the API server with the following command:

 

```bash
$ go run .
```

 

Then from another terminal, change directories to the traffic directory and run the traffic program with the following command to start generating some traffic and trace data to Logz.io:

 

```bash
$ cd traffic && go run .
```

 

In about five minutes (or less), you should start seeing the graph populated on the Jaeger UI of the traces section of your Logz.io dashboard:

 

Logz.io traces observability with high API latency

 

As you can see, the traces are filtered to show the last 500 traces in the last 15 minutes and sorted by the longest first. In this case, you’ll see that the trace with the longest time is 1.95 seconds. Clicking on that trace gives you a breakdown of all the spans:

 

High-latency trace details

 

Notice how the Generate Fee Call span takes most of the execution time. Additionally, beneath that, you’ll see that the database span gorm.Create is captured with various details, including the full query and the execution time. This helps you easily identify that Generate Fee Call may need further investigation and optimization.

TelemetryHub

TelemetryHub is a full-stack observability tool built from the ground up solely on OpenTelemetry. As such, it aims to be the single most affordable destination for all your OTel data. It offers a clean UI with responsive charts and near-real-time visualizations. Additionally, TelemetryHub offers all this as a service, so you don’t have to worry about setting up your own infrastructure.

You can sign up for a free account to get started.

After you’ve signed up, you need to set up a service group (if it isn’t automatically created). This provides you with an ingestion key that you need for your Collector’s configuration file to export your OTel data. You can use the Collector Setup page to obtain a basic working YAML configuration for running your Collector.

To start the API server and run the traffic generator to help generate enough traffic for the TelemetryHub dashboard, you’ll use the same process as before. This is what it should look like:

 

TelemetryHub traces observability dashboard with high API latency

 

In TelemetryHub, you’re provided with a visual representation of the performance of your API, as well as which endpoints take the longest to fulfill requests. Beneath the graph is a traditional tabular structure with filters for the various trace records that have accumulated. Here, they’re sorted by the longest-running trace first, which is 3.6 seconds. Clicking on that trace takes you to the insights page, where the span is expanded:

 

High-latency trace detail screen on TelemetryHub

 

On the insights page, you’ll find details for each span in the trace, both for the Generate Fee Call span and the database query, along with the time they took to complete.

SigNoz

SigNoz is an open source observability solution that brings all your observability components, such as APM, logs, metrics, and exceptions management, into one place, with a powerful query builder for exploring it all. You can also set up alerts that trigger when various thresholds or conditions are met.

Unlike Logz.io, SigNoz can be self-hosted either within your local infrastructure or somewhere in the cloud. But if, for various valid reasons, you’d prefer not to manage an installation by yourself or within your infrastructure, you can always sign up for the SigNoz Cloud.

Once you sign up for the SigNoz Cloud, you’ll receive an email with a URL to your SigNoz instance, your password, and an ingestion key. You can use the YAML configuration in this documentation to configure and run your OpenTelemetry Collector instance. Be sure to replace the `<SIGNOZ_API_KEY>` with the ingestion key you received via email. Additionally, set the `{region}` to the region in your instance URL.

Just like the process described in the Logz.io section earlier, run the API server and then start the traffic program to generate enough traces for your SigNoz dashboard. In a few minutes, on the Traces page, you should have an output similar to this:

 

SigNoz Traces observability dashboard with high API latency

 

In this image, the traces are filtered to show only the traces in the last fifteen minutes and sorted by the longest first. In this case, you’ll notice that the trace with the longest time is 3.02 seconds. Clicking on that trace gives you a breakdown with all the spans under it:

 

High-latency trace detail on SigNoz

 

You’ll notice that clicking or selecting any of the spans in the trace provides more details about the span on the right pane of the page. Since the gorm.Update span is selected, the details are shown on the right. Again, you’ll find that the Generate Fee Call span takes a long time to complete.

Troubleshooting high latency

Up to this point, you’ve learned how to export the trace data collected using OpenTelemetry to a few observability platforms or tools for visualization and analysis. When combined with metrics and logs, you can use this information to identify latency issues and improve performance within your application, as outlined in the following steps:

* Monitor API performance: Capture and observe the journey of API requests from the API gateway across the different services of your application and analyze the traces. This helps you identify which stages of processing are contributing the most to high latency. This includes the various request/response times, error rates, and the behavior of different components within the API gateway.

* Check server resources: Single out and observe the metrics of resource usage, such as CPU, memory, and disk usage on the server running the API gateway. Ensure that resources are not overutilized. You can compare resource consumption with expected values to determine if the server has enough capacity to handle peak traffic loads.

* Analyze the network: From the visualization of your system metrics data, you can identify traffic patterns from the network, such as network latency, packet loss, and peak traffic times. You may further analyze and check for DNS resolution delays and also ensure that data transmission is efficient between clients, API gateway, and backend interservices communications.

* Check database performance: Database queries are known to be a possible bottleneck contributing to high latency. You need to incorporate OpenTelemetry to trace database interactions and evaluate the performance of any databases or storage systems involved. Check query execution times, connection pools, and response times to ensure they’re not causing delays in data retrieval.

* Profile and optimize code: Trace segments related to the various processing steps of your application per transaction or request and identify where code execution is taking longer than expected. Use profiling tools for your respective programming language to focus optimization efforts. In this case, that means Go’s built-in profiler (see the sketch after this list).

* Test third-party services: Certain aspects of your application may be dependent on various external or third-party services and APIs, such as payment gateways and notification services (e.g., emails, SMS, push notifications). With OpenTelemetry’s distributed tracing, you can easily examine traces related to interactions and response times between the API gateway and third-party or external services and APIs. If any of these services exhibit high latency, it can impact your overall API performance.

* Perform load testing: Before identifying and after fixing potential issues, perform load testing using tools like Grafana Cloud k6, Apache JMeter, or Locust to simulate a realistic number of concurrent users. This helps generate varying traffic loads to identify latency issues and verify that latency issues are resolved after you’ve applied the necessary fixes. Analyze the traces and metrics to ensure that the latency issue has been addressed and that the system performs well under expected load.
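
For the Go sample app, the simplest way to expose that profiler is the standard `net/http/pprof` package. Here’s a minimal sketch, with the port being an arbitrary choice:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints on a separate port so they stay off the
	// public API. Capture a 30-second CPU profile with, for example:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the API server as usual ...
	select {} // placeholder to keep this sketch running
}
```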

By leveraging OpenTelemetry’s observability capabilities, you can gain deep insights into the behavior of your application’s API gateway and backend services, irrespective of the platform or tool you may be using to visualize and analyze the data. Using this approach, you can systematically identify and address the sources of high latency by making informed decisions about optimizations, resource allocation, and code improvements, ultimately leading to improved API performance, low latency, and a better user experience.

Resolving high API gateway latency

After troubleshooting and identifying the flaws in the application with the insights provided by the various telemetry data, your application should show better performance and latency results. Following are the traces obtained from the parcel delivery application after cleaning up the code. Make sure you switch to the `main` branch of the repository to run and capture trace data.

Logz.io

After a few minutes of pushing traffic to the API and sending trace data to Logz.io, your dashboard should look like this:

 

Logz.io showing low-latency traces

 

You can clearly observe that most of the PUT and POST requests are now well under 600 milliseconds. This is an improvement in response time compared to the previous captures.

TelemetryHub

Similarly, after a few minutes of sending data to TelemetryHub, your dashboard will look like this:

 

TelemetryHub dashboard for low latency

 

You’ll notice the average response time is 255.57 milliseconds. The highest latency recorded after sorting the records is 742 milliseconds, which is still much better than before.

SigNoz

Again, traces are fed to the SigNoz backend, and the following information is generated on the dashboard:

 

SigNoz dashboard showing low latency

 

As you can see, there has been a lot of improvement in the response time of the API service and, therefore, lower latency. The highest latency recorded here is 653 milliseconds. You can explore these further from your dashboard.

For most applications, these results are awesome, but in some other cases, you won’t be able to achieve these results and will need to look at other options for improving API gateway latency outside your code.

Performance issue workarounds

Latency can be purely performance-related, which means there are a few things you can do to improve the situation:

* Colocate resources: Placing the API gateway and the backend microservices that are data-intensive or frequently communicate with each other in close physical proximity, such as in the same region, data center, or even machine (when possible), is a good step. This can help reduce latency by eliminating the network overhead of communicating across long distances or different geographic regions.

* Modify authentication: If the API gateway is using authentication, you can modify the authentication scheme to use a less computationally expensive algorithm, more efficient protocols, or optimized token validation to help reduce the latency of API requests that require authentication.

* Disable cache encryption: Caching usually improves API performance significantly by serving cached responses from memory rather than reprocessing requests. However, encrypting cached data adds the computational overhead of encrypting and decrypting it. If your cached responses don’t contain sensitive data, disabling cache encryption eliminates the time spent decrypting them and reduces the latency of those API requests.

* Keep serverless functions warm: Serverless functions are created and destroyed on demand, which can introduce latency when an API request hits a function that isn’t currently running (aka a cold start). You can keep serverless functions warm by preprovisioning them or by using a feature like AWS Lambda provisioned concurrency or the Azure Functions Premium plan. This helps ensure that API requests are answered quickly even when the function hasn’t been invoked recently, improving latency for users.

You can effectively mitigate API gateway latency and enhance the overall performance of your application by carefully applying these strategies in accordance with the specifics of your microservices-based environment.

Alternative approaches to address high latency

Aside from performance, you may also need to implement other solutions targeting latency from an infrastructure perspective:

Implement a service mesh

A service mesh, such as Istio or Linkerd, is a dedicated infrastructure layer. It can help improve communication between microservices and reduce latency by offloading traffic management tasks from individual services to the mesh’s proxies, so the services themselves don’t have to implement that logic. Service meshes can also provide features, like load balancing, fault tolerance, and observability, that can further improve performance.

However, service meshes can be complex to implement and manage as they require changes to existing services and add an additional layer to the architecture. A service mesh demands careful configuration and management, potentially leading to a higher learning curve, and it can add to the total cost of ownership (TCO) of a microservices architecture.

Set up edge computing/networking

Edge computing is a distributed computing paradigm that brings computation and data storage closer to the end user, eliminating the need for data to travel long distances to be processed. Edge computing can also be used to cache frequently accessed data, which can further improve performance.

However, edge computing infrastructure can be expensive to deploy and manage, and it can add complexity to the overall architecture. Ensuring a consistent application state and data synchronization across edge nodes can be challenging.

Employ methods like request batching and asynchronous processing

Request batching allows multiple requests to be sent together, improving throughput and reducing latency by reducing the number of round trips between microservices. Additionally, asynchronous processing allows requests to be processed in the background, which can help to improve latency for interactive requests.

However, request batching and asynchronous processing can introduce logical complexities and can be difficult to implement correctly. It also requires changes to how clients interact with the API and how the API processes batched requests.
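
As a rough illustration of the asynchronous pattern, here’s a minimal Go sketch; the `Order` type, queue size, and worker count are arbitrary placeholders:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Order is a placeholder payload type for this sketch.
type Order struct {
	ID string `json:"id"`
}

// jobs buffers accepted work so the request handler can return immediately.
var jobs = make(chan Order, 100)

func worker() {
	for o := range jobs {
		time.Sleep(500 * time.Millisecond) // slow processing happens off the request path
		log.Println("processed order", o.ID)
	}
}

func handleOrder(w http.ResponseWriter, r *http.Request) {
	var o Order
	if err := json.NewDecoder(r.Body).Decode(&o); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	jobs <- o
	// Respond right away; the client can poll for status or be notified later.
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	for i := 0; i < 4; i++ {
		go worker()
	}
	http.HandleFunc("/orders", handleOrder)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The tradeoff described above applies here: the client no longer gets the result in the response, so the API contract has to change to support polling or callbacks.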

Build your own web server

Building your own web server can give you more control over the performance of your application. Servers can be optimized and configured for your specific needs, eliminating unnecessary features and bottlenecks present in generic web servers. This leads to achieving faster response times and reduced latency.

However, building your own web server can be time-consuming and require significant development effort. It also places the responsibility of security updates and compatibility with existing libraries and frameworks on your team.

It’s important to weigh the benefits and challenges of each workaround before deciding which one is right for your application.

Conclusion

In this article, you learned what API gateway latency is and how high latency can result from factors such as sluggish backend services, processing overhead from authentication and authorization steps, network issues like distance-related latency, inefficient code execution, inadequate server resources, and chatty communication patterns. You also learned about observability and the role OpenTelemetry plays in it.

Additionally, you learned how to instrument your backend services with OpenTelemetry to help collect trace, metric, and log data for any observability backend via the OpenTelemetry Collector. In the process, you learned how to troubleshoot the data you obtained and gain insight into the performance of both your application and its infrastructure.

Tyk is an open source enterprise API gateway packed with all the features you need for microservice-based applications. It supports almost all industry authentication and authorization standards, including OpenID Connect, JSON Web Token, bearer tokens, and basic auth. It also lets you operate on any communication protocol of your choice, including REST, SOAP, GraphQL, gRPC, and TCP, with some of the lowest latency of any API gateway.

Tyk comes packed with must-have API gateway features, such as content mediation, API versioning, rate limiting, granular access control, and IP whitelisting and blacklisting. You can even test these features out with a free trial instance or talk to an engineer today to get started.