You’re sold on the idea of observability with OpenTelemetry and excited to bring it into your organisation. Excellent. But what now? Where do you start? Tyk recently caught up with Adrianna Vallella, Cloud Native Computing Foundation (CNCF) Ambassador and Senior Staff Developer Advocate at ServiceNow Cloud Observability, to discuss precisely that.
In this blog, you will:
- Learn how to approach your OpenTelemetry (OTel) strategy for maximum success.
- Understand the role of an observability practices team.
- Dig into system landscapes in the right order.
- Discover instrumentation best practices.
Watch the full webinar recording below, or read on for our top takeaways.
Develop your OpenTelemetry strategy
Adrianna emphasised the importance of an overall strategy when it comes to OpenTelemetry (OTel). She outlined three steps to achieving that strategy:
- Communication
- Understanding your system landscape
- Instrumentation
These are the cornerstones behind a successful OTel rollout – one where you don’t end up with a bunch of little kingdoms each doing their own thing.
Step 1: communicate
A little advocacy goes a long way. Rolling out OTel across an entire organisation is a huge initiative. That means you’ll need to communicate that you want to use it, obtain leadership buy-in and let people know that it’s happening (directives from leadership are super helpful here – as is an executive sponsor of your OTel rollout strategy).
OpenTelemetry is something to shout about. Announce it through email, town halls, Slack or whatever collaboration tool tickles your organisation’s fancy. Over-communication will serve better than under-communication here.
Start by clearly explaining what OpenTelemetry is: a vendor-neutral observability framework that lets you ingest, transform and export your telemetry data to whichever system you want to analyse it in. It also lets you correlate three key telemetry signals (traces, metrics and logs), something its predecessors did not offer and a strong argument for OTel’s long-term success.
Benefits of OpenTelemetry
Be sure to communicate the benefits of OTel to your organisation. These include:
- No vendor lock-in: it’s easy to switch to a different vendor without re-instrumenting your code.
- Vendor neutrality: you can easily send telemetry data to multiple observability backends simultaneously, making it easy to undertake a side-by-side vendor comparison.
- Active project status: OpenTelemetry is the CNCF’s second most active project behind Kubernetes. That’s no small feat.
- Standardised framework: backed by most major observability vendors, OTel is here to stay.
Recruit OTel champions
To introduce OTel successfully, recruit some champions within your business. You can form an observability practices team for advocacy, creating practices around implementation and growing internal OTel subject matter experts. The team can work closely with your engineers to help them dig into OpenTelemetry and experience the benefits.
Your champions don’t all have to be OTel experts from the outset. A mix of individual contributors and managers will serve you well. Expertise will develop over time; what’s important initially is enthusiasm.
Create an OpenTelemetry rollout plan
Milestones, dates and deadlines will help keep your OTel implementation on track, so set these out in a rollout plan. Timelines need to be realistic, so consult with engineers, managers and your observability practices team to identify your critical path and ensure everyone is on board with what can be achieved and by when. (Just remember, there’s no such thing as a perfect plan, so be prepared to learn and pivot as you move forward.)
Step 2: understand your systems landscape
For your OTel rollout plan to work, you’ll need plenty of context. Your application probably comprises multiple services, so identify each one and take stock of the language it’s written in. Doing so lets you determine which OTel library (or libraries) your dev team needs to use. If you’ve got code written in Java, for example, you’ll want to instrument it with the OTel Java API and SDK.
You’ll also need to inventory any third-party frameworks and libraries you’re using. OTel instrumentation is available for many popular libraries and frameworks – Flask for Python, Hibernate for Java and so on. Don’t forget about any homegrown libraries, either.
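One lightweight way to capture this inventory is as structured data that can be queried later. The sketch below is illustrative only: the service names, library lists and the `plan_instrumentation` helper are hypothetical, and the mapping covers just three languages, using the names of the corresponding OpenTelemetry repositories.

```python
from dataclasses import dataclass, field

# Illustrative mapping from implementation language to the OTel project
# a team would evaluate. Extend it for the languages you actually run.
OTEL_SDK_BY_LANGUAGE = {
    "java": "opentelemetry-java",
    "python": "opentelemetry-python",
    "go": "opentelemetry-go",
}


@dataclass
class Service:
    """One entry in the system inventory (all example values are hypothetical)."""
    name: str
    language: str
    libraries: list[str] = field(default_factory=list)  # third-party and homegrown


def plan_instrumentation(services):
    """Map each service name to an OTel project, or None if the language is unmapped."""
    return {s.name: OTEL_SDK_BY_LANGUAGE.get(s.language.lower()) for s in services}


inventory = [
    Service("checkout", "Java", ["hibernate"]),
    Service("recommendations", "Python", ["flask", "internal-pricing-lib"]),
    Service("legacy-billing", "COBOL"),
]

plan = plan_instrumentation(inventory)
for name, sdk in plan.items():
    print(f"{name}: {sdk or 'no mapping yet, review manually'}")
```

Services that come back with no mapping are the ones to flag early in the rollout plan, since they may need a different approach.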
Identify critical transactions
Next, it’s time to identify your most critical, high-value transactions, as you’ll want to instrument these first. Why? Well, according to OTel co-founder Ted Young, instrumenting them first ensures that complete traces are being created, so you can start to investigate and troubleshoot important issues early. You don’t have to wait for the entire organisation to complete its migration. This is crucial.
Look ahead to migration
You’ll also need to find out if any code has already been instrumented and, if so, what instrumentation frameworks are being used: OpenTracing, OpenCensus, something you’ve written yourself or a vendor-specific library.
If you’re using something like OpenTracing or OpenCensus, both of which OpenTelemetry is backwards compatible with, make sure you plan to migrate over to OTel eventually, to take advantage of all that it offers. And if you’re using homegrown libraries or frameworks, be prepared to re-instrument your application using OTel.
Identify metrics sources
Along with application tracing data, you’ll want to send your metrics data to an observability backend for a holistic system view. This means you need to identify your metrics sources: Kubernetes, Kafka, Docker, Nomad, virtual machines and so on. Remember to identify what kinds of application metrics you want to capture, too.
Step 3: instrument
You’re now ready to start instrumenting with OpenTelemetry. Good work. These recommended instrumentation practices should help:
- Put app features on hold: consider delaying planned features while you instrument your code or reevaluate what’s already been instrumented, particularly if your reason for introducing OTel is frequent reliability issues. Sort the observability and reliability first, then add features once customers can rely on your app.
- Auto-instrument with caution: auto-instrumentation allows OpenTelemetry to instrument your code using wrappers that are available for certain languages. It’s a low barrier to entry, but it can also over-instrument, making troubleshooting harder, so check that the libraries being auto-instrumented are the ones you actually need telemetry from.
- Instrument as you code: observability-driven development is the act of adding instrumentation as you write your application code. It means you know exactly what to instrument (as the code is fresh in your mind) and prevents new technical debt, as you won’t have to go back and instrument your code later. Remember that your application developers know their code best – so ensure they are instrumenting their own code to avoid future headaches. Don’t make the mistake of letting a third party do it!
- Use OTel collectors: these are the preferred way of sending your telemetry data to your backend(s). The collector acts as a central point for collecting and processing data from multiple sources. It can undertake data transformations, add and remove attributes, provide data masking, do batching to avoid bombarding your backend and more.
Reach out to the community
Rolling out OpenTelemetry isn’t a trivial task, but the three steps above can help steer you through the process. Support is also available from the CNCF Slack and, if you’re rolling out OTel with Tyk, from the friendly Tyk team.