Telemetry

Build telemetry into your code from day one to turn systems into evidence—and move from debugging to decision-making.

You can’t lead a system you can’t hear.

At Staff+ level, you’re expected to design observability from day one. You build features and the evidence that proves they work.

John, of course, “has logs.” Somewhere. On a machine no one can SSH into. He swears they’re “verbose.” But don’t be John.

We already discussed how observability helps you see systems at scale. Now, we’ll zoom in on how to build that visibility into your code from day one through telemetry.

What is telemetry?

Telemetry is your system’s way of talking back to you. It’s the automatic collection of data about what your software is doing, so you can answer:

  • What just happened?

  • Who did it?

  • Did it succeed or fail?

  • How often does it occur?

Telemetry usually takes three forms:

  • Logs: Event records (“notification sent”)

  • Metrics: Measurable numbers (DAUs (Daily Active Users), latency, error rate)

  • Traces: Step-by-step journeys of a request across services

A feature without telemetry is like a plane without a cockpit: you’re basically flying blind.
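The three forms above can be sketched as plain data structures before any vendor SDK enters the picture (field names here are illustrative, not a standard):

```typescript
// Log: a discrete event record — "something happened"
const logEvent = {
  event: "notification.sent",
  user_id: "u_123",
  timestamp: new Date().toISOString(),
};

// Metric: a measurable number with a name and tags — "how much / how often"
const metric = {
  name: "notification.send_latency_ms",
  value: 42,
  tags: { env: "production" },
};

// Trace span: one step of a request's journey, linked by a shared trace ID
const span = {
  trace_id: "trace-abc",
  span_name: "push-gateway.send",
  duration_ms: 18,
};

console.log(logEvent.event, metric.name, span.span_name);
```

In practice a library like OpenTelemetry handles the plumbing, but the shapes stay this simple: logs are events, metrics are numbers, traces are linked steps.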

The telemetry flow

Building telemetry isn’t about sprinkling console.log statements and hoping for insights. It’s about creating a closed feedback loop—data that flows from your system to your dashboards, gets validated, and drives decisions.

Here’s the Staff+ way to design that loop:

1. Decide what questions you’ll need answered

Before you log anything, decide what “truths” your telemetry should reveal.

  • What will tell you this feature or service is working?

  • What would you need to see if it failed?

  • Who will consume this data: engineering, product, data science?

Example:

“Do smart notifications actually increase daily active users?”
“What percentage of notifications are opened within 5 minutes?”

This step keeps telemetry aligned with business outcomes, not just infrastructure noise.


2. Instrument events at the right places

Once you know your questions, wire up events that provide those answers—not every possible signal.

  • Emit logs or metrics where outcomes happen, not deep in helper functions.

  • Include the minimum viable context: user ID, timestamp, environment, and relevant metadata.

  • Version your event schemas so consumers know what to expect when the schemas change.

Example:

Log notification.sent and notification.opened events with key properties:
user_id, type, timestamp.

The goal isn’t more events—it’s meaningful, stable events.
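A minimal emitter sketch, assuming a hypothetical `emitEvent` helper: every event carries a `schema_version` plus the minimum viable context, and is emitted where the outcome happens rather than deep in helpers.

```typescript
// Versioned event shape: the minimum viable context from step 2.
type NotificationEvent = {
  event: "notification.sent" | "notification.opened";
  schema_version: number;   // bump when the shape changes
  user_id: string;
  type: string;             // e.g. "smart_reminder"
  timestamp: string;        // ISO 8601
  environment: "staging" | "production";
};

// In a real system this would ship to your telemetry pipeline;
// here it just prints the event and hands it back.
function emitEvent(e: NotificationEvent): NotificationEvent {
  console.log(JSON.stringify(e));
  return e;
}

// Emit at the outcome: the notification was actually dispatched.
const sent = emitEvent({
  event: "notification.sent",
  schema_version: 1,
  user_id: "u_123",
  type: "smart_reminder",
  timestamp: new Date().toISOString(),
  environment: "staging",
});
```

The union type on `event` means a typo like `notification_sent` fails at compile time instead of silently polluting your data.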


3. Validate in staging and production

Telemetry that isn’t verified is just noise. Always validate both data shape and volume.

  • Fire test events in staging, then confirm they appear in your observability stack (e.g., Datadog, Honeycomb, Grafana).

  • In production, check rates—if you expect 1,000 events/hour and see 10, something’s broken.

  • Add simple automated checks for event drift: missing fields, nulls, or dropped data.

Staff+ engineers treat telemetry as a system under test, not an afterthought.
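A sketch of the automated drift check mentioned above, under assumed event shapes: flag events missing required fields, and alert when volume falls far below the expected rate.

```typescript
type RawEvent = Record<string, unknown>;

// Fields every event in this pipeline must carry (per step 2).
const REQUIRED_FIELDS = ["event", "user_id", "timestamp"];

function findDrift(events: RawEvent[], expectedPerHour: number) {
  // Events with missing or null required fields are schema drift.
  const missingFields = events.filter((e) =>
    REQUIRED_FIELDS.some((f) => e[f] === undefined || e[f] === null)
  );
  // If volume is under 10% of expectations, something upstream broke —
  // the "expect 1,000/hour, see 10" case.
  const volumeAlert = events.length < expectedPerHour * 0.1;
  return { missingFields, volumeAlert };
}

const batch: RawEvent[] = [
  { event: "notification.sent", user_id: "u_1", timestamp: "2024-01-01T00:00:00Z" },
  { event: "notification.sent", user_id: null, timestamp: "2024-01-01T00:01:00Z" },
];

const report = findDrift(batch, 1000);
console.log(report.missingFields.length, report.volumeAlert); // 1 true
```

A check like this can run on a schedule against a sample of production events and page the owning team on failure.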


4. Create (and maintain) an event dictionary

An event dictionary is simply a lightweight shared doc that lists:

  • Each event

  • What properties it carries

  • When it fires

  • Who owns it

Here’s why it matters:

  • Prevents duplicate or inconsistent events (notif.sent, notification_sent, NotifSent chaos).

  • Creates accountability—every event has an owner.

  • Makes it easy for data teams to build reliable dashboards and experiments.

Without an event dictionary, telemetry devolves into random print statements and institutional knowledge (a very John situation).
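The dictionary doesn't need tooling, but even a shared doc can be mirrored in code. A sketch (registry name and helper are hypothetical) that rejects emits for unregistered event names:

```typescript
type DictionaryEntry = {
  properties: string[];
  firesWhen: string;
  owner: string;
};

// One registry every team reads from — no notif.sent / notification_sent chaos.
const EVENT_DICTIONARY: Record<string, DictionaryEntry> = {
  "notification.sent": {
    properties: ["user_id", "type", "timestamp"],
    firesWhen: "A notification is dispatched",
    owner: "Backend",
  },
  "notification.opened": {
    properties: ["user_id", "type", "timestamp"],
    firesWhen: "User opens a notification",
    owner: "Frontend",
  },
};

// Gate emits on registration so mystery event names never enter the pipeline.
function isRegistered(name: string): boolean {
  return name in EVENT_DICTIONARY;
}

console.log(isRegistered("notification.sent"));  // true
console.log(isRegistered("notification_sent")); // false
```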

Example: Smart notifications

Let’s apply this flow to a real scenario. 

Suppose you’re building a “Smart Notifications” feature. 

Smart notifications telemetry flow: From notification dispatch to user interaction and data logging

Question:

The north star for your telemetry design will be whatever question leadership cares about.

In this case, “Do smart notifications increase daily active users?”

Instrumentation:

  • notification.sent → properties: user_id, type, timestamp

  • notification.opened → properties: user_id, type, timestamp

Validation:

  • Fire test events in staging and production

  • Confirm they appear in logs with correct properties

Event dictionary entry:

| Event Name | Properties | Fires When | Owner |
|---|---|---|---|
| notification.sent | user_id, type, timestamp | A notification is dispatched | Backend |
| notification.opened | user_id, type, timestamp | User opens a notification | Frontend |

Now the whole team can see whether smart notifications improved DAUs (without relying on John’s secret stash of logs).

Putting it all together

Telemetry turns your code into evidence.

When you plan it from day one—defining the questions, instrumenting intentional events, validating data, and maintaining an event dictionary—you move from debugging to decision-making.

At this level, you’re not just asking “did it run?”—you’re proving “did it deliver?”

That’s what separates Staff+ engineers from log spammers (and yes, from John’s “mystery S3 bucket full of CSVs”).
