Telemetry
Build telemetry into your code from day one to turn systems into evidence—and move from debugging to decision-making.
You can’t lead a system you can’t hear.
At Staff+ level, you’re expected to design observability from day one. You build features and the evidence that proves they work.
John, of course, “has logs.” Somewhere. On a machine no one can SSH into. He swears they’re “verbose.” But don’t be John.
We already discussed how observability helps you see systems at scale. Now, we’ll zoom in on how to build that visibility into your code from day one through telemetry.
What is telemetry?
Telemetry is your system’s way of talking back to you. It’s the automatic collection of data about what your software is doing, so you can answer:
What just happened?
Who did it?
Did it succeed or fail?
How often does it occur?
Telemetry usually takes three forms:
Logs: Event records (“notification sent”)
Metrics: Measurable numbers (daily active users (DAUs), latency, error rate)
Traces: Step-by-step journeys of a request across services
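To make the three forms concrete, here is a minimal TypeScript sketch of what each might look like without any vendor SDK; the event names, fields, and in-process counter are illustrative, not a prescribed setup.

```typescript
// Log: a structured event record
console.log(JSON.stringify({
  event: "notification.sent",   // illustrative event name
  user_id: "u_123",
  timestamp: new Date().toISOString(),
}));

// Metric: a number you can aggregate (here, a naive in-process counter)
const counters = new Map<string, number>();
counters.set("notifications_sent", (counters.get("notifications_sent") ?? 0) + 1);

// Trace: one step ("span") in a request's journey across services
const span = { name: "send-notification", start: Date.now() };
// ... do the work ...
const durationMs = Date.now() - span.start;
console.log(JSON.stringify({ span: span.name, duration_ms: durationMs }));
```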
A feature without telemetry is like a plane without a cockpit: you’re basically flying blind.
The telemetry flow
Building telemetry isn’t about sprinkling console.log statements and hoping for insights. It’s about creating a closed feedback loop—data that flows from your system to your dashboards, gets validated, and drives decisions.
Here’s the Staff+ way to design that loop:
1. Decide what questions you’ll need answered
Before you log anything, decide what “truths” your telemetry should reveal.
What will tell you this feature or service is working?
What would you need to see if it failed?
Who will consume this data: engineering, product, data science?
Example:
“Do smart notifications actually increase daily active users?”
“What percentage of notifications are opened within 5 minutes?”
This step keeps telemetry aligned with business outcomes, not just infrastructure noise.
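One lightweight way to make this step concrete is to write each question down next to the signals you expect to answer it. The sketch below is just one possible format; the session.started event and the consumer labels are assumptions, not requirements.

```typescript
// A hypothetical "telemetry plan": each question names the signals that will answer it.
const telemetryPlan = [
  {
    question: "Do smart notifications increase daily active users?",
    signals: ["notification.sent", "notification.opened", "session.started"], // session.started is assumed
    consumers: ["product", "data science"],
  },
  {
    question: "What percentage of notifications are opened within 5 minutes?",
    signals: ["notification.sent", "notification.opened"],
    consumers: ["engineering", "product"],
  },
];

console.log(JSON.stringify(telemetryPlan, null, 2));
```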
2. Instrument events at the right places
Once you know your questions, wire up events that provide those answers—not every possible signal.
Emit logs or metrics where outcomes happen, not deep in helper functions.
Include the minimum viable context: user ID, timestamp, environment, and relevant metadata.
Version your event schemas so consumers know what to expect when they change.
Example:
Log notification.sent and notification.opened events with key properties: user_id, type, timestamp.
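Here is a minimal sketch of what that instrumentation could look like in TypeScript. The schema_version field and the console transport are assumptions; substitute whatever your event pipeline actually uses.

```typescript
// A typed event with the minimum viable context and a schema version.
type NotificationEvent = {
  event: "notification.sent" | "notification.opened";
  schema_version: 1;          // assumed convention for versioning the schema
  user_id: string;
  type: string;               // e.g. "smart" vs. "standard"
  timestamp: string;          // ISO 8601
  environment: "staging" | "production";
};

function emit(e: NotificationEvent): void {
  // Stand-in transport: in a real system this would go to your event pipeline.
  console.log(JSON.stringify(e));
}

// Emit where the outcome happens: right after the notification is dispatched.
emit({
  event: "notification.sent",
  schema_version: 1,
  user_id: "u_123",
  type: "smart",
  timestamp: new Date().toISOString(),
  environment: "production",
});
```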
The goal isn’t more events—it’s meaningful, stable events.
3. Validate in staging and production
Telemetry that isn’t verified is just noise. Always validate both data shape and volume.
Fire test events in staging, then confirm they appear in your observability stack (e.g., Datadog, Honeycomb, Grafana).
In production, check rates—if you expect 1,000 events/hour and see 10, something’s broken.
Add simple automated checks for event drift: missing fields, nulls, or dropped data.
Staff+ engineers treat telemetry as a system under test, not an afterthought.
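Here is a sketch of what such an automated check might look like, assuming events arrive as JSON objects in a time window; the expected rate and required fields are illustrative.

```typescript
type RawEvent = Record<string, unknown>;

const REQUIRED_FIELDS = ["event", "user_id", "type", "timestamp"];
const EXPECTED_EVENTS_PER_HOUR = 1000; // illustrative expectation

function validateBatch(events: RawEvent[], windowHours: number): string[] {
  const problems: string[] = [];

  // Volume check: far fewer events than expected usually means dropped data.
  const ratePerHour = events.length / windowHours;
  if (ratePerHour < EXPECTED_EVENTS_PER_HOUR * 0.1) {
    problems.push(`Volume drop: ${ratePerHour.toFixed(0)}/hour vs ~${EXPECTED_EVENTS_PER_HOUR} expected`);
  }

  // Shape check: missing or null required fields indicate schema drift.
  for (const e of events) {
    for (const field of REQUIRED_FIELDS) {
      if (e[field] === undefined || e[field] === null) {
        problems.push(`Missing field "${field}" in event: ${JSON.stringify(e)}`);
      }
    }
  }
  return problems;
}

// Example: one event in a one-hour window, and it's missing user_id.
console.log(validateBatch(
  [{ event: "notification.sent", type: "smart", timestamp: "2024-01-01T00:00:00Z" }],
  1,
));
```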
4. Create (and maintain) an event dictionary
An event dictionary is simply a lightweight shared doc that lists:
Each event name
What properties it carries
When it fires
Who owns it
Here’s why it matters:
Prevents duplicate or inconsistent events (notif.sent, notification_sent, NotifSent chaos).
Creates accountability: every event has an owner.
Makes it easy for data teams to build reliable dashboards and experiments.
Without an event dictionary, telemetry devolves into random print statements and institutional knowledge (a very John situation).
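One way (among several) to keep the dictionary honest is to store it as a typed, version-controlled module so every new event goes through code review. A sketch, using the smart notification events as an illustration:

```typescript
// Assumed convention: the event dictionary lives in code, one entry per event.
type DictionaryEntry = {
  name: string;
  properties: string[];
  firesWhen: string;
  owner: string;
};

export const EVENT_DICTIONARY: Record<string, DictionaryEntry> = {
  "notification.sent": {
    name: "notification.sent",
    properties: ["user_id", "type", "timestamp"],
    firesWhen: "A notification is dispatched",
    owner: "Backend",
  },
  "notification.opened": {
    name: "notification.opened",
    properties: ["user_id", "type", "timestamp"],
    firesWhen: "User opens a notification",
    owner: "Frontend",
  },
};
```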
Example: Smart notifications
Let’s apply this flow to a real scenario.
Suppose you’re building a “Smart Notifications” feature.
Question:
The north star for your telemetry design will be whatever question leadership cares about most.
In this case, “Do smart notifications increase daily active users?”
Instrumentation:
notification.sent → properties: user_id, type, timestamp
notification.opened → properties: user_id, type, timestamp
Validation:
Fire test events in staging and production
Confirm they appear in logs with correct properties
Event dictionary entry:
| Event Name | Properties | Fires When | Owner |
|---|---|---|---|
| notification.sent | user_id, type, timestamp | A notification is dispatched | Backend |
| notification.opened | user_id, type, timestamp | User opens a notification | Frontend |
Now the whole team can see whether smart notifications improved DAUs (without relying on John’s secret stash of logs).
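For instance, the 5-minute open-rate question can be answered directly from the two events. The sketch below assumes a notification_id property to join sent and opened events, which is not part of the minimal schema above; in practice this would usually be a warehouse query rather than in-memory code.

```typescript
// Answering "what % of notifications are opened within 5 minutes?" from the events.
type Sent = { user_id: string; notification_id: string; timestamp: string };
type Opened = { user_id: string; notification_id: string; timestamp: string };

function openRateWithin5Min(sent: Sent[], opened: Opened[]): number {
  const openedById = new Map(opened.map((o) => [o.notification_id, o] as const));
  const FIVE_MIN_MS = 5 * 60 * 1000;

  const openedInTime = sent.filter((s) => {
    const o = openedById.get(s.notification_id);
    if (!o) return false;
    return Date.parse(o.timestamp) - Date.parse(s.timestamp) <= FIVE_MIN_MS;
  });

  return sent.length === 0 ? 0 : openedInTime.length / sent.length;
}

// Usage example with one notification opened three minutes after being sent.
const rate = openRateWithin5Min(
  [{ user_id: "u_1", notification_id: "n_1", timestamp: "2024-01-01T00:00:00Z" }],
  [{ user_id: "u_1", notification_id: "n_1", timestamp: "2024-01-01T00:03:00Z" }],
);
console.log(rate); // 1
```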
Putting it all together
Telemetry turns your code into evidence.
When you plan it from day one—defining the questions, instrumenting intentional events, validating data, and maintaining an event dictionary—you move from debugging to decision-making.
At this level, you’re not just asking “did it run?”—you’re proving “did it deliver?”
That’s what separates Staff+ engineers from log spammers (and yes, from John’s “mystery S3 bucket full of CSVs”).