Goldilocks Designs
Learn how to create a minimum viable architecture.
As Staff+, you’re making architectural decisions that shape how fast your team ships, how painful on-call is, and how much the company spends on infrastructure.
That means balancing two opposing forces:
Under-engineering leads to fragile systems that break under pressure.
Over-engineering buries your team in complexity and slows product delivery.
Staff+ engineers are expected to steer toward the middle: solutions that are robust enough for today’s needs, and extensible enough for tomorrow’s growth.
In this lesson, you’ll learn how to apply the Goldilocks principle to System Design—through a real-world walk-through of a notifications platform that evolves from V1 to global scale.
John loves overbuilding because it makes him seem irreplaceable. You’ll do the opposite—build just enough and document why. That’s the difference between ego-driven architecture and business-driven design.
Goldilocks: Notifications platform
Let’s say you’re building a notifications platform with email, push, and SMS. How do you design it without going too far either way?
We’ll walk through four versions of the same system—from scrappy V1 to global scale—to show what the Goldilocks principle looks like in real life.
Version 1: Keep it simple
Whenever a new notification is created, it’s inserted into a notifications table. A single background process (cron or worker) polls that table every few seconds, picks up unsent jobs, sends them (email, SMS, or push), and marks them complete. A minimal sketch of this loop follows the list below.
Pros:
Easy to build: One database table, one worker.
Cheap: Runs on a single server or cloud function.
Fast to ship: Ideal for MVPs or low-volume apps.
Cons:
No retry logic: If an email API call fails, the notification is gone.
Performance ceiling: Polling adds latency and can overwhelm the DB under load.
Limited visibility: Hard to track progress or debug failures.
When to use: For prototypes, internal tools, or early-stage products with low traffic, when speed to market matters more than robustness.
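To make this concrete, here’s a minimal sketch of the V1 loop in Python. It assumes a notifications table with id, channel, payload, and status columns, and a hypothetical deliver() function standing in for your email/SMS/push provider. Notice how a failed send is simply marked failed and forgotten: that’s the “no retry logic” con in action.

```python
import sqlite3
import time

def deliver(channel: str, payload: str) -> None:
    """Hypothetical send call: in practice this hits your
    email/SMS/push provider's API and raises on failure."""
    print(f"sending via {channel}: {payload}")

def poll_and_send(db_path: str = "app.db", interval_s: float = 5.0) -> None:
    conn = sqlite3.connect(db_path)
    while True:
        # Pick up a batch of unsent jobs. One table, one loop: that's V1.
        rows = conn.execute(
            "SELECT id, channel, payload FROM notifications "
            "WHERE status = 'pending' LIMIT 100"
        ).fetchall()
        for job_id, channel, payload in rows:
            try:
                deliver(channel, payload)
                status = "sent"
            except Exception:
                # No retry logic: a failed send is marked and forgotten.
                status = "failed"
            conn.execute(
                "UPDATE notifications SET status = ? WHERE id = ?",
                (status, job_id),
            )
        conn.commit()
        time.sleep(interval_s)  # polling adds up to interval_s of latency
```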
Version 2: Add reliability
Introduce a message queue (RabbitMQ, SQS, or Kafka). Producers push notification jobs into a queue, and workers consume and process them asynchronously.
Instead of polling the DB, each new notification is pushed to the queue. Workers listen for jobs, process them, and retry on failure with exponential backoff. The DB remains the source of truth but isn’t constantly queried. A sketch of the consumer loop follows the list below.
Pros:
Retry logic: No more dropped notifications.
Scalable: Add more workers for higher throughput.
Clear separation: Producers create jobs, consumers process them.
Cons:
More infra: Need to deploy and monitor a queue service.
Ops overhead: Dead-letter queues, visibility, and alerting required.
When to use: When you’ve hit reliability or latency issues with V1. Your system now needs to handle spikes or guarantee delivery.
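Here’s what the V2 consumer pattern looks like, using Python’s standard-library queue.Queue as a stand-in for RabbitMQ or SQS; deliver() is again a hypothetical provider call. The shape is what matters: retries with exponential backoff, then a dead-letter bucket for jobs that exhaust their attempts.

```python
import queue
import time

def deliver(channel: str, payload: str) -> None:
    """Hypothetical provider call; raises on failure."""
    print(f"sending via {channel}: {payload}")

def consume(jobs: "queue.Queue[dict]", max_attempts: int = 5) -> None:
    dead_letter: list[dict] = []
    while True:
        job = jobs.get()  # blocks until a producer pushes a job
        for attempt in range(max_attempts):
            try:
                deliver(job["channel"], job["payload"])
                break
            except Exception:
                # Exponential backoff: wait 1s, 2s, 4s, ... between tries.
                time.sleep(2 ** attempt)
        else:
            # Retries exhausted: park the job for inspection and alerting,
            # the role a dead-letter queue plays in RabbitMQ/SQS.
            dead_letter.append(job)
        jobs.task_done()
```

Producers just put a job on the queue and move on; delivery happens asynchronously, and the dead-letter list is what you’d wire up to alerting in a real deployment.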
Version 3: Optimize performance
Introduce a Redis cache for metadata and a CDN for static assets. User preferences, rate limits, and template data are cached. The CDN serves static assets like logos or templates used in notifications, reducing latency. Workers fetch everything they need with minimal DB load. A cache-aside sketch follows the list below.
Pros:
Lower latency: Cached metadata means faster lookups.
Reduced DB stress: Heavy reads move to Redis.
Optimized delivery: CDNs accelerate static content.
Cons:
More complexity: Cache invalidation is hard.
Potential inconsistency: Data may be stale.
Requires discipline: You must monitor cache hit/miss rates.
When to use: When notification volume reaches the tens of thousands per hour. You’re optimizing for speed and efficiency, not just reliability.
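A common way to implement the metadata caching is the cache-aside pattern. Here’s a sketch using the redis-py client; the preference schema, key format, and TTL are illustrative assumptions.

```python
import json
import redis  # assumes the redis-py client is installed

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_preferences_from_db(user_id: int) -> dict:
    """Hypothetical slow path: the DB lookup we want to avoid repeating."""
    return {"email": True, "sms": False, "push": True}  # placeholder data

def get_preferences(user_id: int, ttl_s: int = 300) -> dict:
    key = f"prefs:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no DB round trip
    prefs = load_preferences_from_db(user_id)  # cache miss: slow path
    r.setex(key, ttl_s, json.dumps(prefs))  # TTL bounds how stale data gets
    return prefs
```

The TTL is the discipline knob: it caps how stale a cached preference can get, at the cost of periodic cache misses, and your hit/miss monitoring tells you whether it’s set sensibly.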
Version 4: Global scale
Multi-region setup with active-active replication for queues and databases. Traffic is routed to the nearest region (via DNS or a load balancer), where mirrored infrastructure and workers close to users keep latency low. Data replicates across regions with conflict resolution, or an active-passive fallback. A toy region-routing sketch follows the list below.
Pros:
High availability: Survives regional outages.
Low latency: Notifications delivered from nearest region.
Fault tolerance: Systems keep running even during failures.
Cons:
Extreme complexity: Managing replication, failover, and consistency.
Expensive: Infrastructure and maintenance costs multiply with every region.
Ops maturity: Running multi-region safely demands strong operational discipline.
When to use: When you’re a global product with strict SLAs and millions of events/day. Now you’re solving distributed systems problems, not just feature delivery.
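To give the routing idea some shape, here’s a toy sketch of nearest-region selection with failover. Real systems do this at the DNS or load-balancer layer (think latency-based routing), not in application code, and every name below (regions, endpoints, the health probe) is an assumption for illustration.

```python
REGIONS = {
    "us-east": "https://us-east.notify.example.com",
    "eu-west": "https://eu-west.notify.example.com",
    "ap-south": "https://ap-south.notify.example.com",
}

# Preference order per user geography: nearest region first, then fallbacks.
PREFERENCE = {
    "US": ["us-east", "eu-west", "ap-south"],
    "EU": ["eu-west", "us-east", "ap-south"],
    "APAC": ["ap-south", "us-east", "eu-west"],
}

def is_healthy(region: str) -> bool:
    """Hypothetical health probe (e.g., a per-region /healthz check)."""
    return True  # placeholder: every region reports healthy in this sketch

def pick_region(user_geo: str) -> str:
    """Route to the nearest healthy region, failing over down the list:
    the active-passive fallback described above, in miniature."""
    for region in PREFERENCE[user_geo]:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")
```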
Each stage makes sense only when usage justifies it. Building V4 on day one is ego-driven. Moving step by step is business-driven.
The Goldilocks zone
Here’s a quick summary of the four versions we walked through.
| Version | Focus | Key Additions | Trade-Offs |
|---------|-------|---------------|------------|
| 1 | Ship fast | Simple worker and DB | Fragile |
| 2 | Reliable | Queue and retries | More ops |
| 3 | Fast | Cache and CDN | Complexity |
| 4 | Global | Multi-region and replication | Expensive |
Each layer of complexity—queues, caches, global replication—adds power and cost.
So which one is in the Goldilocks zone? Any of them can be, depending on your situation.
But for most common scenarios, you might be looking at:
Version 2 for fast-growing products: it introduces queues and retries for reliability, without the overhead of full-scale infrastructure engineering.
Version 3 when you’ve proven reliability and are starting to feel performance pain. It optimizes for speed and efficiency without crossing into distributed‑systems territory.
As a Staff+ engineer, your job is to resist jumping to Version 4 until the pain is real. Your system should earn its complexity.
TL;DR: The Goldilocks principle in architecture means scaling only when the pain is real. To stay out of infra hell, start small and only scale when forced.