AI System Design
Learn how to architect for the future.
AI already powers search, copilots, chatbots, and internal developer tools.
At Staff+ levels, your job is to design systems around models, where reliability, safety, and cost still drive decisions.
In this lesson, we’ll cover what’s unique about AI System Design and its subvariants, and which ones you should learn.
Buckle up: we're covering a lot.
Let’s get started.
Heads up: We’ll dive deeper into AI in the next module: AI Engineering.
Key differences: AI vs. traditional systems
To shift toward AI System Design, you'll have to understand how AI systems differ from traditional systems across four major dimensions:
Behavior: Deterministic execution vs. probabilistic outputs
Feedback: One-time deployment vs. continuous evaluation and retraining
Data: Static inputs vs. evolving, embedded knowledge
Control: Input validation vs. guardrails that shape and verify outputs
Let’s look at how each dimension reshapes System Design in practice.
1. Behavior: Deterministic vs. probabilistic
Traditional systems: Same input → same output. You design for correctness.
AI systems: Same input → different outputs depending on context, temperature, or fine-tuning. You design for consistency, not perfection.
Staff+ engineers anticipate and constrain this variability. They use temperature tuning, prompt templates, and response validators to make “unpredictable” models predictable enough for production.
Example: A customer support AI that must always output JSON, even when the model tries to “explain” itself.
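To make that concrete, here's a minimal sketch of such a response validator. `call_model` is a hypothetical stand-in for your LLM client, and the two required fields are an assumed schema, not a standard:

```python
import json

REQUIRED_FIELDS = {"answer", "confidence"}  # hypothetical schema for the support bot

def validated_reply(prompt: str, call_model, max_retries: int = 2) -> dict:
    """Call the model and retry until it returns well-formed JSON with required fields."""
    for _ in range(max_retries + 1):
        raw = call_model(prompt)  # stand-in for your LLM client
        try:
            parsed = json.loads(raw)
            if REQUIRED_FIELDS.issubset(parsed):  # checks the keys are present
                return parsed  # "predictable enough" for production
        except json.JSONDecodeError:
            pass  # the model "explained" itself instead of emitting JSON
        prompt += "\nRespond with JSON only: {\"answer\": ..., \"confidence\": ...}"
    raise ValueError("No valid JSON after retries; route to fallback handling")
```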
2. Feedback: Deploy once vs. continuous evaluation
Traditional systems: Ship once → monitor uptime
AI systems: Ship → evaluate → retrain → redeploy
Every production AI system includes a feedback loop: users flag bad answers, evaluation jobs score outputs, and retraining pipelines improve models over time.
Monitoring now includes accuracy, bias, drift, hallucination rates, and even cost per query. It’s not “is it up?”, it’s “is it still right, fair, and affordable?”
Example: Your chatbot answers correctly in February but starts failing in May because the knowledge base hasn’t been refreshed. You catch that with automated eval jobs and content freshness metrics.
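Catching that kind of silent regression usually means a scheduled eval job plus a freshness check. A minimal sketch, assuming a golden set of question/answer pairs, a hypothetical `ask_bot` client, and documents that carry a `last_updated` timestamp:

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(days=90)  # hypothetical freshness budget

def run_eval(golden_set, ask_bot, grade) -> float:
    """Score live answers against a golden set; returns the pass rate."""
    passed = sum(grade(ask_bot(q), expected) for q, expected in golden_set)
    return passed / len(golden_set)

def stale_docs(documents, now=None) -> list:
    """Flag knowledge-base docs that haven't been refreshed within budget."""
    now = now or datetime.now(timezone.utc)
    return [d["id"] for d in documents if now - d["last_updated"] > MAX_STALENESS]
```

Run the eval nightly and alert when the pass rate dips below your threshold; the February-to-May decay then shows up as a trend, not a surprise.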
3. Data: Static inputs vs. embedded knowledge
Traditional systems: Data is a separate concern from logic—mostly static and external.
AI systems: Data is embedded into behavior via embeddings, vector stores, and fine-tuning.
In AI systems, data pipelines are as critical as APIs or databases. Your vector database, embeddings, and fine-tuning datasets are all first-class architecture components.
As a Staff+ engineer, you'll have to:
Design ETL pipelines for cleaning and deduplication.
Automate embedding updates as content changes.
Enforce data lineage and auditability for compliance.
Example: A retrieval pipeline that automatically re-embeds documents when an internal wiki page updates, keeping context fresh and answers accurate.
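A minimal sketch of that idea, with `store` and `embed_fn` as stand-ins for your vector DB client and embedding model (the `get_metadata`/`upsert` methods are hypothetical, not a specific vendor's API):

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def sync_page(page: dict, store, embed_fn) -> None:
    """Re-embed a wiki page only when its content actually changed."""
    new_hash = content_hash(page["text"])
    existing = store.get_metadata(page["id"])  # hypothetical client method
    if existing and existing.get("hash") == new_hash:
        return  # unchanged: keep the existing embedding
    store.upsert(  # hypothetical client method
        id=page["id"],
        vector=embed_fn(page["text"]),
        metadata={"hash": new_hash, "source": page["url"]},
    )
```

Storing a content hash alongside each vector is what keeps the pipeline cheap: you pay for embedding only when a page actually changes.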
4. Control: Input validation vs. output guardrails
Traditional systems: Validate inputs to prevent bad behavior.
AI systems: Guardrails must also constrain outputs, which are less predictable and harder to test.
Input validation still matters, but in AI systems you must also validate outputs, because models can go off-script.
Guardrails prevent unsafe, off-topic, or brand-damaging responses. These can be implemented via:
Policy layers (e.g., NeMo Guardrails, Azure Content Safety)
Schema enforcement (structured outputs like JSON)
Post-processing filters (regex or domain classifiers)
Example: A support bot that refuses to answer medical questions or escalates to a human if confidence < 80%.
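That example translates almost directly into a post-processing filter. A sketch, assuming the model's draft already carries a confidence score (the regex and threshold are illustrative):

```python
import re

MEDICAL_PATTERN = re.compile(r"\b(diagnos|prescri|dosage|symptom)", re.IGNORECASE)
CONFIDENCE_FLOOR = 0.8  # mirrors the "escalate if confidence < 80%" rule

def apply_guardrails(question: str, draft: dict) -> dict:
    """Post-process a model draft: refuse off-limits topics, escalate low confidence."""
    if MEDICAL_PATTERN.search(question):
        return {"action": "refuse", "reply": "I can't help with medical questions."}
    if draft.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        return {"action": "escalate", "reply": "Connecting you with a human agent."}
    return {"action": "answer", "reply": draft["answer"]}
```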
Example: Designing an AI-powered support assistant
Let’s look at how the four dimensions of AI System Design come together in a real-world product request—one you might be asked to lead at the Staff+ level.
Scenario: A PM asks, “Can we build an AI assistant so users don’t flood support?”
Your job is to design a production-grade AI system, thinking in terms of system reliability, safety, and business impact.
Defining SLOs
First, you'd need to define success and failure:
95% of answers should come from internal docs.
Hallucination rate <5%.
These turn subjective “make it work” requests into measurable reliability goals, and give you something to evaluate post-launch.
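SLOs only bite if your eval harness actually checks them. A minimal sketch, assuming an upstream grader labels each evaluated answer with boolean `grounded` and `hallucinated` flags:

```python
SLOS = {"grounded_rate_min": 0.95, "hallucination_rate_max": 0.05}

def check_slos(eval_results: list) -> dict:
    """eval_results: dicts with boolean `grounded` and `hallucinated` flags."""
    n = len(eval_results)
    grounded = sum(r["grounded"] for r in eval_results) / n
    hallucinated = sum(r["hallucinated"] for r in eval_results) / n
    return {
        "grounded_rate": grounded,
        "hallucination_rate": hallucinated,
        "meets_slo": (grounded >= SLOS["grounded_rate_min"]
                      and hallucinated <= SLOS["hallucination_rate_max"]),
    }
```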
Architectural design decisions
With these SLOs in place, your design choices reflect the unique needs of AI systems:
Retrieval-augmented generation (RAG): Ground answers in trusted documents to reduce hallucination and enable frequent updates; essential because data is your infrastructure now (see the sketch after this list).
Tools: Pinecone, Weaviate, or FAISS vs. just a raw LLM (e.g., GPT)
Guardrails: LLMs aren’t sandboxed, so you must sanitize outputs, not just inputs.
Tools: NeMo, content filters, schema validators
Observability: You need metrics for answer quality, freshness, and user trust.
Tools: LangSmith, TruLens, custom eval harnesses
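Here's the RAG decision from that list as a minimal sketch. `retriever` and `llm` are stand-ins for your vector store and model client, and their `search`/`complete` methods are hypothetical, not a specific library's API:

```python
def answer_with_rag(question: str, retriever, llm, top_k: int = 4) -> str:
    """Ground the model in retrieved documents instead of letting it free-associate."""
    docs = retriever.search(question, top_k=top_k)  # hypothetical client method
    context = "\n\n".join(d["text"] for d in docs)
    prompt = (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.complete(prompt)  # hypothetical client method
```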
For those unfamiliar: We’ll cover RAG and observability in upcoming lessons.
Business tie-in
The design decisions now map to concrete business impact.
With RAG + guardrails, we deflect 40% of support tickets.
Without them, we risk frustrated users and support escalations.
Cost per query, latency, and escalation rates become the metrics that justify your choices to product and leadership.
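Back-of-the-envelope math is usually enough for that conversation. The token counts and prices below are purely illustrative; plug in your model's real pricing:

```python
PRICE_PER_1K_INPUT = 0.0005   # USD per 1K input tokens, hypothetical
PRICE_PER_1K_OUTPUT = 0.0015  # USD per 1K output tokens, hypothetical

def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    return ((input_tokens / 1000) * PRICE_PER_1K_INPUT
            + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT)

# RAG inflates input tokens (retrieved context), so compare both designs:
print(cost_per_query(3000, 300))  # RAG: ~3K context tokens per query
print(cost_per_query(500, 300))   # raw LLM: cheaper, but higher hallucination risk
```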
Types of AI System Design
Different kinds of AI systems come with their own constraints, failure modes, and architecture patterns.
At a high level, most production AI systems fall into three broad patterns:
1. Machine learning systems
These are your classic supervised learning pipelines—ranking, classification, forecasting. You train on labeled data, deploy a model, and measure performance with precision, recall, AUC, etc.
Machine learning System Design focuses on:
Data pipelines and labeling
Offline training + online serving
A/B testing and model drift detection
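Drift detection, for instance, usually boils down to comparing live distributions against a training-time baseline. A deliberately crude sketch (production systems tend to use distribution tests like PSI or KS rather than a mean shift, but the monitoring shape is the same; the threshold is hypothetical):

```python
from statistics import mean

DRIFT_THRESHOLD = 0.1  # hypothetical: alert if the mean score shifts by more than 0.1

def detect_drift(baseline_scores: list, live_scores: list) -> dict:
    """Compare live model scores against a training-time baseline."""
    shift = abs(mean(live_scores) - mean(baseline_scores))
    return {"shift": shift, "drifted": shift > DRIFT_THRESHOLD}
```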
Dig deeper into data pipelines, training, and more in our Machine Learning System Design course.
2. Generative AI systems
These systems generate content: text, code, summaries, images. They rely on prompts, embeddings, and context windows (and they hallucinate—a lot).
Generative AI System Design focuses on:
Prompt engineering and context shaping
Guardrails to constrain output
Cost/performance trade-offs (token usage matters)
Think: Chatbots, copilots, and content engines
Our Grokking Generative AI System Design course gets you ahead with this most in-demand AI skill:
Learn real-world architectures (e.g., text, image, speech, and video generation).
Master scaling strategies for distributed training and inference.
Keep models accurate and efficient with data and pipeline design.
3. Agentic systems
Agentic systems can reason, plan, call tools, and take multi-step actions. They’re powerful, but unpredictable.
Agentic System Design involves:
Tool use orchestration (functions, APIs)
State management and memory
Safety layers to prevent runaway behavior
Think: Autonomous agents, workflow planners, and complex RAG apps
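All three concerns show up even in a toy agent loop. A sketch, where `llm.next_action` is a hypothetical client method that returns either a tool call or a final answer:

```python
MAX_STEPS = 5  # safety layer: a hard cap prevents runaway tool-calling loops

def run_agent(goal: str, llm, tools: dict) -> str:
    """Minimal plan-act loop: tool orchestration, state, and a step budget."""
    history = [f"Goal: {goal}"]  # state management / memory
    for _ in range(MAX_STEPS):
        action = llm.next_action(history)  # hypothetical client method
        if "final" in action:
            return action["final"]
        if action["tool"] not in tools:
            history.append(f"Error: unknown tool {action['tool']}")
            continue
        result = tools[action["tool"]](**action["args"])  # tool use (functions, APIs)
        history.append(f"{action['tool']} -> {result}")
    return "Stopped: step budget exhausted (escalate to a human)."
```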
Curious to learn more? Check out our course on Agentic System Design.
Which AI System Design skill will you need?
As a Staff+ engineer, you don’t need to be an expert in every kind of AI system—but you do need fluency across all three.
Here’s how to think about it:
Start with breadth:
You’ll be asked to review designs, mentor teams, or make architectural calls across all types—so you must understand how each system works, where it fails, and how to reason about trade-offs.
Go deep depending on your org:
If your team owns a content engine or internal assistant, you’ll need deep generative AI chops.
If you're in a search, ranking, or recommendations org, ML system design will be core.
If you're supporting ambitious LLM-driven tooling, you’ll need to understand agent orchestration.
Leadership means pattern matching:
Even if you’re not building an agentic system today, recognizing when a design is one helps you avoid dangerous oversimplifications. “It’s just a chatbot” holds only until it starts making real decisions.