Large Language Models
Learn about language models, large language models (LLMs), how they work, and their multimodal capabilities.
Let’s imagine a conversation with a friend, where the friend starts a sentence with “I’m going to make a cup of ________.” Humans would likely predict that the next word could be “coffee” or “tea” based on their knowledge of common beverage choices.
Similarly, a language model is trained to understand and predict the next word in a sequence based on the context of the preceding words. It learns from vast amounts of text data and can make informed predictions about which word is likely to come next in a given context.
What is a large language model?
The best way to understand the term “Large Language Model” is to break it down.
Large: This refers to two things: the sheer number of internal parameters the model has (often in the hundreds of billions) and the massive amount of text data it was trained on (a significant snapshot of the public internet).
Analogy: You can think of parameters as the billions of tiny knobs and dials that the model tuned during its training. Each knob helps capture a minuscule pattern in the relationship between words, grammar, and concepts.
Language: This specifies the model’s domain. It’s designed to understand, process, and generate human language in all its forms, from prose and poetry to structured text like JSON and programming languages like Python.
Model: This is the main word for us. An LLM is not a database or a search engine; it is a mathematical representation of language.
Analogy: Just as a weather model uses complex equations to represent and predict atmospheric conditions, a language model uses its vast number of parameters to represent and predict language.
A text completion engine
Now that we know what the name means, what does an LLM actually do?
At its absolute core, an LLM has one primary job: to predict the next most likely word (or, more accurately, token) in a sequence. That’s it. It might seem too simple, but this is the foundational task.
You can think of it as the most sophisticated autocomplete you’ve ever seen. When you type “The cat sat on the…” into your phone, it might suggest “mat,” “floor,” or “couch.” An LLM does the same thing, but with an incredibly deep, nuanced understanding of grammar, context, style, and facts derived from its training.
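At its simplest, that autocomplete behavior can be sketched in a few lines. This is a toy illustration, not a real LLM: the probabilities below are invented for the example, and picking the single most likely token is known as greedy decoding.

```python
# Invented probabilities a model might assign to the token that
# follows "The cat sat on the..." -- a real LLM computes these
# over a vocabulary of tens of thousands of tokens.
next_token_probs = {
    "mat": 0.42,
    "floor": 0.21,
    "couch": 0.13,
    "roof": 0.06,
    "keyboard": 0.02,
}

def predict_next(probs):
    """Greedy decoding: return the single most likely token."""
    return max(probs, key=probs.get)

print(predict_next(next_token_probs))  # -> mat
```

Generating a whole sentence is just this step repeated: predict a token, append it to the sequence, and predict again.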
This simple objective, when performed at a massive scale, gives rise to what are called emergent abilities. Skills like answering questions, summarizing text, translating languages, and writing code aren’t explicitly programmed into the model. They emerge naturally from the core task of mastering next-word prediction over a vast dataset.
LLMs are probabilistic, not all-knowing
This is the single most important concept to grasp when working with LLMs. It is the key to using them effectively and responsibly.
An LLM is a probabilistic model, not a deterministic database. It generates what is statistically likely to come next, not what is factually true.
Think of an LLM less like a perfect search engine retrieving a fact and more like a creative partner improvising the next line of a story. It draws on all the patterns it learned during training to make a highly educated guess about the most plausible continuation, but it is still a guess.
This probabilistic nature is a double-edged sword. It’s why LLMs can be so creative and flexible, but it’s also the root cause of hallucinations, when the model generates confident-sounding but incorrect or nonsensical information. The model isn’t “lying”; it’s simply generating a statistically probable sequence of words that happens to be factually wrong. Understanding this is critical to building reliable applications.
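The word "probabilistic" here is literal: models assign scores to candidate tokens, convert them to probabilities, and then sample. The sketch below uses invented scores ("logits") and a standard softmax; because the output is sampled rather than always taking the top choice, the same prompt can yield different, and occasionally wrong, continuations.

```python
import math
import random

# Invented scores a model might assign to candidate next tokens
# after "I'm going to make a cup of ________".
logits = {"coffee": 2.0, "tea": 1.6, "cocoa": 0.5, "gravel": -3.0}

def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution.
    Higher temperature flattens it; lower sharpens it."""
    exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded only to make the demo repeatable
probs = softmax(logits)
print({tok: round(p, 3) for tok, p in probs.items()})
print([sample(probs, rng) for _ in range(5)])
```

Note that even "gravel" has a nonzero probability. It will almost never be chosen, but nothing in the mathematics forbids it, which is exactly why hallucinations are possible.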
Putting theory into practice: Two types of prompts
Let’s see how this probabilistic nature plays out with two simple examples.
Example 1: A knowledge-based prediction
Prompt: “The planet closest to the sun is”
Likely LLM Output: “Mercury.”
Reasoning: In the vast amount of text the model was trained on, the word “Mercury” has the highest probability of following that specific sequence. This feels factual, but it’s still a probabilistic calculation.
Example 2: A generative prediction
Prompt: “Once upon a time, in a forest full of talking animals,”
Likely LLM Output: “There lived a clever fox named Finn.”
Reasoning: Here, there is no single “correct” answer. The LLM generates a creative and plausible continuation based on the patterns of countless stories it has encountered before. “Finn” isn’t a fact; it’s just a statistically likely name for a fox in this context.
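The difference between the two examples can be made quantitative. The distributions below are invented, but they illustrate the pattern: a knowledge-style prompt concentrates nearly all probability on one token, while a creative prompt spreads it across many plausible options. Shannon entropy measures that spread.

```python
import math

# Invented next-token distributions for the two prompt types.
knowledge = {"Mercury": 0.97, "Venus": 0.02, "Mars": 0.01}
creative = {"There": 0.22, "A": 0.20, "Deep": 0.18,
            "One": 0.16, "Near": 0.14, "Far": 0.10}

def entropy(probs):
    """Shannon entropy in bits: higher means more uncertainty."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

print(f"knowledge prompt entropy: {entropy(knowledge):.2f} bits")
print(f"creative prompt entropy:  {entropy(creative):.2f} bits")
```

A low-entropy distribution behaves like a fact lookup; a high-entropy one behaves like improvisation. Both come from the same next-token machinery.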
We defined a large language model as a massive, probabilistic model that predicts the next word in a sequence. However, that definition, while accurate, remains abstract. To truly understand a complex system, we need to see it in action. How can we trace a single piece of information through the entire process to make it concrete and understandable?
Introducing our guide: The prompt
To anchor our learning, we need a simple, consistent example. We will use the following prompt throughout our technical exploration:
“Twinkle, twinkle, little”
The obvious next word (if your memory serves you well) is, of course, “star”. This specific prompt is an excellent guide for our journey:
It’s simple and direct: The prompt is short and easy to grasp, allowing us to focus on the mechanics without getting lost in complex sentence structure.
It’s a pure continuation: It perfectly represents the most fundamental task of an LLM: given a sequence of text, predict what comes next.
It’s predictable: We know the “correct” next word is “star”. This allows us to focus on the process of how the model arrives at the answer, rather than debating the answer itself.
The two journeys of a prompt
To fully understand our prompt, we need to ask two fundamental questions. These two questions define the two journeys we will take to understand LLMs.
Journey 1: The inference journey (how it thinks)
First, we ask: Given the prompt, “Twinkle, twinkle, little”, how does a trained model generate the word “star”?
Inference is the process of using a pretrained model to generate a response to a new prompt. It’s the “live” or “in-production” phase of an LLM’s life.
Analogy: This is like a chef following a recipe they have already perfected. They are actively using their existing knowledge and skills to produce a result right now.
In this journey, we will act like detectives. We’ll follow our prompt step-by-step through the model’s internal architecture to see exactly how it generates the word “star”. This is where we answer the “how it thinks” questions:
How does the prompt turn into numbers that the model can understand?
How does the model figure out which words are most important?
How does it make its final prediction?
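The first of those questions, turning the prompt into numbers, can be previewed with a toy word-level "tokenizer". Real LLMs use subword schemes such as byte-pair encoding, and the vocabulary and IDs below are invented, but the idea is the same: the model never sees text, only a sequence of integers.

```python
# A toy vocabulary mapping tokens to integer IDs (invented for
# illustration; real vocabularies hold tens of thousands of entries).
vocab = {"twinkle": 0, ",": 1, "little": 2, "star": 3, "<unk>": 4}

def tokenize(text):
    """Crude tokenization: lowercase, split commas off as their own
    tokens, and map unknown words to <unk>. Real tokenizers are far
    more sophisticated."""
    words = text.lower().replace(",", " , ").split()
    return [vocab.get(w, vocab["<unk>"]) for w in words]

print(tokenize("Twinkle, twinkle, little"))  # -> [0, 1, 0, 1, 2]
```

Those integers are what the model's internal machinery actually operates on; the later lessons trace what happens to them next.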
Journey 2: The training journey (how it learns)
Next, we ask the bigger question: How did the model learn that “star” was the right word to predict in the first place?
Training is the process of creating the model itself. It’s how the model acquired all of its knowledge and capabilities before we ever sent it our first prompt.
Analogy: This is like the chef spending years in culinary school and working in kitchens. It’s the entire process of learning, practicing, and refining their skills until the recipes become second nature.
In this journey, we’ll take a conceptual look at how the model was built. We’ll explore the massive undertaking of pre-training and the crucial steps of alignment that make a model helpful and safe. We will answer the “how it learns” questions:
How does a model learn from trillions of words of raw text?
How do we “align” the model to be helpful and follow instructions?
Our journey begins
We’ve demystified the term “large language model” and now have a solid mental framework: an LLM is a massive, probabilistic model trained to do one thing exceptionally well—predict the next word. We know this simple capability unlocks complex and incredible behaviors.
But this high-level “what” isn’t enough for us. We need to know how. What actually happens inside the machine when we send a prompt? How does it turn our words into a prediction?
We now have a clear map for our entire learning expedition. We have our guide—our specific prompt—and we know the two paths we’ll take to understand it: the inference journey and the training journey. This roadmap will keep us oriented as we dive into more technical concepts.
In our next lesson, we’ll take our first step on that journey, following our prompt through the model to uncover how an LLM truly “thinks”, one step at a time.