From LLMs to AI Agents
Learn how LLMs evolve into AI agents with real autonomy.
Imagine you have an extremely knowledgeable assistant who has read millions of books but is locked in a library with no phone, no internet, and only a notepad with one page. You ask this assistant a complex question about yesterday’s stock prices or to schedule a meeting on your calendar. The assistant wants to help, but it has two big problems:
It can only draw on its memory (which doesn’t include yesterday’s news or your schedule).
Its notepad is so small that it forgets anything that doesn’t fit on that page.
Limitations of LLMs
Large language models (LLMs) are powerful at generating and understanding text. However, when an LLM is used in isolation, it faces several inherent limitations:
Limited knowledge beyond training: LLMs only know what’s in their training data, with no awareness of recent events or new facts. They can’t access real-time information, so their answers may be outdated or incorrect.
Pre-agent fix: Developers would periodically retrain or fine-tune models with newer data or manually inject facts into prompts to update knowledge.
Why not enough: Retraining is slow and costly; manual updates don’t scale. LLMs still can’t respond to new or user-specific info without extra help.
No built-in access to external data: By default, LLMs can’t browse the web or interact with other apps; they can only generate text, not perform real-world actions or fetch live data.
Pre-agent fix: Developers created custom scripts or plugins to fetch data or perform actions alongside the LLM.
Why not enough: Each integration was bespoke, brittle, hard to scale, and required ongoing maintenance.
Context window limitations: LLMs can only “remember” a limited chunk of text at a time. Anything outside this window is forgotten, so long conversations or big documents may lose earlier details.
Pre-agent fix: Retrieval-augmented generation (RAG) fetches and injects relevant data for each prompt.
Why not enough: RAG helps with facts but doesn’t give the LLM memory across conversations or enable planning and automation.
Potential for hallucination: LLMs may confidently invent facts when unsure, a phenomenon called hallucination. Without real-time grounding, their answers may be made up or unreliable.
Pre-agent fix: To reduce hallucinations, developers added manual review steps (humans in the loop) or carefully engineered prompts, and used RAG to ground answers in retrieved data.
Why not enough: LLMs still hallucinate if data is missing or ambiguous, as there’s no built-in way to verify or cross-check facts.
Taken together, these limitations mean a standalone LLM can neither access real-time data nor perform actions. Early LLMs couldn’t handle queries that needed up-to-date or external information, so they stayed confined to self-contained tasks such as drafting text or answering general questions.
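The RAG workaround mentioned above is, at its core, a small amount of code: retrieve the documents most relevant to a question, then prepend them to the prompt. Here is a toy sketch of that idea, where naive keyword-overlap scoring stands in for the embedding search a real system would use:

```python
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question
    (a stand-in for real embedding-based similarity search)."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Inject the retrieved context into the prompt sent to the LLM."""
    context = retrieve(question, documents)
    return (
        "Context:\n"
        + "\n".join(f"- {c}" for c in context)
        + f"\n\nQuestion: {question}"
    )

docs = [
    "AAPL closed at 195.30 on Tuesday.",
    "The cafeteria serves lunch at noon.",
    "MSFT closed at 420.10 on Tuesday.",
]
print(build_prompt("What price did AAPL close at on Tuesday?", docs))
```

Note that the model itself is unchanged: RAG only changes what text the model sees per prompt, which is exactly why it can’t provide cross-conversation memory or take actions.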
How do external tools and context empower LLMs?
Let’s delve deeper into why giving LLMs access to external tools and fresh context is a game changer, through a few relatable examples and analogies:
The memory-augmented assistant
Picture an AI assistant helping a doctor diagnose a patient. On its own, the LLM knows general medical knowledge but doesn’t remember the patient’s history or latest lab results (those details aren’t in its training data). If we provide context, such as the patient’s symptoms and medical records retrieved from a database, the LLM can provide far more accurate and relevant advice. Supplying context is like giving the assistant a patient’s file folder to read before answering. The assistant goes from guessing based only on textbook knowledge to using specific, up-to-date information. The result is a more intelligent outcome because the AI’s reasoning is now grounded in the relevant data.
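Mechanically, “giving the assistant the patient’s file folder” is just prompt assembly: structured records are serialized into the context the model reads before answering. A minimal sketch, where the record fields and the prompt format are illustrative rather than from any real clinical system:

```python
# Hypothetical patient record; a real system would fetch this from a database.
patient = {
    "name": "Jane Doe",
    "symptoms": ["fatigue", "elevated heart rate"],
    "latest_labs": {"TSH": "0.1 mIU/L (low)", "Free T4": "2.8 ng/dL (high)"},
}

def grounded_prompt(record: dict, question: str) -> str:
    """Serialize the patient record into the prompt so the model reasons
    over specific, current data instead of textbook generalities."""
    lines = [f"Symptoms: {', '.join(record['symptoms'])}"]
    lines += [f"{lab}: {value}" for lab, value in record["latest_labs"].items()]
    return "Patient file:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"

print(grounded_prompt(patient, "What diagnoses should be considered?"))
```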
The toolbox for problem-solving
Consider a coding assistant AI that a software developer uses. Without external tools, the LLM can suggest code snippets and explain concepts (drawing from its trained knowledge). That’s useful, but limited. Imagine we equip this AI with tools: it can run code in a sandbox, access documentation, and search the company’s code repository. Suddenly, the assistant can do so much more! It could not only suggest code, but also execute it to catch errors, run tests, or look up how a particular API works in the official docs. By calling external APIs or local commands, the LLM can verify its answers and produce working solutions (instead of just plausible ones). This is like giving a human programmer an interactive development environment and Google access rather than just a blank sheet of paper: they’ll perform better with the tools.
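One of those tools, executing a snippet and reporting errors back to the model, can be sketched with a subprocess-based sandbox. This is a toy under obvious assumptions (a production sandbox would add isolation and resource limits; here we only capture output and the exit status):

```python
import subprocess
import sys

def run_python(snippet: str, timeout: float = 5.0) -> dict:
    """Run a code snippet in a fresh interpreter and report the result,
    so the assistant can verify its suggestion instead of just emitting it."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}

# A correct snippet succeeds; a buggy one returns the traceback for the
# model to read and repair on the next attempt.
print(run_python("print(sum(range(10)))"))
print(run_python("print(unknown_var)"))
```

Feeding the `stderr` text back into the next prompt is what closes the loop: the model sees the actual error rather than guessing whether its code works.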
Dynamic decision-making
A productivity bot powered only by an LLM gives generic advice. But if it can access your calendar, read emails, and take actions, like sending invites or setting reminders, it transforms into a real assistant. Now, it understands your context and performs tasks, acting more like a true executive assistant (with your permission).
When an LLM has access to relevant context and the ability to use tools, it can tackle problems that are otherwise impossible for a standalone model. The model’s built-in knowledge and reasoning serve as the “brain,” but now the brain has eyes, ears, and hands to interact with the world:
Fetching information: The LLM retrieves up-to-date data (like reading documents or searching online), improving accuracy and reducing hallucinations.
Taking actions: It can perform digital tasks—creating files, sending messages, or automating workflows.
Maintaining memory: The LLM remembers important details across sessions with long-term storage.
In a real sense, giving context and tools to LLMs is about making them more intelligent and useful. Intelligence isn’t just raw knowledge; it’s also the ability to gather new information, use the right tool for a task, and apply knowledge to the situation. We transform LLMs from brilliant-but-limited savants into flexible problem-solvers by extending them this way.
From LLMs to agents
Despite these creative workarounds, none delivered true autonomy.
RAG improved memory, but didn’t enable LLMs to act or plan.
Scripts/plug-ins could let LLMs “do” things, but every new tool or workflow meant new glue code and brittle integrations.
No approach let LLMs make decisions, orchestrate multi-step tasks, or integrate seamlessly with a growing set of tools and resources.
What was missing? The leap from “just remembering” to “acting and orchestrating.”
This is the transition from an LLM to an agentic system: an LLM empowered with tools, context, and autonomy.
What makes a system agentic?
Agentic systems provide an LLM with a set of external tools and access to dynamic context, empowering the model to determine when and how to use those tools as it works to solve a task. In other words, the LLM is no longer just a passive generator of text; it is augmented with the capacity to:
Retrieve information (for example, searching the web or querying a database)
Invoke services or APIs (such as calling a weather service, sending an email, or running a calculation)
Maintain longer context or memory (by storing and recalling conversation history or relevant documents as needed)
Crucially, the agent decides which tool to use and when, based on the problem it’s trying to solve. This adaptability, the ability to choose and sequence actions appropriately, is central to agentic systems. Think back to our library-bound assistant: an agentic assistant is one who, realizing a gap in their knowledge, says, “Let me check the latest report,” then uses a tool to fetch that report before answering.
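That decide-then-act cycle is, structurally, a loop: the model picks a tool, the system invokes it, and the observation flows back. A toy sketch where the tools are invented for illustration and the LLM’s tool choice is mocked by a simple rule (a real agent would ask the model to choose):

```python
# Hypothetical tools the agent can choose from.
def search_reports(query: str) -> str:
    return "Q3 report: revenue up 8%."  # stub; a real tool would search storage

def calculator(expression: str) -> str:
    # Restricted eval as a stand-in for a proper expression evaluator.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"search_reports": search_reports, "calculator": calculator}

def decide(task: str) -> tuple[str, str]:
    """Stand-in for the LLM's decision: pick a tool and its argument.
    In a real agent, the model itself emits this choice."""
    if any(ch.isdigit() for ch in task):
        return "calculator", task
    return "search_reports", task

def agent_step(task: str) -> str:
    """One decide-act-observe step of the agent loop."""
    tool_name, arg = decide(task)
    observation = TOOLS[tool_name](arg)
    return f"[{tool_name}] {observation}"

print(agent_step("2 + 2 * 10"))
print(agent_step("latest quarterly report"))
```

A full agent repeats this step, appending each observation to the context, until it judges the task complete.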
Early experiments in this direction (like Auto-GPT and BabyAGI in 2023) showcased how an LLM could loop through planning steps, call tools such as web browsers or Python scripts, and iteratively refine its approach to achieve a user-defined goal. These projects were rough prototypes, but they demonstrated the tantalizing potential of AI that isn’t just static, but interactive and adaptive.
Around the same time, mainstream AI assistants began integrating tool use at scale:
ChatGPT introduced plugins and web browsing, allowing it to fetch up-to-date info and interact with third-party services.
Voice assistants started embedding LLMs to better understand and execute requests via APIs.
The industry began moving from simple “chatbots” to true AI agents.
Limitations of agents
Giving an AI agent tools and context wasn’t a fix-all solution. Each new integration, whether connecting an LLM to a database, a cloud service, or a file system, still required custom code, careful prompt engineering, and special handling for security. For every new “ability,” you often had to build a new integration from scratch.
This approach doesn’t scale:
Developers found it difficult to scale truly connected systems when every data source needed a custom integration.
The proliferation of bespoke tools led to fragile systems, where one tool’s output might not be formatted as the LLM expects, causing errors or misinterpretations.
The field needed a universal, open, secure protocol to connect any tool, data source, or resource and allow agentic systems to truly scale.
This is exactly why the Model Context Protocol (MCP) was created. MCP is the open standard designed to connect AI agents to the world of tools and data, securely, flexibly, and at scale.
A common question at this point is how MCP relates to agentic AI. The two aren't competing ideas: agentic AI defines what the system can do, while MCP defines how it connects to the tools and data that make those capabilities possible.
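The core idea behind such a protocol can be illustrated without the real MCP SDK: every tool self-describes with the same schema shape, so an agent can discover and invoke any tool through one uniform interface instead of bespoke glue code per integration. A conceptual sketch (the registry, schema fields, and weather tool here are invented for illustration and are not the actual MCP wire format):

```python
import json

# A registry where every tool is described the same way.
REGISTRY: dict[str, dict] = {}

def tool(name: str, description: str, params: dict):
    """Register a function together with a machine-readable description."""
    def wrap(fn):
        REGISTRY[name] = {"description": description, "params": params, "fn": fn}
        return fn
    return wrap

@tool("get_weather", "Current weather for a city", {"city": "string"})
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real server would call a weather API

def list_tools() -> str:
    """What an agent sees when it discovers the available tools."""
    return json.dumps({
        name: {"description": t["description"], "params": t["params"]}
        for name, t in REGISTRY.items()
    })

def call(name: str, **kwargs) -> str:
    """One uniform entry point for every registered tool."""
    return REGISTRY[name]["fn"](**kwargs)

print(list_tools())
print(call("get_weather", city="Paris"))
```

Because discovery and invocation are uniform, adding a new tool means registering it once, not writing new glue code into every agent that might use it; that is the scaling property the bespoke integrations lacked.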
Case study: The medical AI assistant
Dr. Lee uses an AI assistant to help with patient diagnosis.
Initial version: The assistant is powered only by a standalone LLM. Dr. Lee asks about patient symptoms and receives advice based on general medical knowledge. But when Dr. Lee mentions recent lab results, the assistant cannot access or recall them. It can only give general information, sometimes making plausible but inaccurate guesses.
Agentic upgrade: The AI is enhanced to access patient records and current lab databases. When Dr. Lee asks for a diagnosis, the agent retrieves the patient’s latest test results, considers recent medical history, and cross-references current treatment guidelines. It provides a diagnosis grounded in up-to-date, patient-specific data and can even schedule follow-up appointments by interfacing with the clinic’s calendar system.
Questions
What specific problems did the agentic upgrade solve that the standalone LLM could not?
What risks remain if the agentic system’s integrations are brittle or insecure?