Designing Our Research Assistant
Explore the design phase of building an intelligent Research Assistant that automates a manual research workflow. Understand how to architect a monolithic AI agent using Google ADK. Learn to plan the agent's instruction prompt and equip it with tools like Wikipedia, arXiv, and Google Search to autonomously gather and synthesize information into a single report.
In our journey so far, we have successfully established a working development environment and have seen how to create and run a basic AI agent. We are now ready to move from simple demonstrations to designing and building a practical, multi-step agent that can solve a tangible, real-world problem.
This process mirrors professional software engineering best practices. Before writing a single line of implementation code, it is crucial to first establish a clear plan. This involves understanding the problem we aim to solve, defining what a successful solution looks like, and creating a robust architectural blueprint for the system we intend to build. This lesson is dedicated to that design and planning phase. We will architect our “Research Assistant” from the ground up, ensuring we have a complete and coherent plan before implementation begins.
The problem statement: The manual research workflow
To appreciate the value of the agent we are about to build, let’s first consider a common and relatable scenario. Imagine a student, analyst, or researcher tasked with getting up to speed on a complex and rapidly evolving topic, such as “CRISPR gene editing.” The goal is to produce a concise summary that includes a general overview, recent academic findings, and supplementary web resources.
The manual workflow to accomplish this is often fragmented, inefficient, and requires a great deal of context switching. It typically looks something like this:
Initial overview: The researcher opens a web browser and navigates to Wikipedia. They search for “CRISPR” to gain a high-level understanding of the topic: what it is, its history, and its key applications. They read through the article, identifying the most important concepts.
Academic literature review: Next, they open a new browser tab and navigate to an academic repository like arXiv.org or Google Scholar. They run several searches for recent papers related to CRISPR, looking for breakthroughs, new techniques, or significant findings from the last year. They may skim the abstracts of several papers to find the most relevant information.
Supplementary research: Finally, to round out their understanding, they open yet another browser tab and use a standard search engine like Google. They search for terms like “latest CRISPR news” or “ethical implications of gene editing” to find supplementary articles, blog posts, and news reports that provide a broader context.
Information consolidation: As they gather information from these sources, the researcher must continuously switch back and forth between browser tabs and a local text editor. They copy and paste key sentences, summaries, and links, trying to manually synthesize the information into a coherent set of notes.
This entire process is fundamentally manual. It relies on the researcher to act as the orchestrator, juggling multiple sources, filtering information, and piecing it all together. The core problem is the inefficiency and cognitive load of this multi-step, copy-and-paste workflow.
The solution: An automated research assistant
Our solution is to build an intelligent, automated “Research Assistant” that will replace this entire manual workflow with a single, seamless interaction. Instead of performing a dozen manual steps, the user will simply provide a single directive to our agent, such as: "Research the topic of CRISPR gene editing."
From that point on, the agent will take over, orchestrating a complete research process autonomously. The agent’s automated plan will be designed to intelligently mimic the steps of a human researcher, but with the speed and efficiency of a machine. It will be responsible for:
Automatically querying Wikipedia to retrieve a high-level summary.
Automatically searching the arXiv database to find relevant and recent academic papers.
Automatically performing a general web search to gather supplementary, real-time context on the topic.
Automatically synthesizing all the collected information into a single, structured report file.
By automating this sequence, our agent will solve the core problem of inefficiency and context switching, delivering a comprehensive research output in a fraction of the time it would take a human to do so manually.
Architectural design: The monolithic agent
To implement this solution, we need a clear architectural blueprint. For this first iteration of our project, we will employ a monolithic agent design.
A monolithic agent is an architectural pattern where a single, central LlmAgent contains all the logic and is responsible for orchestrating all the necessary tools to complete a task.
This is a powerful and effective starting point, as it consolidates the entire workflow’s logic in one place, making it easier to reason about and implement.
Our monolithic agent’s architecture will consist of two primary components: its “brain,” which is the instruction prompt, and its capabilities, which are the tools it has at its disposal.
The agent’s brain
The core of our agent’s intelligence and planning ability will come from its instruction parameter. For a complex, multi-step task like this, the instruction prompt is not just a simple directive; it is the agent’s master plan. We will craft a detailed, comprehensive prompt that explicitly tells the LLM the exact sequence of steps it must follow to complete the research workflow successfully.
This prompt will instruct the agent to:
1. Begin by researching the topic. It should use the `wikipedia_tool` for general information, the `arxiv_tool` for academic papers, and the `GoogleSearchTool` for supplementary web-based context.
2. Synthesize the findings. After gathering information from all three sources, it must synthesize the content into a single, coherent report.
3. Save the result. Finally, it must use the `report_writer_tool` to save the complete report to a file.
By encoding the entire workflow into the instruction prompt, we are using the LLM’s powerful reasoning and sequencing capabilities to drive the agent’s behavior from start to finish.
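Concretely, the master plan might be encoded as a prompt string along the following lines. This is a sketch only: the exact wording is illustrative, and nothing beyond the tool names comes from the lesson's design.

```python
# Illustrative instruction prompt for the monolithic Research Assistant.
# The wording is a sketch; only the tool names come from this lesson's design.
RESEARCH_INSTRUCTION = """You are a diligent Research Assistant.

When the user gives you a topic, follow these steps in order:
1. Call wikipedia_tool with the topic to obtain a high-level overview.
2. Call arxiv_tool with the topic to find recent academic papers.
3. Use Google Search to gather supplementary, real-time web context.
4. Synthesize all findings into one structured report with sections for
   Overview, Academic Findings, and Web Resources.
5. Call report_writer_tool to save the finished report to a file.

Do not write the report until all three research steps are complete."""
```

Numbering the steps and stating the stopping condition explicitly helps the LLM follow the sequence rather than improvising its own order.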
The agent’s tools
To execute its master plan, the agent needs to be equipped with the right capabilities. We will provide these capabilities as a list of tools.
`wikipedia_tool`: A custom Python function we will write that takes a search query as input. It will use a third-party Python library to interact with the Wikipedia API and return a concise summary of the relevant article.

`arxiv_tool`: A custom Python function that takes a search query as input. It will use a Python wrapper for the public arXiv API to find the most recent and relevant academic papers, returning their titles and summaries.

`GoogleSearchTool`: The built-in tool we will use to find supplementary, real-time information from across the web. This powerful, prebuilt tool is provided by the ADK framework and enables the agent to perform a Google search and receive a summary of the results.

`report_writer_tool`: A simple utility function that takes a string of text (the agent’s research notes) and a file name as input. Its job is to write this content to a local file, simulating the creation of the final report.
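As a preview, the custom tools might be sketched as the plain Python functions below. The `wikipedia` and `arxiv` calls assume the third-party libraries installed later in this lesson, and specific parameters (such as the number of summary sentences or results) are illustrative choices, not requirements.

```python
# Hypothetical sketches of the three custom tools. The third-party imports
# are deferred inside the functions so this file loads even before the
# `wikipedia` and `arxiv` packages are installed.

def wikipedia_tool(query: str) -> str:
    """Return a short Wikipedia summary for the given query."""
    import wikipedia  # third-party wrapper around the Wikipedia API
    return wikipedia.summary(query, sentences=5)

def arxiv_tool(query: str) -> str:
    """Return titles and abstracts of recent arXiv papers matching the query."""
    import arxiv  # third-party wrapper around the public arXiv API
    search = arxiv.Search(
        query=query,
        max_results=5,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    results = arxiv.Client().results(search)
    return "\n\n".join(f"{r.title}\n{r.summary}" for r in results)

def report_writer_tool(content: str, filename: str) -> str:
    """Write the synthesized report to a local file and confirm the path."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)
    return f"Report saved to {filename}"
```

Each function returns a plain string, which is what the agent's LLM will see as the tool's result when deciding its next step.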
Project setup and dependencies
The final part of our design phase involves preparing our project environment with all the necessary libraries and configurations that our tools will depend on.
Installing required libraries
Our two custom research tools rely on external Python libraries that act as convenient wrappers for the Wikipedia and arXiv APIs. We will need to install them using pip.
wikipedia: This library provides a clean and simple interface for accessing and parsing data from Wikipedia.

arxiv: This library is a Python wrapper that simplifies the process of searching for and retrieving paper metadata from the public arXiv API.
```shell
pip install wikipedia
pip install arxiv
```
Using the built-in Google search tool
To find supplementary sources, we will use the powerful, prebuilt GoogleSearchTool provided by the ADK framework. Unlike custom tools that require us to build our own API integrations, the GoogleSearchTool is designed to work out of the box. It leverages the underlying Gemini model’s native ability to search the web (a feature often called “grounding”). This simplification allows us to add a powerful, real-time web search capability to our agent simply by importing the tool and adding it to our agent’s tool list.
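Putting the pieces together, the agent definition itself might look like the sketch below. The import paths follow the public ADK Python API, but the model name, the inline instruction text, and the `research_tools` module holding our custom functions are all assumptions for illustration. Note also that some ADK versions restrict combining built-in tools such as `google_search` with custom function tools in a single agent, so check the framework documentation for your version.

```python
# Hypothetical wiring of the monolithic Research Assistant with Google ADK.
from google.adk.agents import LlmAgent
from google.adk.tools import google_search  # ADK's built-in Google Search tool

# Hypothetical module holding the custom tools described in this lesson.
from research_tools import wikipedia_tool, arxiv_tool, report_writer_tool

root_agent = LlmAgent(
    name="research_assistant",
    model="gemini-2.0-flash",  # assumed model choice
    instruction=(
        "Research the user's topic with wikipedia_tool, arxiv_tool, and "
        "Google Search, synthesize the findings into one report, then "
        "save it with report_writer_tool."
    ),
    tools=[wikipedia_tool, arxiv_tool, google_search, report_writer_tool],
)
```

The single `LlmAgent` holds both the master-plan instruction and the full tool list, which is exactly what the monolithic design calls for.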
With all this setup in place, the blueprint for our Research Assistant is now complete. We have moved from a vague idea to a concrete and actionable plan by clearly defining the problem, designing a robust solution, and outlining the necessary technical architecture and dependencies. This methodical process of design and preparation is the foundational step in engineering any sophisticated system, whether AI or otherwise. With this solid plan in place, we have prepared for a successful implementation, ready to translate our architectural design into functional code.