Multi-Query Techniques for Complex Information Retrieval
Understand how to use multi-query techniques in retrieval-augmented generation to improve the accuracy of complex information retrieval. Learn to generate multiple query variations, perform parallel searches, aggregate results, and implement these steps using LangChain and OpenAI API integrations.
Multi-query is a technique used in advanced retrieval-augmented generation (RAG) systems that improves the retrieval of relevant documents for complex user questions.
Imagine a user asks the following question: “What is LangSmith, and why do we need it?” A simple retrieval system might only find documents containing the exact phrase “LangSmith.” But what if the document discusses LangSmith using synonyms or related concepts? Here, multi-query helps by generating multiple variations of the original question, capturing different aspects of the user’s intent. This broadens the search and retrieves documents that might not contain the exact keywords but still hold valuable information.
What is multi-query?
Multi-query utilizes an LLM to automatically generate reformulations of the user’s original question. It aims to create multiple versions that capture different perspectives on the user’s intent, increasing the chances of finding relevant documents even when the wording differs slightly. Here’s how it works:
Single user input: It all starts with a single question the user poses.
Query diversification: The core concept of multi-query is to expand the search beyond the original query. This is achieved by using LLMs or other techniques to rephrase the question into various forms. Imagine asking the question differently to capture the full scope of what you’re looking for.
Multiple query generation: The LLM generates several reformulated versions of the original query, each capturing a different aspect or perspective of the user’s intent.
Parallel search execution: These multiple reformulated queries are then used to perform parallel searches across the document collection.
Document retrieval: Each reformulated query retrieves a set of documents that are relevant to that specific phrasing of the question.
Results aggregation: The retrieved documents from all the different queries are aggregated. This aggregation ensures a broader and more comprehensive set of documents that might contain the relevant information.
Enhanced relevance assessment: The aggregated documents are then evaluated for relevance, ensuring that the most pertinent information is identified from the diverse set of retrieved documents.
Foundation for further steps: The retrieved documents based on each query variation become the building blocks for subsequent steps in the RAG process.
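The flow above can be sketched end to end in a few lines of plain Python, with stub functions standing in for the LLM (query generation) and the vector search. The corpus and the keyword matching below are toy stand-ins, not the lesson's actual retriever:

```python
def generate_queries(question: str) -> list[str]:
    """Stub for the LLM step: return reformulations of the question."""
    return [
        question,
        "What does LangSmith do?",
        "Why is LangSmith useful for LLM development?",
    ]

def search(query: str) -> list[str]:
    """Stub retriever: naive keyword match against a tiny corpus."""
    corpus = [
        "LangSmith traces and evaluates LLM applications.",
        "LangSmith is useful for debugging chains.",
        "Chunk overlap preserves context between splits.",
    ]
    terms = [w.strip("?,.").lower() for w in query.split()]
    return [doc for doc in corpus
            if any(t in doc.lower() for t in terms if len(t) > 3)]

def multi_query(question: str) -> list[str]:
    queries = generate_queries(question)              # query diversification
    results = [search(q) for q in queries]            # searches (parallel in practice)
    aggregated = [doc for r in results for doc in r]  # results aggregation
    return list(dict.fromkeys(aggregated))            # deduplicate, preserving order

docs = multi_query("What is LangSmith, and why do we need it?")
print(len(docs))  # 2 unique documents survive aggregation
```

Even in this toy version, the value of deduplication is visible: three queries retrieve five documents in total, but only two are distinct.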
Step-by-step implementation
The following are the steps to implement multi-query:
1. Import necessary modules
We’ll import the required modules from the installed libraries to implement multi-query:
These libraries and modules are essential for the subsequent steps in the process.
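The lesson's import cell is not reproduced here; a representative set of imports for the steps that follow, assuming the split LangChain packages (exact module paths vary by LangChain version), might look like:

```python
import os
from operator import itemgetter

# Document loading and splitting
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Vector store and embeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Prompting, output parsing, and document (de)serialization
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import dumps, loads
```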
2. Set up the LangSmith and OpenAI API keys
The following code snippet sets up your LangChain API key and OpenAI API key from environment variables. We’ll need valid API keys to interact with the LangChain and OpenAI language models:
Code explanation
Lines 1–4: Sets up the LangChain environment variables:

LANGCHAIN_TRACING_V2: Enables tracing for LangChain operations.
LANGCHAIN_ENDPOINT: Specifies the endpoint for the LangChain API.
LANGCHAIN_API_KEY: An empty string placeholder for the LangSmith API key. Replace it with your actual key.
LANGCHAIN_PROJECT: Sets the project name for LangChain operations to 'Multi-Query'.

Lines 6–8: Sets up the OpenAI API key:

OPENAI_API_KEY: An empty string placeholder for the OpenAI API key. Replace it with your actual key.
Validation: Checks if OPENAI_API_KEY is empty and raises a ValueError if it is, ensuring a valid API key is provided for authenticating OpenAI API requests.
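Reconstructing that setup as a sketch (the key values are placeholders you must fill in yourself):

```python
import os

# LangSmith / LangChain tracing configuration
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = ""   # placeholder: your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "Multi-Query"

# OpenAI configuration
os.environ["OPENAI_API_KEY"] = ""      # placeholder: your OpenAI API key
if not os.environ["OPENAI_API_KEY"]:
    raise ValueError("Please set a valid OPENAI_API_KEY.")
```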
3. Prepare data and split text
Now, let’s load the text documents you want to use for retrieval and split them:
Code explanation
Lines 1–3: Loaders are defined to read text files using TextLoader, specifying the file paths of the documents to be loaded.

Lines 5–7: An empty list docs is created, and a loop iterates over the loaders, loading the content of each document and extending the docs list with the loaded content.

Lines 9–10: A RecursiveCharacterTextSplitter is initialized with a chunk size of 400 characters and an overlap of 60 characters between chunks. The splitter then processes the docs list, splitting each document into smaller chunks suitable for processing by LLMs.
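Loading the files requires LangChain and the documents themselves, but the splitting behavior is easy to see in plain Python. This simplified sketch slides a fixed 400-character window with a 60-character overlap; the real RecursiveCharacterTextSplitter is smarter, preferring to break on paragraph, sentence, and word boundaries before falling back to a hard character cut:

```python
def split_text(text: str, chunk_size: int = 400, chunk_overlap: int = 60) -> list[str]:
    """Naive fixed-window approximation of character chunking with overlap."""
    step = chunk_size - chunk_overlap  # advance 340 characters per chunk
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

chunks = split_text("x" * 1000)
print([len(c) for c in chunks])  # [400, 400, 320]
```

Each chunk shares its first 60 characters with the tail of the previous chunk, so a sentence that straddles a boundary still appears intact in at least one chunk.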
4. Index documents
After splitting the text, we create a vector store to efficiently store and retrieve document chunks. Additionally, we generate embeddings for each chunk to capture its semantic meaning:
Code explanation
Line 2: We use Chroma to create the vector store (vectorstore) with our prepared text chunks (splits) and generate embeddings using OpenAIEmbeddings to capture semantic relationships between words in the text snippets.

Line 4: Finally, we convert the vector store into a retriever using as_retriever(), enabling the retrieval of documents based on a query embedding.
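The real code needs a Chroma instance and OpenAI credentials, but the mechanics the explanation describes can be illustrated with a toy example: each chunk is stored alongside an embedding vector, and retrieval returns the chunk whose vector is most similar to the query's. The 2-D vectors below are made up purely for illustration; OpenAIEmbeddings produces high-dimensional learned vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy "vector store": chunk text -> made-up 2-D embedding.
store = {
    "LangSmith traces LLM calls.": (0.9, 0.1),
    "Chunk overlap preserves context.": (0.2, 0.8),
}

query_vec = (0.85, 0.2)  # pretend embedding of "What is LangSmith?"
best = max(store, key=lambda chunk: cosine(query_vec, store[chunk]))
print(best)  # "LangSmith traces LLM calls."
```

Because queries and chunks live in the same embedding space, a chunk can match a query without sharing any exact keywords, which is precisely what multi-query exploits.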
5. Generate multi-perspective queries with an LLM
Now that we have prepared and indexed our data, we can focus on the core functionality of multi-query. Here, we will use an LLM to generate multiple variations of the user’s original question:
Code explanation
Lines 1–5: We define a prompt template instructing the AI to generate three variations of the user’s question, capturing different aspects to help the search engine retrieve relevant documents. The AI should provide these alternative questions on separate lines.
Line 7: We create the prompt_perspectives object using ChatPromptTemplate.from_template(template). This sets up the template the model follows when generating the query variations.

Lines 9–14: We define the generate_queries chain, which processes the prompt and generates the reformulated queries:

prompt_perspectives: The prompt object from the previous line, carrying the user’s question.
| ChatOpenAI(temperature=0): Passes the prompt to the ChatOpenAI model, which calls the OpenAI API to generate the reformulations; temperature=0 minimizes randomness, so the output is close to deterministic.
| StrOutputParser(): Converts the model’s output into a plain text string.
| (lambda x: x.split("\n")): Finally, a lambda function splits this string by newlines (\n), resulting in a list of the generated query variations, one per line.
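Because this step boils down to a prompt plus post-processing of the model's text, both halves can be shown without calling the API. The template wording below is representative rather than the lesson's exact text, and llm_output stands in for a real model reply:

```python
# Representative prompt template (the lesson's exact wording may differ).
template = (
    "You are an AI language model assistant. Generate three different "
    "versions of the given user question to retrieve relevant documents "
    "from a vector database. Provide these alternative questions "
    "separated by newlines.\n"
    "Original question: {question}"
)

prompt = template.format(question="What is LangSmith, and why do we need it?")

# Hypothetical raw reply from ChatOpenAI(temperature=0):
llm_output = (
    "What does LangSmith offer?\n"
    "Why would a developer use LangSmith?\n"
    "What problems does LangSmith solve?"
)

# The chain's final step, equivalent to `lambda x: x.split("\n")`:
queries = llm_output.split("\n")
print(queries)  # three reformulated queries, one per line
```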
6. Retrieve documents using multi-query
We will now utilize the generated multi-perspective queries to retrieve relevant documents from the indexed corpus:
Code explanation
Lines 1–19: The get_unique_union function ensures the retrieval of unique documents from the search results:

Flattening the structure: The function starts by flattening the list of lists of retrieved documents into a single list where each document is an individual element.
Uniqueness check: It checks whether the documents have a unique identifier attribute, such as 'id'. If such an attribute exists, it extracts those IDs and uses a set to eliminate duplicates.
String conversion (alternative): If no unique identifier attribute is found, the function converts each document object into a string representation, which allows set-based operations to identify and remove duplicates.
Return unique documents: Finally, the function returns a list containing only the unique documents.

Line 17: We define a sample user question, "What is LangSmith, and why do we need it?".

Line 18: The retrieval_chain combines multi-query generation (generate_queries) with document retrieval (retriever.map). The output of the retrieval chain is passed through get_unique_union so that only unique documents are considered in the final results.

Lines 19–20: The chain is invoked with the user question as input, triggering the retrieval process and storing the retrieved documents in the docs variable. The length of docs indicates the number of unique documents retrieved using the multi-perspective queries.
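The described deduplication can be sketched in plain Python. Document here is a minimal stand-in for LangChain's document class, and only the string-conversion branch is shown; real implementations often round-trip documents through langchain.load.dumps and loads to the same effect:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    """Minimal stand-in for LangChain's document class."""
    page_content: str

def get_unique_union(documents: list[list[Document]]) -> list[Document]:
    # Flatten the per-query result lists into one list of documents.
    flattened = [doc for sublist in documents for doc in sublist]
    # Deduplicate using each document's string representation as the key.
    seen, unique = set(), []
    for doc in flattened:
        key = str(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

# Three hypothetical queries returned overlapping results:
a = Document("LangSmith traces runs.")
b = Document("LangSmith evaluates chains.")
c = Document("RAG overview.")
results = [[a, b], [b, c], [a]]
print(len(get_unique_union(results)))  # 3 unique documents out of 5 retrieved
```

Preserving first-seen order (rather than dumping everything into a set) keeps the final context stable across runs, which matters when the context is later truncated to fit a prompt.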
7. Run the RAG model
The final chain is invoked with the user question, "What is LangSmith, and why do we need it?". This retrieves relevant documents and uses them to generate an answer through the RAG model:
Code explanation
Lines 1–7: The template defines how the model should structure its response:

Context and question: It asks the model to answer the question based on the provided context.
Template initialization: The ChatPromptTemplate.from_template method initializes the prompt using this template.

Line 9: The ChatOpenAI model is initialized with a temperature setting of 0, indicating deterministic responses.

Lines 11–17: The final_rag_chain orchestrates the retrieval and generation process:

Context and question extraction: It extracts the context using the retrieval_chain and the question using itemgetter.
Prompt application: The extracted context and question are formatted into the prompt.
LLM interaction: The formatted prompt is passed to the ChatOpenAI model to generate a response.
Parsing response: The response from the LLM is parsed into a string using StrOutputParser.

Line 19: The final_rag_chain is invoked with the user question, "What is LangSmith, and why do we need it?". This triggers the retrieval of relevant documents and uses them to generate an answer through the RAG model.
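As a sketch of what the chain assembles before the model call, plain string formatting shows the same wiring. In the lesson's code this is done with LCEL's | operator, itemgetter, ChatOpenAI, and StrOutputParser, and the context comes from retrieval_chain rather than the hard-coded list below:

```python
# Answer-generation prompt in the shape the explanation describes
# (wording is representative, not the lesson's exact template).
template = (
    "Answer the following question based on this context:\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)

# Stand-in for the unique documents that retrieval_chain would return:
context = "\n\n".join([
    "LangSmith is a platform for tracing and evaluating LLM applications.",
    "It lets developers inspect the inputs and outputs of each chain step.",
])

question = "What is LangSmith, and why do we need it?"
final_prompt = template.format(context=context, question=question)
print(final_prompt)
# This formatted prompt is what ChatOpenAI(temperature=0) receives;
# StrOutputParser then returns the model's reply as a plain string.
```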
LangSmith
LangSmith is a powerful tool for tracing and debugging LLM applications. We’ll use it to visualize and understand the inner workings of our queries.
We’ll understand how the language model processes and responds to our prompts by examining the sub-queries, inputs, and outputs:
Try it yourself
You can practice executing this code yourself in the Jupyter Notebook below: