Multi-Query Techniques for Complex Information Retrieval
Understand how to use multi-query techniques in retrieval-augmented generation to improve the accuracy of complex information retrieval. Learn to generate multiple query variations, perform parallel searches, aggregate results, and implement these steps using LangChain and OpenAI API integrations.
Multi-query is a technique used in advanced retrieval-augmented generation (RAG) systems that improves the retrieval of relevant documents for complex user questions.
Imagine a user asks the following question: “What is LangSmith, and why do we need it?” A simple retrieval system might only find documents containing the exact phrase “LangSmith.” But what if the document discusses LangSmith using synonyms or related concepts? Here, multi-query helps by generating multiple variations of the original question, capturing different aspects of the user’s intent. This broadens the search and retrieves documents that might not contain the exact keywords but still hold valuable information.
What is multi-query?
Multi-query utilizes an LLM to automatically generate reformulations of the user’s original question. It aims to create multiple versions that capture different perspectives on the user’s intent, increasing the chances of finding relevant documents even when the wording differs slightly. Here’s how it works:
Single user input: It all starts with a single question the user poses.
Query diversification: The core concept of multi-query is to expand the search beyond the original query. This is achieved by using LLMs or other techniques to rephrase the question into various forms. Imagine asking the question differently to capture the full scope of what you’re looking for.
Multiple query generation: The LLM generates several reformulated versions of the original query, each capturing a different aspect or perspective of the user’s intent.
Parallel search execution: These multiple reformulated queries are then used to perform parallel searches across the document collection.
Document retrieval: Each reformulated query retrieves a set of documents that are relevant to that specific phrasing of the question.
Results aggregation: The retrieved documents from all the different queries are aggregated. This aggregation ensures a broader and more comprehensive set of documents that might contain the relevant information.
Enhanced relevance assessment: The aggregated documents are then evaluated for relevance, ensuring that the most pertinent information is identified from the diverse set of retrieved documents.
Foundation for further steps: The retrieved documents based on each query variation become the building blocks for subsequent steps in the RAG process.
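The flow above can be sketched end to end in a few lines of plain Python, with stub functions standing in for the LLM (query generation) and the vector search. The corpus and the keyword matching below are toy stand-ins, not the lesson's actual retriever:

```python
def generate_queries(question: str) -> list[str]:
    """Stub for the LLM step: return reformulations of the question."""
    return [
        question,
        "What does LangSmith do?",
        "Why is LangSmith useful for LLM development?",
    ]

def search(query: str) -> list[str]:
    """Stub retriever: naive keyword match against a tiny corpus."""
    corpus = [
        "LangSmith traces and evaluates LLM applications.",
        "LangSmith is useful for debugging chains.",
        "Chunk overlap preserves context between splits.",
    ]
    terms = [w.strip("?,.").lower() for w in query.split()]
    return [doc for doc in corpus
            if any(t in doc.lower() for t in terms if len(t) > 3)]

def multi_query(question: str) -> list[str]:
    queries = generate_queries(question)              # query diversification
    results = [search(q) for q in queries]            # searches (parallel in practice)
    aggregated = [doc for r in results for doc in r]  # results aggregation
    return list(dict.fromkeys(aggregated))            # deduplicate, preserving order

docs = multi_query("What is LangSmith, and why do we need it?")
print(len(docs))  # 2 unique documents survive aggregation
```

Even in this toy version, the value of deduplication is visible: three queries retrieve five documents in total, but only two are distinct.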
Step-by-step implementation
The following are the steps to implement multi-query:
1. Import necessary modules
We’ll import the required modules from the installed libraries to implement multi-query:
These libraries and modules are essential for the subsequent steps in the process.
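The lesson's import cell is not reproduced here; a representative set of imports for the steps that follow, assuming the split LangChain packages (exact module paths vary by LangChain version), might look like:

```python
import os
from operator import itemgetter

# Document loading and splitting
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Vector store and embeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Prompting, output parsing, and document (de)serialization
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import dumps, loads
```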
2. Set up the LangSmith and OpenAI API keys
The following code snippet sets up your LangChain API key and OpenAI API key from environment variables. We’ll need valid API keys to interact with the LangChain and OpenAI language models:
Code explanation
Lines 1–4: Sets up the LangChain environment variables:

LANGCHAIN_TRACING_V2: Enables tracing for LangChain operations.
LANGCHAIN_ENDPOINT: Specifies the endpoint for the LangChain API.
LANGCHAIN_API_KEY: An empty string placeholder for the LangSmith API key. Replace it with your actual key.
LANGCHAIN_PROJECT: Sets the project name for LangChain operations to 'Multi-Query'.

Lines 6–8: Sets up the OpenAI API key:

OPENAI_API_KEY: An empty string placeholder for the OpenAI API key. Replace it with your actual key.
Validation: Checks if OPENAI_API_KEY is empty and raises a ValueError if it is, ensuring a valid API key is provided for authenticating OpenAI API requests.
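Reconstructing that setup as a sketch (the key values are placeholders you must fill in yourself):

```python
import os

# LangSmith / LangChain tracing configuration
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = ""   # placeholder: your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "Multi-Query"

# OpenAI configuration
os.environ["OPENAI_API_KEY"] = ""      # placeholder: your OpenAI API key
if not os.environ["OPENAI_API_KEY"]:
    raise ValueError("Please set a valid OPENAI_API_KEY.")
```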
3. Prepare data and split text
Now, let’s load the text documents you want to use for retrieval and split them:
Code explanation
Lines 1–3: Loaders are defined to read text files using TextLoader, specifying the file paths of the documents to be loaded.

Lines 5–7: An empty list docs is created, and a loop iterates over the loaders, loading the content of each document and extending the docs list with the loaded content.

Lines 9–10: A RecursiveCharacterTextSplitter is initialized with a chunk size of 400 characters and an overlap of 60 characters between chunks. The splitter then processes the docs list, splitting each document into smaller chunks suitable for processing by LLMs.
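Loading the files requires LangChain and the documents themselves, but the splitting behavior is easy to see in plain Python. This simplified sketch slides a fixed 400-character window with a 60-character overlap; the real RecursiveCharacterTextSplitter is smarter, preferring to break on paragraph, sentence, and word boundaries before falling back to a hard character cut:

```python
def split_text(text: str, chunk_size: int = 400, chunk_overlap: int = 60) -> list[str]:
    """Naive fixed-window approximation of character chunking with overlap."""
    step = chunk_size - chunk_overlap  # advance 340 characters per chunk
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

chunks = split_text("x" * 1000)
print([len(c) for c in chunks])  # [400, 400, 320]
```

Each chunk shares its first 60 characters with the tail of the previous chunk, so a sentence that straddles a boundary still appears intact in at least one chunk.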
4. Index documents
After splitting the text, we create a vector store to efficiently store and retrieve document chunks. Additionally, we generate embeddings for each chunk to capture its semantic meaning:
Code explanation
Line 2: We use Chroma to create the vector store (vectorstore) with our prepared text chunks (splits) and generate embeddings using OpenAIEmbeddings to capture semantic relationships between words in the text snippets.

Line 4: Finally, we convert the vector store into a retriever using as_retriever(), enabling the retrieval of documents based on a query embedding.
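The real code needs a Chroma instance and OpenAI credentials, but the mechanics the explanation describes can be illustrated with a toy example: each chunk is stored alongside an embedding vector, and retrieval returns the chunk whose vector is most similar to the query's. The 2-D vectors below are made up purely for illustration; OpenAIEmbeddings produces high-dimensional learned vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy "vector store": chunk text -> made-up 2-D embedding.
store = {
    "LangSmith traces LLM calls.": (0.9, 0.1),
    "Chunk overlap preserves context.": (0.2, 0.8),
}

query_vec = (0.85, 0.2)  # pretend embedding of "What is LangSmith?"
best = max(store, key=lambda chunk: cosine(query_vec, store[chunk]))
print(best)  # "LangSmith traces LLM calls."
```

Because queries and chunks live in the same embedding space, a chunk can match a query without sharing any exact keywords, which is precisely what multi-query exploits.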
5. Generate multi-perspective queries with an LLM
Now that we have prepared and indexed our data, we can focus on the core functionality of multi-query. Here, we will use an LLM to generate multiple variations of the user’s original question:
Code explanation
Lines 1–5: We define a prompt template instructing the AI to generate three variations of the user’s question, capturing different aspects to help the search engine retrieve relevant documents. The AI should provide these alternative questions on separate lines.
Line 7: We create the prompt_perspectives object using ChatPromptTemplate.from_template(template). This sets up the template the model follows when generating the query variations.

Lines 9–14: We define the generate_queries chain, which processes the prompt and generates the reformulated queries:

prompt_perspectives: The prompt object from the previous line, carrying the user’s question.
| ChatOpenAI(temperature=0): Passes the prompt to the ChatOpenAI model, which calls the OpenAI API to generate the reformulations; temperature=0 minimizes randomness, so the output is close to deterministic.
| StrOutputParser(): Converts the model’s output into a plain text string.
| (lambda x: x.split("\n")): Finally, a lambda function splits this string by newlines (\n), resulting in a list of the generated query variations, one per line.
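Because this step boils down to a prompt plus post-processing of the model's text, both halves can be shown without calling the API. The template wording below is representative rather than the lesson's exact text, and llm_output stands in for a real model reply:

```python
# Representative prompt template (the lesson's exact wording may differ).
template = (
    "You are an AI language model assistant. Generate three different "
    "versions of the given user question to retrieve relevant documents "
    "from a vector database. Provide these alternative questions "
    "separated by newlines.\n"
    "Original question: {question}"
)

prompt = template.format(question="What is LangSmith, and why do we need it?")

# Hypothetical raw reply from ChatOpenAI(temperature=0):
llm_output = (
    "What does LangSmith offer?\n"
    "Why would a developer use LangSmith?\n"
    "What problems does LangSmith solve?"
)

# The chain's final step, equivalent to `lambda x: x.split("\n")`:
queries = llm_output.split("\n")
print(queries)  # three reformulated queries, one per line
```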
6. Retrieve documents using multi-query
We will now utilize the generated multi-perspective queries to retrieve relevant documents from the indexed corpus:
Code explanation
Lines 1–19: The get_unique_union function ensures the retrieval of unique documents from the search results:

Flattening the structure: The function starts by flattening the list of lists of retrieved documents into a single list where each document is an individual element.
Uniqueness check: It checks whether the documents have a unique identifier attribute, such as 'id'. If such an attribute exists, it extracts those IDs and uses a set to eliminate duplicates.
String conversion (alternative): If no unique identifier attribute is found, the function converts each document object into a string representation, which allows set-based operations to identify and remove duplicates.
Return unique documents: Finally, the function returns a list containing only the unique documents.

Line 17: We define a sample user question, "What is LangSmith, and why do we need it?".

Line 18: The retrieval_chain combines multi-query generation (generate_queries) with document retrieval (retriever.map). The output of the retrieval chain is passed through get_unique_union so that only unique documents are considered in the final results.

Lines 19–20: The chain is invoked with the user question as input, triggering the retrieval process and storing the retrieved documents in the docs variable. The length of docs indicates the number of unique documents retrieved using the multi-perspective queries.
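The described deduplication can be sketched in plain Python. Document here is a minimal stand-in for LangChain's document class, and only the string-conversion branch is shown; real implementations often round-trip documents through langchain.load.dumps and loads to the same effect:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    """Minimal stand-in for LangChain's document class."""
    page_content: str

def get_unique_union(documents: list[list[Document]]) -> list[Document]:
    # Flatten the per-query result lists into one list of documents.
    flattened = [doc for sublist in documents for doc in sublist]
    # Deduplicate using each document's string representation as the key.
    seen, unique = set(), []
    for doc in flattened:
        key = str(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

# Three hypothetical queries returned overlapping results:
a = Document("LangSmith traces runs.")
b = Document("LangSmith evaluates chains.")
c = Document("RAG overview.")
results = [[a, b], [b, c], [a]]
print(len(get_unique_union(results)))  # 3 unique documents out of 5 retrieved
```

Preserving first-seen order (rather than dumping everything into a set) keeps the final context stable across runs, which matters when the context is later truncated to fit a prompt.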
7. Run the RAG model
The final chain is invoked with the user question, "What is LangSmith, and why do we need it?". This retrieves relevant documents and uses them to generate an answer through the RAG model:
Code explanation
Lines 1–7: The template defines how the model should structure its response:

Context and question: It asks the model to answer the question based on the provided context.
Template initialization: The ChatPromptTemplate.from_template method initializes the prompt using this template.

Line 9: The ChatOpenAI model is initialized with a temperature setting of 0, indicating deterministic responses.

Lines 11–17: The final_rag_chain orchestrates the retrieval and generation process:

Context and question extraction: It extracts the context using the retrieval_chain and the question using itemgetter.
Prompt application: The extracted context and question are formatted into the prompt.
LLM interaction: The formatted prompt is passed to the ChatOpenAI model to generate a response.
Parsing response: The response from the LLM is parsed into a string using StrOutputParser.

Line 19: The final_rag_chain is invoked with the user question, "What is LangSmith, and why do we need it?". This triggers the retrieval of relevant documents and uses them to generate an answer through the RAG model.
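As a sketch of what the chain assembles before the model call, plain string formatting shows the same wiring. In the lesson's code this is done with LCEL's | operator, itemgetter, ChatOpenAI, and StrOutputParser, and the context comes from retrieval_chain rather than the hard-coded list below:

```python
# Answer-generation prompt in the shape the explanation describes
# (wording is representative, not the lesson's exact template).
template = (
    "Answer the following question based on this context:\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)

# Stand-in for the unique documents that retrieval_chain would return:
context = "\n\n".join([
    "LangSmith is a platform for tracing and evaluating LLM applications.",
    "It lets developers inspect the inputs and outputs of each chain step.",
])

question = "What is LangSmith, and why do we need it?"
final_prompt = template.format(context=context, question=question)
print(final_prompt)
# This formatted prompt is what ChatOpenAI(temperature=0) receives;
# StrOutputParser then returns the model's reply as a plain string.
```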
LangSmith
LangSmith is a powerful tool for tracing and debugging LLM applications. We’ll use it to visualize and understand the inner workings of our queries.
We’ll understand how the language model processes and responds to our prompts by examining the sub-queries, inputs, and outputs:
Try it yourself
You can practice executing this code yourself in the Jupyter Notebook below: