
Hypothetical Document Embeddings (HyDE): Simulating Context

Explore how hypothetical document embeddings (HyDE) enhance pre-retrieval optimization in RAG systems by simulating relevant context. Learn to generate embeddings, query vector stores, and implement HyDE using LangChain with practical code examples.

Why hypothetical document embeddings (HyDE)?

Traditional document retrieval in RAG models relies on matching queries with existing documents in a collection. This approach faces limitations:

  • Limited generalizability: Existing retrieval methods often struggle with unseen domains or queries with subtle variations.

  • Factual accuracy: Retrieving documents based solely on keyword matching might lead to irrelevant or inaccurate information, especially for complex queries.

HyDE tackles these challenges by introducing the concept of hypothetical documents.

Educative Byte: Assume you are a student preparing for a history test with lots of books to read. HyDE, like a smart study buddy, jumps in to lend a hand. It takes all that information and makes super helpful study notes just for you. These notes aren’t copies of the books, but they’re the most important bits you need to remember. For instance, if you’re studying World War II, HyDE might summarize the big reasons for the war, the major battles, and how it ended. HyDE’s summaries make studying much easier, helping you understand the main ideas faster.

What is HyDE?

HyDE, as described in the paper “Precise Zero-Shot Dense Retrieval without Relevance Labels” by Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan (arXiv preprint arXiv:2212.10496, 2022), leverages LLMs to generate hypothetical document embeddings that represent ideal documents for answering a given query. These embeddings, even though they do not correspond to actual documents, capture the essence of the information needed. This allows the retrieval process to focus on documents containing relevant content, leading to more accurate and informative responses.

An illustration of the HyDE model (source: Luyu Gao et al., “Precise Zero-Shot Dense Retrieval without Relevance Labels”)

How HyDE works

Here’s a breakdown of the HyDE workflow:

  • Query processing: The user submits a query.

  • Hypothetical document generation: HyDE utilizes an LLM to create one or more “hypothetical documents” that address the query. These documents might not be factual or complete, but they capture the information a relevant document would contain. This generation process often involves prompting the LLM with instructions like “Write a short summary of a web page that answers the question...”.

  • Embedding creation: Each generated hypothetical document is then converted into a numerical representation called an embedding. This embedding captures the semantic meaning of the document.

  • Document retrieval: The system searches for existing documents in the collection whose embeddings are most similar to the hypothetical document embeddings. This process leverages vector similarity techniques.

  • Response generation: The retrieved documents are fed into the RAG model’s generation stage, where they are used to create a response to the user’s query.

The high-level workflow of HyDE
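
Putting these steps together, the sketch below shows the whole loop in miniature using LangChain. It is illustrative only: the file name docs.txt, the query, and the prompt wording are assumptions for this sketch, not part of the original paper.

Python 3.10.4
# Minimal HyDE loop (illustrative; assumes OPENAI_API_KEY is set and that
# "docs.txt" is a plain-text file you want to search).
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma

docs = TextLoader("docs.txt").load()
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

query = "What is LangSmith?"

# 1. Generate a hypothetical document that answers the query.
llm = ChatOpenAI(temperature=0)
hypothetical_doc = llm.invoke(
    f"Write a short passage that answers the question: {query}"
).content

# 2. Embed the hypothetical document instead of the raw query.
hyde_vector = OpenAIEmbeddings().embed_query(hypothetical_doc)

# 3. Retrieve real documents whose embeddings are closest to it.
retrieved = vectorstore.similarity_search_by_vector(hyde_vector)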

Step-by-step implementation

Now, let’s dive into the provided code and understand how it implements HyDE:

Steps for implementing HyDE

1. Import necessary modules

We’ll import the required modules from the installed libraries to implement HyDE:

Python 3.10.4
import os
from langchain_openai import OpenAI, ChatOpenAI, OpenAIEmbeddings
from langchain.chains import HypotheticalDocumentEmbedder
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

These libraries and modules are essential for the subsequent steps in the process.

2. Set up the OpenAI API key

Set the OPENAI_API_KEY environment variable with your key:

Python 3.10.4
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"] = ""  # Add your OpenAI API key
if OPENAI_API_KEY == "":
    raise ValueError("Please set the OPENAI_API_KEY environment variable")

Code explanation

  • Line 1: Set the OPENAI_API_KEY variable to an empty string and assign it to the environment variable OPENAI_API_KEY using os.environ. This is where you should add your OpenAI API key.

  • Lines 2–3: If the OPENAI_API_KEY is still an empty string after the assignment, raise a ValueError with the message "Please set the OPENAI_API_KEY environment variable". This ensures that the API key is properly set before continuing with the program execution.
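
If you prefer not to hardcode the key in your source file, a common alternative (a sketch, assuming an interactive session) is to prompt for it at runtime:

Python 3.10.4
import os
from getpass import getpass

# Prompt for the key only if it isn't already present in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")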

3. Load and split documents

Here, we load some example documents and prepare them for processing by the LLM. Since real-world documents might be lengthy, we’ll also perform text splitting to ensure they fit the LLM’s input limitations.

Python 3.10.4
loaders = [
    TextLoader("blog.langchain.dev_announcing-langsmith_.txt"),
    TextLoader("blog.langchain.dev_automating-web-research_.txt"),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=400, chunk_overlap=60)
splits = text_splitter.split_documents(docs)

Code explanation

  • Lines 1-4: Initialize a list called loaders, containing instances of the TextLoader class from LangChain. These loaders are used to load text files containing the documents to be processed.

  • Lines 6-8: Iterate over each loader in the loaders list and load the documents using the load() method of each loader. The documents loaded from each loader are then appended to the docs list.

  • Line 10: Create an instance of the RecursiveCharacterTextSplitter class via from_tiktoken_encoder, specifying a chunk_size of 400 tokens and a chunk_overlap of 60 tokens. This splitter is used to break large documents into smaller, more manageable chunks.

  • Line 11: Call the split_documents() method of the text_splitter object with the docs list as input. This method splits each document in the docs list into smaller chunks of the specified chunk_size. The resulting chunks are assigned to the splits variable.
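
To sanity-check the splitting step, you can inspect how many chunks were produced and peek at one of them (a quick check under the setup above, not part of the pipeline itself):

Python 3.10.4
# How many chunks did the splitter produce, and what does one look like?
print(f"{len(docs)} documents split into {len(splits)} chunks")
print(splits[0].page_content[:200])  # first 200 characters of the first chunk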

4. Create a vector store

A vector store serves as a critical component for retrieval in HyDE. It allows us to store document embeddings and efficiently search for documents similar to a hypothetical document embedding.

Python 3.10.4
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

Code explanation

  • Line 1: Chroma.from_documents embeds each chunk in splits using OpenAIEmbeddings and indexes the resulting vectors in a Chroma vector store, enabling similarity-based retrieval.
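
Beyond direct similarity queries, the vector store can also be wrapped as a standard LangChain retriever, which is how it usually plugs into a larger chain. A minimal sketch (the choice of k=4 is illustrative):

Python 3.10.4
# Expose the vector store through the retriever interface, returning
# the 4 most similar chunks for each query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})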

5. Generate embeddings (single and multiple)

HyDE’s core functionality is generating embeddings representing hypothetical documents relevant to a user query. Here, we’ll explore generating both single and multiple embeddings.

Below is the implementation of single embedding generation.

Python 3.10.4
embeddings = HypotheticalDocumentEmbedder.from_llm(OpenAI(), OpenAIEmbeddings(), "web_search")

query = "What is LangSmith, and why do we need it?"

result = embeddings.embed_query(query)

Code explanation

  • Line 1: Initialize the embedding model and LLM. The HypotheticalDocumentEmbedder class pairs an OpenAI language model (LLM) with OpenAIEmbeddings, using the built-in "web_search" prompt to generate the hypothetical documents it embeds.

  • Line 3: Define a query about LangSmith. This query string will be used to generate an embedding that represents the query in a numerical format.

  • Line 5: Use the embedding model to generate an embedding for the query. The embed_query method processes the query string, converting it into an embedding vector that captures the semantic meaning of the query.
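
The resulting embedding is a plain list of floats; with OpenAI’s default embedding model it typically has 1,536 dimensions. A quick way to verify this:

Python 3.10.4
# The HyDE embedding is an ordinary dense vector.
print(type(result), len(result))  # e.g., <class 'list'> 1536
print(result[:5])                 # first few components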

Below is the implementation of multiple embedding generation.

Python 3.10.4
multi_llm = OpenAI(n=3, best_of=4)

embeddings = HypotheticalDocumentEmbedder.from_llm(multi_llm, OpenAIEmbeddings(), "web_search")

result = embeddings.embed_query("What is LangSmith, and why do we need it?")

Code explanation

  • Line 1: Initialize an OpenAI LLM with specific parameters. The n=3 parameter requests three completions per prompt, and best_of=4 has the API generate four completions server-side and return the three with the highest log probability.

  • Line 3: Initialize the embedding model using the previously created LLM. The HypotheticalDocumentEmbedder class combines the capabilities of multi_llm with OpenAIEmbeddings for creating embeddings, specifically for the "web_search" context.

  • Line 5: Generate an embedding for a specific query. The embed_query method processes the query string "What is LangSmith, and why do we need it?", converting it into an embedding vector that captures the semantic meaning of the query.
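
With multiple completions, the embedder produces one embedding per hypothetical document and then combines them into a single query vector; in LangChain this combination is an element-wise average. The NumPy sketch below illustrates the idea with made-up three-dimensional vectors:

Python 3.10.4
import numpy as np

# Placeholder embeddings for three hypothetical documents (real ones
# would have ~1,536 dimensions).
doc_embeddings = np.array([
    [0.1, 0.3, 0.5],
    [0.2, 0.1, 0.4],
    [0.0, 0.2, 0.6],
])

# HyDE averages them element-wise into a single query vector.
combined = doc_embeddings.mean(axis=0)
print(combined)  # [0.1 0.2 0.5]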

6. Query the vector store for HyDE

Before delving into the HyDE technique, it’s essential to understand how to query the vector store to retrieve relevant information:

Python 3.10.4
query = "What is LangSmith, and why do we need it?"
vectorstore.similarity_search(query)

Code explanation

  • Line 1: Define the search query as a string. This specifies the information we’re looking for in the vector store.

  • Line 2: Call the similarity_search method on the vectorstore object. This method embeds the query and returns the stored chunks whose embeddings are most similar to it.
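
To search with the HyDE embedding from step 5 rather than a plain query embedding, you can query the store by vector directly (a sketch reusing the result variable computed earlier):

Python 3.10.4
# Retrieve the chunks closest to the hypothetical-document embedding
# instead of the raw query embedding.
hyde_docs = vectorstore.similarity_search_by_vector(result)
for doc in hyde_docs:
    print(doc.page_content[:100])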

7. Generate a hypothetical document

In this step, a hypothetical document is generated using a defined prompt template:

Python 3.10.4
system = """
As a knowledgeable and helpful research assistant, your task is to provide informative answers based on the given context.
Use your extensive knowledge base to offer clear, concise, and accurate responses to the user's inquiries.
Question: {question}
Answer:
"""
prompt = ChatPromptTemplate.from_messages(
[
("system", system),
("human", "{question}"),
]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
context = prompt | llm | StrOutputParser()
answer = context.invoke(
{
"What is LangSmith, and why do we need it?"
}
)
print(answer)

Code explanation

  • Lines 1–6: A system message is defined as a prompt template to generate informative responses based on the context. It sets the tone for the AI language model to provide helpful and knowledgeable answers.

  • Lines 8–13: A prompt template is created using ChatPromptTemplate.from_messages. It consists of two messages:

    • System message: Defined above, it provides instructions and context to the AI language model.

    • Human message: Placeholder for the user’s question.

  • Line 15: An AI language model (LLM) instance is initialized using ChatOpenAI. We specify the GPT-3.5 model and set the temperature to 0 for deterministic responses.

  • Line 17: The context for generating the answer is set up by chaining the prompt template, LLM, and string output parser (StrOutputParser).

  • Lines 19–23: The context chain is invoked with the user’s question, "What is LangSmith, and why do we need it?" The response generated by the LLM is stored in the answer variable.

  • Line 25: The generated answer is printed.

8. Return the hypothetical document and original question

Finally, the hypothetical document and the original question are returned using the HyDE chain.

Python 3.10.4
chain = RunnablePassthrough.assign(hypothetical_document=context)

chain.invoke(
    {
        "question": "What is LangSmith, and why do we need it?"
    }
)

Code explanation

  • Line 1: A chain is created with RunnablePassthrough.assign, which passes the original input through unchanged while adding a hypothetical_document key whose value is produced by the context chain defined earlier.

  • Lines 3–7: The chain is invoked with a dictionary containing the user’s question, "What is LangSmith, and why do we need it?". This triggers the execution of the chain, which processes the question along with the hypothetical document.
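
To close the loop, the hypothetical document produced by this chain can drive retrieval, and the retrieved chunks can then ground a final answer. Below is a minimal sketch of that last step; the final_prompt wording is an illustrative assumption:

Python 3.10.4
# Use the hypothetical document to retrieve real chunks, then answer from them.
question = "What is LangSmith, and why do we need it?"
hypothetical = context.invoke({"question": question})
retrieved = vectorstore.similarity_search(hypothetical)

final_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
final_chain = final_prompt | llm | StrOutputParser()
print(final_chain.invoke({
    "context": "\n\n".join(doc.page_content for doc in retrieved),
    "question": question,
}))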

Try it yourself

You can practice executing this code yourself in the accompanying Jupyter Notebook.
