
Hypothetical Document Embeddings (HyDE): Simulating Context

Explore how hypothetical document embeddings (HyDE) enhance pre-retrieval optimization in RAG systems by simulating relevant context. Learn to generate embeddings, query vector stores, and implement HyDE using LangChain with practical code examples.

Why hypothetical document embeddings (HyDE)?

Traditional document retrieval in RAG models relies on matching queries with existing documents in a collection. This approach faces limitations:

  • Limited generalizability: Existing retrieval methods often struggle with unseen domains or queries with subtle variations.

  • Factual accuracy: Retrieving documents based solely on keyword matching might lead to irrelevant or inaccurate information, especially for complex queries.

HyDE tackles these challenges by introducing the concept of hypothetical documents.

Educative Byte: Assume you are a student preparing for a history test with lots of books to read. HyDE, like a smart study buddy, jumps in to lend a hand. It takes all that information and makes super helpful study notes just for you. These notes aren’t copies of the books, but they’re the most important bits you need to remember. For instance, if you’re studying World War II, HyDE might summarize the big reasons for the war, the major battles, and how it ended. HyDE’s summaries make studying much easier, helping you understand the main ideas faster.

What is HyDE?

HyDE, as described in the paper “Precise Zero-Shot Dense Retrieval without Relevance Labels” by Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan (arXiv preprint arXiv:2212.10496, 2022), leverages LLMs to generate hypothetical document embeddings that represent ideal documents for answering a given query. These embeddings, even though they do not correspond to actual documents, capture the essence of the information needed. This allows the retrieval process to focus on documents containing relevant content, leading to more accurate and informative responses.

An illustration of the HyDE model (source: Luyu Gao et al., “Precise Zero-Shot Dense Retrieval without Relevance Labels”)

How HyDE works

Here’s a breakdown of the HyDE workflow:

  • Query processing: The user submits a query.

  • Hypothetical document generation: HyDE utilizes an LLM to create one or more “hypothetical documents” that address the query. These documents might not be factual or complete, but they capture the information a relevant document would contain. This generation process often involves prompting the LLM with instructions like “Write a short summary of a web page that answers the question...”.

  • Embedding creation: Each generated hypothetical document is then converted into a numerical representation called an embedding. This embedding captures the semantic meaning of the document.

  • Document retrieval: The system searches for existing documents in the collection whose embeddings are most similar to the hypothetical document embeddings. This process leverages vector similarity techniques.

  • Response generation: The retrieved documents are fed into the RAG model’s generation stage, where they are used to create a response to the user’s query.

The high-level workflow of HyDE
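
Putting these steps together, the sketch below shows the whole loop in miniature using LangChain. It is illustrative only: the file name docs.txt, the query, and the prompt wording are assumptions for this sketch, not part of the original paper.

Python 3.10.4
# Minimal HyDE loop (illustrative; assumes OPENAI_API_KEY is set and that
# "docs.txt" is a plain-text file you want to search).
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma

docs = TextLoader("docs.txt").load()
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

query = "What is LangSmith?"

# 1. Generate a hypothetical document that answers the query.
llm = ChatOpenAI(temperature=0)
hypothetical_doc = llm.invoke(
    f"Write a short passage that answers the question: {query}"
).content

# 2. Embed the hypothetical document instead of the raw query.
hyde_vector = OpenAIEmbeddings().embed_query(hypothetical_doc)

# 3. Retrieve real documents whose embeddings are closest to it.
retrieved = vectorstore.similarity_search_by_vector(hyde_vector)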

Step-by-step implementation

Now, let’s dive into the provided code and understand how it implements HyDE:

Steps for implementing HyDE

1. Import necessary modules

We’ll import the required modules from the installed libraries to implement HyDE:

Python 3.10.4
import os
from langchain_openai import OpenAI, ChatOpenAI, OpenAIEmbeddings
from langchain.chains import HypotheticalDocumentEmbedder
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

These libraries and modules are essential for the subsequent steps in the process.

2. Set up the OpenAI API key

Set the OPENAI_API_KEY environment variable with your key:

Python 3.10.4
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"] = ""  # Add your OpenAI API key
if OPENAI_API_KEY == "":
    raise ValueError("Please set the OPENAI_API_KEY environment variable")

Code explanation

  • Line 1: Set the OPENAI_API_KEY variable to an empty string and assign it to the environment variable OPENAI_API_KEY using os.environ. This is where you should add your OpenAI API key.

  • Lines 2–3: If the OPENAI_API_KEY is still an empty string after the assignment, raise a ValueError with the message "Please set the OPENAI_API_KEY environment variable". This ensures that the API key is properly set before continuing with the program execution.
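
If you prefer not to hardcode the key in your source file, a common alternative (a sketch, assuming an interactive session) is to prompt for it at runtime:

Python 3.10.4
import os
from getpass import getpass

# Prompt for the key only if it isn't already present in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")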

3. Load and split documents

Here, we load some example documents and prepare them for processing by the LLM. Since real-world documents might be lengthy, we’ll also perform text splitting to ensure they fit the LLM’s input limitations.

Python 3.10.4
loaders = [
    TextLoader("blog.langchain.dev_announcing-langsmith_.txt"),
    TextLoader("blog.langchain.dev_automating-web-research_.txt"),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=400, chunk_overlap=60)
splits = text_splitter.split_documents(docs)

Code explanation

  • Lines 1-4: Initialize a list called loaders, containing instances of the TextLoader class from LangChain. These loaders are used to load text files containing the documents to be processed.

  • Lines 6-8: Iterate over each loader in the loaders list and load the documents using the load() method of each loader. The documents loaded from each loader are then appended to the docs list.

  • Line 10: Create an instance of the RecursiveCharacterTextSplitter class via from_tiktoken_encoder, specifying a chunk_size of 400 tokens and a chunk_overlap of 60 tokens. This splitter is used to break large documents into smaller, more manageable chunks.

  • Line 11: Call the split_documents() method of the text_splitter object with the docs list as input. This method splits each document in the docs list into smaller chunks of the specified chunk_size. The resulting chunks are assigned to the splits variable.
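
To sanity-check the splitting step, you can inspect how many chunks were produced and peek at one of them (a quick check under the setup above, not part of the pipeline itself):

Python 3.10.4
# How many chunks did the splitter produce, and what does one look like?
print(f"{len(docs)} documents split into {len(splits)} chunks")
print(splits[0].page_content[:200])  # first 200 characters of the first chunk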

4. Create a vector store

A vector store serves as a critical component for retrieval in HyDE. It allows us to store document embeddings and efficiently search for documents similar to a hypothetical document embedding.

Python 3.10.4
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

Code explanation

  • Line 1: Chroma.from_documents embeds each chunk in splits using OpenAIEmbeddings and indexes the resulting vectors in a Chroma vector store, enabling similarity-based retrieval.
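
Beyond direct similarity queries, the vector store can also be wrapped as a standard LangChain retriever, which is how it usually plugs into a larger chain. A minimal sketch (the choice of k=4 is illustrative):

Python 3.10.4
# Expose the vector store through the retriever interface, returning
# the 4 most similar chunks for each query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})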

5. Generate embeddings (single and multiple)

HyDE’s core functionality is generating embeddings representing hypothetical documents relevant to a user query. Here, we’ll explore generating both single and multiple embeddings.

Below is the implementation of single embedding generation.

Python 3.10.4
embeddings = HypotheticalDocumentEmbedder.from_llm(OpenAI(), OpenAIEmbeddings(), "web_search")

query = "What is LangSmith, and why do we need it?"

result = embeddings.embed_query(query)

Code explanation

  • Line 1: Initialize the embedding model and LLM. The HypotheticalDocumentEmbedder class pairs an OpenAI language model (LLM) with OpenAIEmbeddings, using the built-in "web_search" prompt to generate the hypothetical documents it embeds.

  • Line 3: Define a query about LangSmith. This query string will be used to generate an embedding that represents the query in a numerical format.

  • Line 5: Use the embedding model to generate an embedding for the query. The embed_query method processes the query string, converting it into an embedding vector that captures the semantic meaning of the query.
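
The resulting embedding is a plain list of floats; with OpenAI’s default embedding model it typically has 1,536 dimensions. A quick way to verify this:

Python 3.10.4
# The HyDE embedding is an ordinary dense vector.
print(type(result), len(result))  # e.g., <class 'list'> 1536
print(result[:5])                 # first few components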

Below is the implementation of multiple embedding generation.

Python 3.10.4
multi_llm = OpenAI(n=3, best_of=4)

embeddings = HypotheticalDocumentEmbedder.from_llm(multi_llm, OpenAIEmbeddings(), "web_search")

result = embeddings.embed_query("What is LangSmith, and why do we need it?")

Code explanation

  • Line 1: Initialize an OpenAI LLM with specific parameters. The n=3 parameter requests three completions per prompt, and best_of=4 has the API generate four completions server-side and return the three with the highest log probability.

  • Line 3: Initialize the embedding model using the previously created LLM. The HypotheticalDocumentEmbedder class combines the capabilities of multi_llm with OpenAIEmbeddings for creating embeddings, specifically for the "web_search" context.

  • Line 5: Generate an embedding for a specific query. The embed_query method processes the query string "What is LangSmith, and why do we need it?", converting it into an embedding vector that captures the semantic meaning of the query.
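
With multiple completions, the embedder produces one embedding per hypothetical document and then combines them into a single query vector; in LangChain this combination is an element-wise average. The NumPy sketch below illustrates the idea with made-up three-dimensional vectors:

Python 3.10.4
import numpy as np

# Placeholder embeddings for three hypothetical documents (real ones
# would have ~1,536 dimensions).
doc_embeddings = np.array([
    [0.1, 0.3, 0.5],
    [0.2, 0.1, 0.4],
    [0.0, 0.2, 0.6],
])

# HyDE averages them element-wise into a single query vector.
combined = doc_embeddings.mean(axis=0)
print(combined)  # [0.1 0.2 0.5]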

6. Query the vector store for HyDE

Before delving into the HyDE technique, it’s essential to understand how to query the vector store to retrieve relevant information:

Python 3.10.4
query = "What is LangSmith, and why do we need it?"
vectorstore.similarity_search(query)

Code explanation

  • Line 1: Define the search query as a string. This specifies the information we’re looking for in the vector store.

  • Line 2: Call the similarity_search method on the vectorstore object. This method embeds the query and returns the stored chunks whose embeddings are most similar to it.
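
To search with the HyDE embedding from step 5 rather than a plain query embedding, you can query the store by vector directly (a sketch reusing the result variable computed earlier):

Python 3.10.4
# Retrieve the chunks closest to the hypothetical-document embedding
# instead of the raw query embedding.
hyde_docs = vectorstore.similarity_search_by_vector(result)
for doc in hyde_docs:
    print(doc.page_content[:100])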

7. Generate a hypothetical document

In this step, a hypothetical document is generated using a defined prompt template:

Python 3.10.4
system = """
As a knowledgeable and helpful research assistant, your task is to provide informative answers based on the given context.
Use your extensive knowledge base to offer clear, concise, and accurate responses to the user's inquiries.
Question: {question}
Answer:
"""
prompt = ChatPromptTemplate.from_messages(
[
("system", system),
("human", "{question}"),
]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
context = prompt | llm | StrOutputParser()
answer = context.invoke(
{
"What is LangSmith, and why do we need it?"
}
)
print(answer)

Code explanation

  • Lines 1–6: A system message is defined as a prompt template to generate informative responses based on the context. It sets the tone for the AI language model to provide helpful and knowledgeable answers.

  • Lines 8–13: A prompt template is created using ChatPromptTemplate.from_messages. It consists of two messages:

    • System message: Defined above, it provides instructions and context to the AI language model.

    • Human message: Placeholder for the user’s question.

  • Line 15: An AI language model (LLM) instance is initialized using ChatOpenAI. We specify the GPT-3.5 model and set the temperature to 0 for deterministic responses.

  • Line 17: The context for generating the answer is set up by chaining the prompt template, LLM, and string output parser (StrOutputParser).

  • Lines 19–23: The context chain is invoked with the user’s question, "What is LangSmith, and why do we need it?" The response generated by the LLM is stored in the answer variable.

  • Line 25: The generated answer is printed.

8. Return the hypothetical document and original question

Finally, the hypothetical document and the original question are returned using the HyDE chain.

Python 3.10.4
chain = RunnablePassthrough.assign(hypothetical_document=context)

chain.invoke(
    {
        "question": "What is LangSmith, and why do we need it?"
    }
)

Code explanation

  • Line 1: A chain is created with RunnablePassthrough.assign, which passes the original input through unchanged while adding a hypothetical_document key whose value is produced by the context chain defined earlier.

  • Lines 3–7: The chain is invoked with a dictionary containing the user’s question, "What is LangSmith, and why do we need it?". This triggers the execution of the chain, which processes the question along with the hypothetical document.
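
To close the loop, the hypothetical document produced by this chain can drive retrieval, and the retrieved chunks can then ground a final answer. Below is a minimal sketch of that last step; the final_prompt wording is an illustrative assumption:

Python 3.10.4
# Use the hypothetical document to retrieve real chunks, then answer from them.
question = "What is LangSmith, and why do we need it?"
hypothetical = context.invoke({"question": question})
retrieved = vectorstore.similarity_search(hypothetical)

final_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
final_chain = final_prompt | llm | StrOutputParser()
print(final_chain.invoke({
    "context": "\n\n".join(doc.page_content for doc in retrieved),
    "question": question,
}))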

Try it yourself

You can practice executing this code yourself in the accompanying Jupyter Notebook.
