Implementing a basic Retrieval-Augmented Generation (RAG) Pipeline – Part II

By Team Algo
Reading Time: 5 minutes

By km Alankrata

Introduction

In Part I of this blog we looked at the need for RAG pipelines and explored some of their use cases.

Continuing from there, in Part II we'll look at how to implement a basic RAG pipeline using LangChain and OpenAI to chat with our Insurance Policy Clauses document.

The Document

First, let's take a look at a snippet of the document. It captures the clauses applicable to a 2 Wheeler Insurance Policy.

Document Snippet

Our goal is to enable the LLM application to generate responses grounded in the information in this document, something the model may not have in its own knowledge base.

Setting up the Environment

The following Python dependencies are required for the implementation to run successfully.

# document loaders
pypdf==3.16.2

# vector store
faiss-cpu==1.8.0.post1

# langchain framework
langchain-community==0.2.11
langchain-openai==0.1.21
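
If you save the list above as a requirements.txt file, all of these dependencies can be installed in one go with pip install -r requirements.txt.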

Step 1: Importing Required Modules

Let us start by importing the Python modules that we are going to need to set up this RAG pipeline.

# pipeline.py
# —– required modules
# document loading
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# embedding model
from langchain_openai.embeddings import OpenAIEmbeddings

# vector store
from langchain_community.vectorstores import FAISS

# prompt
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

# llm
from langchain_openai.chat_models import ChatOpenAI

Step 2: Setting the OpenAI API Key

For our implementation we'll be using OpenAI's GPT-3.5 Turbo model. To access this model, we need an OpenAI API key, which you can create from your OpenAI account dashboard.

# pipeline.py
# —– local constants
OPENAI_API_KEY = "XXX"
DOCUMENT_PATH = "./2wheelerpolicy.pdf"
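
Hard-coding the key as above is fine for a quick experiment, but for anything you commit or share, you may prefer to read the key from an environment variable instead. A minimal sketch, assuming you have exported a variable named OPENAI_API_KEY:

# pipeline.py
# alternative: read the API key from an environment variable
import os

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if OPENAI_API_KEY is None:
    raise ValueError("OPENAI_API_KEY environment variable is not set")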

Step 3: Defining Prompt

The prompt is a key factor in how the LLM responds, so we need to define a suitable prompt for our use case, keeping in mind the context we are going to retrieve from the document.

# pipeline.py

# —– define prompt
prompt_template = """
Have a conversation with a human where you are an insurance advisor who parses the provided insurance policies and answers questions based on them.

<instructions>
1. Answer the human's questions based on the provided insurance document in a precise manner.
2. Do not answer questions that are not related to insurance.
3. If the human greets you, greet them back politely.
4. Do not use any prior knowledge.
5. Answer the question in the same language in which it is asked.
</instructions>

<context>
{context}
</context>

Question: {question}
Answer: """

PROMPT = PromptTemplate(template = prompt_template, input_variables = ["context", "question"])
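
Before wiring the prompt into a chain, it can be handy to render the template with dummy values and eyeball the final string the LLM will receive. A quick illustrative check (the context and question below are made-up placeholders):

# quick sanity check of the prompt template
print(PROMPT.format(context = "Sample policy clause text goes here.", question = "What does this clause cover?"))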

Step 4: Generating Embeddings

Now that we have the required modules imported and a suitable prompt defined, let us define the logic to generate embeddings for our document.

This function is divided into four steps:

  • Step 1: Extract the text from our PDF document to be used as context by the model.
  • Step 2: Split the document into small chunks so that the model can generate responses from only the relevant pieces of information.
  • Step 3 & 4: Initiate the embedding model, generate embeddings for the chunks, and save them locally for future use.

# pipeline.py

def generate_and_save_embeddings(document_path = None):
    vectorstore = None

    try:
        # step 1: load document
        doc_loader = PyPDFLoader(document_path)
        document = doc_loader.load()

        # step 2: split the document into chunks
        text_splitter = RecursiveCharacterTextSplitter(chunk_size = 300, chunk_overlap = 75)
        texts = text_splitter.split_documents(document)

        # step 3: initiate embedding model
        model_embedding = OpenAIEmbeddings(model = "text-embedding-ada-002", openai_api_key = OPENAI_API_KEY)

        # step 4: generate and save embeddings
        vectorstore = FAISS.from_documents(documents = texts, embedding = model_embedding)
        vectorstore.save_local(folder_path = "./embeddings", index_name = "policy_300_75")

    except Exception as exc:
        print(str(exc))

    return vectorstore
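
Because the index is persisted locally, on subsequent runs you could skip re-embedding the document and load the saved index back from disk. A minimal sketch, reusing the folder and index name from above; note that recent langchain-community versions require allow_dangerous_deserialization = True when loading a pickled local index:

# pipeline.py

def load_saved_embeddings():
    # re-initiate the same embedding model that was used while indexing
    model_embedding = OpenAIEmbeddings(model = "text-embedding-ada-002", openai_api_key = OPENAI_API_KEY)

    # load the FAISS index written by generate_and_save_embeddings()
    vectorstore = FAISS.load_local(
        folder_path = "./embeddings",
        embeddings = model_embedding,
        index_name = "policy_300_75",
        allow_dangerous_deserialization = True
    )

    return vectorstore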

Step 5: LLM Chain

In the LangChain framework, every LLM pipeline is executed as a 'chain'. We need to define the flow of the chain, which we will later use to query the LLM.

# pipeline.py

def get_context(chunks = None):
    context = "\n\n".join([chunk.page_content for chunk in chunks])

    return context

def get_llm_chain(vectorstore = None):
    # step 1: initiate Chat LLM
    llm = ChatOpenAI(model = "gpt-3.5-turbo-0125", api_key = OPENAI_API_KEY)

    # step 2: define chain based on retriever
    # query the model directly if no vectorstore is defined
    if vectorstore is None:
        chain = llm

    # query the model with the retriever
    else:
        # load vectorstore as retriever
        retriever = vectorstore.as_retriever(search_type = "similarity", search_kwargs = {"k": 3})

        # define llm chain with retriever
        chain = ({"context": retriever | get_context, "question": RunnablePassthrough()} | PROMPT | llm)

    return chain

Please note that we have defined two chains in this implementation: one that queries the LLM without any retriever and one that queries it with the retriever. This will let us compare the results of both and better understand the value that RAG adds.
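
If you would like to see exactly which chunks the retriever feeds into the prompt, you can also invoke the retriever on its own before running the full chain. A small illustrative snippet, assuming vectorstore is the object returned by generate_and_save_embeddings():

# inspect the chunks retrieved for a sample question
retriever = vectorstore.as_retriever(search_type = "similarity", search_kwargs = {"k": 3})
chunks = retriever.invoke("What is the cancellation policy?")
for chunk in chunks:
    print(chunk.page_content, "\n---")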

Step 6: The Main Block

We have now prepared all the functions required for our implementation. Let us call them from our main block to run the experiment.

# pipeline.py

# —– main block
if __name__ == "__main__":
    QUESTION = "What is the cancellation policy?"
    print(f"Question >> {QUESTION}\n")

    # step 0: query the llm directly
    chain = get_llm_chain(vectorstore = None)
    response = chain.invoke(QUESTION)

    print(f"Response (without Retriever) >> {response.content}\n")
   
    # step 1: generate embeddings
    vectorstore = generate_and_save_embeddings(document_path = DOCUMENT_PATH)

    # step 2: define llm chain
    chain = get_llm_chain(vectorstore = vectorstore)
    response = chain.invoke(QUESTION)

    print(f"Response (with Retriever) >> {response.content}\n")

Comparing the responses

Question >> What is the cancellation policy?

Response (without Retriever) >> The cancellation policy refers to the terms and conditions set by a company or organization regarding the cancellation of a service, reservation, or booking. This policy typically outlines the timeframe in which cancellations can be made, any fees or penalties associated with cancellations, and any restrictions on refunds or rescheduling. It is important to review and understand the cancellation policy before making a reservation or booking to avoid any potential issues or fees in the event that you need to cancel.

Response (with Retriever) >> The Policy may be cancelled at any time by the insured on seven days’ notice by recorded delivery, and provided no claim has arisen during the policy period, the insured shall be entitled to a return of premium less premium at the short period scale of 20% for not exceeding 1 month.

As can be observed in the output snippet above, when we ask the LLM about the cancellation policy directly, it generates a generic response from its own knowledge base. This response does not serve our purpose of understanding the cancellation policy of our 2 Wheeler Insurance Policy.

But, when we augment our LLM with a retriever, it generates a relevant response specific to our document.

Conclusion

Looking at this implementation, we can conclude that despite the rapid progress of Large Language Models and their capabilities, getting use-case-specific responses out of these models still requires a domain-specific knowledge base.

With RAG, we can make use of the language capabilities of LLMs in our application while also ensuring that our private information is not shared with third-party models for training.

References
[1] https://python.langchain.com/v0.2/docs/introduction/