The Problem That RAG Solves -

Let’s assume you work for a business that has lots of data in the form of PDFs, databases, Excel files, and other formats.

As an example, imagine you have many PDF files. The business says to you: “We have a huge amount of data, and it’s very hard to read word by word manually to see what content is available in which file and where. This requires a lot of manual effort.”

Now, they ask you: “Can you build an AI agent? Since we have a lot of data and many employees, we don’t want them to go through each document manually one by one — it’s too time-consuming. Instead, they should be able to ask an LLM questions and get answers from the available documents.”

Essentially, you need a way to tell ChatGPT (or any LLM) that you have a specific set of files so it can answer queries related to those documents.

The challenge is that the LLM has no prior knowledge of your private data — it’s only trained on public information.

Additionally, you have another problem: you have too much data to feed directly into the LLM, because it can only accept a limited amount of text in its context window.

This is a typical problem that can be solved using RAG (Retrieval-Augmented Generation).

RAG - Retrieval-augmented generation (RAG) is an architecture for improving the output of an artificial intelligence (AI) model by connecting it to external knowledge bases. RAG helps large language models (LLMs) deliver more relevant, higher-quality responses.

Solution 1 -

Let's say we have data in PDF format. Just convert all the PDF files into text and provide that text to the LLM as a system prompt. For a small amount of data, this will definitely work.

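Problems with this approach:

1- Cost: the full text is sent again with every single query, and you pay for every token.

2- Context Window: once the data grows, it no longer fits in the LLM's context window.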
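
Here is a rough back-of-the-envelope sketch of why Solution 1 breaks down. Every number in it is an assumption for illustration only: about 4 characters per token is a common rule of thumb for English text, and the context window size and price are hypothetical placeholders, so check your provider's real limits and pricing.

```python
# Rough estimate of what "send everything in the system prompt" costs.
# All constants are illustrative assumptions, not real limits or prices.
CHARS_PER_TOKEN = 4                 # rule-of-thumb average for English text
CONTEXT_WINDOW_TOKENS = 128_000     # hypothetical model limit
PRICE_PER_MILLION_TOKENS = 1.00     # hypothetical input price in USD


def estimate_tokens(num_chars: int) -> int:
    """Approximate token count from a character count."""
    return num_chars // CHARS_PER_TOKEN


def fits_in_context(num_chars: int) -> bool:
    """True if the text would fit in the assumed context window."""
    return estimate_tokens(num_chars) <= CONTEXT_WINDOW_TOKENS


def cost_per_query(num_chars: int) -> float:
    """Input cost in USD of sending the whole text with a single query."""
    return estimate_tokens(num_chars) / 1_000_000 * PRICE_PER_MILLION_TOKENS


# Example: 500 PDFs of about 200,000 characters each
total_chars = 500 * 200_000
print(estimate_tokens(total_chars))   # 25000000
print(fits_in_context(total_chars))   # False
print(cost_per_query(total_chars))    # 25.0
```

Under these assumed numbers the data is far too large for a single prompt, and even if it did fit, you would pay for all of it on every question.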

Better Approach: RAG - You can divide RAG into two phases:

Indexing Phase (when the user provides the data)
Retrieval Phase (when the user chats with the data)

Note: These two phases are completely separate from each other.

What is the indexing phase - You ask the user to upload the data once. After getting the data, you split it into chunks (you can chunk at page level; it's up to you).

After chunking the data, you use an embedding model to create a vector embedding for each chunk. The vector embeddings are saved into a vector DB, along with metadata (page number, dataset).

At this point, all our data has been converted into vectors, one per small chunk, and saved into the DB.
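
To make the indexing flow concrete, here is a minimal toy sketch in plain Python. It is not the real pipeline: toy_embed is a made-up stand-in for an embedding model (it just hashes words into buckets), and a plain list stands in for the vector DB. The point is only to show the shape of the data: chunk, vector, metadata.

```python
# Toy indexing pipeline: chunk -> embed -> store with metadata.
# toy_embed is a stand-in for a real embedding model, and the list
# returned by index_pages is a stand-in for a real vector DB.
import hashlib

DIM = 16  # tiny vector size, for illustration only


def toy_embed(text: str) -> list[float]:
    """Hash each word into one of DIM buckets (bag-of-words counts)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec


def index_pages(pages: list[str], source: str) -> list[dict]:
    """Page-level chunking: one chunk, one vector, one metadata record per page."""
    store = []
    for page_no, text in enumerate(pages, start=1):
        store.append({
            "vector": toy_embed(text),
            "text": text,
            "metadata": {"page": page_no, "source": source},
        })
    return store


pages = ["Install the driver and connect.", "Define a schema and a model."]
vector_store = index_pages(pages, source="example.pdf")
print(len(vector_store))             # 2
print(vector_store[1]["metadata"])   # {'page': 2, 'source': 'example.pdf'}
```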

What is the retrieval phase - The user gives a query. We use the same embedding model to convert this query into a vector embedding, then run a similarity search in the vector DB. The DB returns the relevant chunks (think of it this way: we may have 100k chunks, but we only need the 2 most relevant ones). After getting the chunks, we pass them to the LLM together with the system prompt and the user query. The LLM returns a proper response with metadata (page number and so on).
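
The retrieval idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions: toy_embed counts words from a tiny hypothetical vocabulary instead of calling a real embedding model, and the "vector DB" is a list searched with cosine similarity. A real vector DB does the same job with an index that scales to far more chunks.

```python
# Toy retrieval: embed the query with the SAME function used for the
# chunks, rank chunks by cosine similarity, keep the top k.
import math

VOCAB = ["schema", "model", "connect", "driver", "query", "index"]  # hypothetical


def toy_embed(text: str) -> list[float]:
    """Count how often each vocabulary term appears in the text."""
    words = text.lower().replace(".", "").replace("?", "").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[dict], k: int = 2) -> list[dict]:
    """Return the k stored chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return ranked[:k]


chunks = [
    ("Install the driver and connect.", 1),
    ("Define a schema and a model.", 2),
    ("Create an index to speed up a query.", 3),
]
store = [{"vector": toy_embed(t), "text": t, "metadata": {"page": p}} for t, p in chunks]

top = retrieve("How do I define a schema?", store, k=1)
print(top[0]["metadata"]["page"])   # 2
```

Only the retrieved chunk (plus its page number) would then be placed in the prompt, instead of the whole document.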

Let's Code -

1- Let's start by creating our virtual environment. I love to code in an isolated environment -

Step 1- Initialize

python -m venv rag

Step 2- Activate (this is the Windows command; on macOS/Linux use: source rag/bin/activate)

rag\Scripts\activate
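
If you are not sure whether the activation worked, a quick check from Python can help. This small sketch relies on the fact that inside a venv, sys.prefix points at the environment while sys.base_prefix points at the base Python installation; the helper name in_virtualenv is just my own.

```python
# Check whether the current interpreter is running inside a venv.
import sys


def in_virtualenv() -> bool:
    """True when sys.prefix differs from the base installation's prefix."""
    return sys.prefix != sys.base_prefix


print(in_virtualenv())
print(sys.prefix)  # path of the active environment
```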

2- Set Up the Vector DB

Step 1- Create a compose file to set up the vector DB (Qdrant)

Filename: docker-compose.yml

services:
  vector-db:
    image: qdrant/qdrant
    ports:
      - "6333:6333"

Step 2- Pull the image and run it in detached mode

docker compose up -d
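
Before moving on, it can be worth confirming that the container is actually reachable. The sketch below assumes Qdrant is listening on http://localhost:6333 as mapped in the compose file, and calls the REST endpoint /collections using only the standard library. The function name qdrant_is_up is hypothetical.

```python
# Minimal reachability check for a local Qdrant instance.
import http.client
import json
import urllib.error
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if the Qdrant REST API answers on /collections."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            json.load(resp)  # the endpoint replies with a JSON body
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return False


print(qdrant_is_up())
```

If this prints False, check that Docker is running and that port 6333 is not blocked or already in use.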

3- LangChain - LangChain gives us a lot of functionality out of the box -

We need to read documents from a file - LangChain has a function for this
We need to chunk or split our data - LangChain has a function for this
We need to connect to our vector DB - LangChain has a function for this

A lot of functionality is already provided by LangChain, which makes our life very easy 😁. We are going to use that.

Step 1- Search for the LangChain document loader for PDF and install it

pip install -qU langchain-community langchain-text-splitters langchain-openai pypdf langchain-qdrant
pip install python-dotenv

Step 2- Quickly Freeze the Requirements

pip freeze > requirements.txt

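Starting with the Indexing Phase

Note: Download any PDF from the internet and save it as mongoose.pdf in the same folder as index.py (that is the filename the code expects).

Step 1- We need to load our document:

Create a new file: index.py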

#indexing data
from dotenv import load_dotenv
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
load_dotenv()  # reads OPENAI_API_KEY from the .env file

pdf_path = Path(__file__).parent / "mongoose.pdf"

# Load the PDF into the Python program (one Document per page)
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

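# Split the docs into smaller chunks.
# A small overlap keeps sentences that straddle a chunk boundary intact;
# an overlap close to chunk_size would store mostly duplicated text.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(documents=docs)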


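# Vector embedding model
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-large",
)

# Embed the chunks and store them in Qdrant.
# The collection name must match the one used in the retrieval phase.
vector_store = QdrantVectorStore.from_documents(
    documents=chunks,
    embedding=embedding_model,
    url="http://localhost:6333",
    collection_name="learning_rag",
)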
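
To get a feel for what chunk_size and chunk_overlap do, here is a simplified sketch. It is a plain sliding window over characters, which is my own simplification: RecursiveCharacterTextSplitter additionally tries to split on separators such as paragraphs and sentences, so its real output will differ. The helper name sliding_chunks is hypothetical.

```python
# Simplified illustration of chunk_size / chunk_overlap using a plain
# character sliding window (NOT how the LangChain splitter works internally).
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
print(sliding_chunks(text, chunk_size=10, chunk_overlap=2))
# ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
print(len(sliding_chunks(text, chunk_size=10, chunk_overlap=8)))
# 9
```

The same 26 characters produce 3 chunks with a small overlap and 9 chunks with a large one. More chunks means more embedding calls and more near-duplicate search results, so the overlap is worth tuning.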

Step 2: Create a .env file (in the same folder as index.py)

OPENAI_API_KEY="Your api key"
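
A missing or empty key usually only shows up later as an authentication error, so a small early check can save some debugging. This is an optional sketch: require_env and mask are hypothetical helper names, and DEMO_KEY is a made-up variable used so the example runs without a real key. In index.py you would call require_env("OPENAI_API_KEY") right after load_dotenv().

```python
# Fail early if a required environment variable is missing.
import os


def require_env(name: str) -> str:
    """Return the variable's value, or raise a clear error if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to print."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Demo with a made-up variable so this runs without a real API key
os.environ.setdefault("DEMO_KEY", "sk-demo-1234")
print(mask(require_env("DEMO_KEY")))
```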

Step 3: Run

python index.py

Step 4: Open the vector store dashboard

http://localhost:6333/dashboard#/collections

You will see your collection listed there.

Congrats indexing phase done 🔥

Retrieval Phase

Step 1- Create a chatpdf.py file

from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from openai import OpenAI
load_dotenv()

openai_client= OpenAI()

# Embedding model: must be the same model that was used during indexing
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-large",
    )

vector_db=QdrantVectorStore.from_existing_collection(
    embedding=embedding_model,
    url="http://localhost:6333",
    collection_name="learning_rag"
)

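# Take user input
user_query = input("Ask Something: ")

# Relevant chunks from the vector DB
search_result = vector_db.similarity_search(query=user_query)

# Build the context from the retrieved chunks and their metadata.
# .get() avoids a KeyError if a loader version does not set 'page_label'.
context = "\n\n\n".join(
    f"Page Content: {result.page_content}\n"
    f"Page Number: {result.metadata.get('page_label', result.metadata.get('page'))}\n"
    f"File Location: {result.metadata.get('source')}"
    for result in search_result
)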

SYSTEM_PROMPT = f"""
 You are a helpful AI assistant who answers the user's query based on the available context retrieved from a PDF file, along with page contents and page numbers.

 You should only answer the user based on the following context, and point the user to the right page number to learn more.

 Context:
 {context}

"""
response = openai_client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ],
)
print(f"🤖: {response.choices[0].message.content}")
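
If you want to see exactly what text the LLM receives as context without calling Qdrant or OpenAI, you can try the formatting step on stand-in objects. In this sketch, build_context is a hypothetical helper and the SimpleNamespace objects are fake search results that only mimic the two attributes used here (page_content and metadata).

```python
# Preview the context string using fake search results (no network calls).
from types import SimpleNamespace


def build_context(results) -> str:
    """Join retrieved chunks with their page and source metadata."""
    parts = []
    for r in results:
        # Prefer 'page_label', fall back to 'page', then to a placeholder
        page = r.metadata.get("page_label", r.metadata.get("page", "unknown"))
        source = r.metadata.get("source", "unknown")
        parts.append(
            f"Page Content: {r.page_content}\n"
            f"Page Number: {page}\n"
            f"File Location: {source}"
        )
    return "\n\n\n".join(parts)


fake_results = [
    SimpleNamespace(page_content="First chunk.", metadata={"page_label": "3", "source": "demo.pdf"}),
    SimpleNamespace(page_content="Second chunk.", metadata={"page": 7}),
]
context = build_context(fake_results)
print(context)
```

Printing the context like this is a handy way to debug answers that cite the wrong page: if the page number is wrong here, the problem is in the metadata and not in the LLM.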

Step 2: Run this file

python chatpdf.py

Step 3: Ask a query. For example, I asked -

Can you share the configuration details for MongoDB?

You can see it gives me a response from our document, along with the page number where it found the details.

Congratulations! Your RAG system is fully functional!

Indexing Phase: Complete - Documents are processed and stored in vector database

Retrieval Phase: Complete - Users can now query documents intelligently

RAG Solution: Implemented - Solves the manual document search problem

Note: This implementation is synchronous and for demonstration purposes only.

Will be covered in the next blog:

For production:

The system will be converted to an asynchronous implementation using FastAPI.
A Redis-based queue will be used for background task processing, along with Redis for caching.