Friday, March 21, 2025

Top 5 RAG Frameworks for AI Applications


RAG has become a popular technology in 2025 because it avoids fine-tuning the model, which is expensive as well as time-consuming. Demand for RAG frameworks has grown accordingly, so let's understand what they are. Retrieval-augmented generation (RAG) frameworks are essential tools in the field of artificial intelligence. They enhance the capabilities of Large Language Models (LLMs) by allowing them to retrieve relevant information from external sources, which leads to more accurate and context-aware responses. Here, we will explore five notable RAG frameworks: LangChain, LlamaIndex, LangGraph, Haystack, and RAGFlow. Each framework offers unique features that can improve your AI projects.

1. LangChain

LangChain is a flexible framework that simplifies the development of applications using LLMs. It provides tools for building RAG applications, making integration straightforward.

  • Key Features:
    • Modular design for easy customization.
    • Supports various LLMs and data sources.
    • Built-in tools for document retrieval and processing.
    • Suitable for chatbots and virtual assistants.

Here’s the hands-on:

Install the following libraries:

!pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain

Set up the OpenAI API key as an environment variable:

from getpass import getpass
openai = getpass("OpenAI API Key:")
import os
os.environ["OPENAI_API_KEY"] = openai

Import the following dependencies:

import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

Load the document for RAG using WebBaseLoader (replace it with your own data):

# Load documents
loader = WebBaseLoader(
   web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
   bs_kwargs=dict(
       parse_only=bs4.SoupStrainer(
           class_=("post-content", "post-title", "post-header")
       )
   ),
)
docs = loader.load()

Chunk the document using RecursiveCharacterTextSplitter:

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

Store the document chunks as vectors in ChromaDB:

# Embed
vectorstore = Chroma.from_documents(documents=splits,
                                   embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

Pull the RAG prompt from the LangChain hub and define the LLM:

# Prompt
prompt = hub.pull("rlm/rag-prompt")
# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

Post-process the retrieved documents:

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

Create the RAG chain:

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Invoke the chain with a question:

# Question
rag_chain.invoke("What is Task Decomposition?")

Output

‘Task Decomposition is a technique used to break down complex tasks into
smaller and simpler steps. This approach helps agents to plan ahead and
tackle difficult tasks more effectively. Task decomposition can be done
by various methods, including using prompting techniques, task-specific
instructions, or human inputs.’
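
If you also need the retrieved source chunks alongside the answer, the same components can be composed with RunnableParallel. This is a minimal sketch under the setup above; the output keys "context" and "answer" are our own naming, not part of the original walkthrough.

from langchain_core.runnables import RunnableParallel

# Chain that turns already-retrieved documents into an answer
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

# Run retrieval and answering together, returning both sources and answer
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

result = rag_chain_with_source.invoke("What is Task Decomposition?")
print(result["answer"])
print([doc.metadata for doc in result["context"]])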


2. LlamaIndex

LlamaIndex, previously known as the GPT Index, focuses on organizing and retrieving data efficiently for LLM applications. It helps developers access and use large datasets quickly.

  • Key Features:
    • Organizes data for fast lookups.
    • Customizable components for RAG workflows.
    • Supports multiple data formats, including PDFs and SQL.
    • Integrates with vector stores like Pinecone and FAISS (see the FAISS sketch after the hands-on).

Here’s the hands-on:

Install the following dependencies:

!pip install llama-index llama-index-readers-file
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai

Import the following dependencies and initialize the LLM and embeddings:

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
llm = OpenAI(model="gpt-4o")
embed_model = OpenAIEmbedding()
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model

Download the data (you can replace it with your own data):

!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'

Read the data using SimpleDirectoryReader:

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["/content/uber_2021.pdf"]).load_data()

Chunk the document using TokenTextSplitter:

from llama_index.core.node_parser import TokenTextSplitter
splitter = TokenTextSplitter(
   chunk_size=512,
   chunk_overlap=0,
)
nodes = splitter.get_nodes_from_documents(documents)

Store the vector embeddings in a VectorStoreIndex and query it:

from llama_index.core import VectorStoreIndex
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=2)
# Invoke the LLM using RAG
response = query_engine.query("What is the revenue of Uber in 2021?")
print(response)

Output

‘The revenue of Uber in 2021 was $171.7 million.’
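
The key features above mention integrations with vector stores such as Pinecone and FAISS. As a minimal sketch (assuming the llama-index-vector-stores-faiss package and the default 1536-dimensional OpenAI embedding model), the in-memory index built above could instead be backed by FAISS:

!pip install llama-index-vector-stores-faiss faiss-cpu

import faiss
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.faiss import FaissVectorStore

# 1536 matches OpenAI's default embedding size; adjust it to your embed_model
faiss_index = faiss.IndexFlatL2(1536)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index over the same nodes as before, now stored in FAISS
index = VectorStoreIndex(nodes, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=2)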

3. LangGraph

LangGraph lets you build LLM applications as graphs of nodes and edges with shared state. This framework is useful for applications that require complex, multi-step workflows.

  • Key Features:
    • Efficiently retrieves data within graph-based workflows.
    • Combines LLM calls with graph state for better context.
    • Allows customization of the retrieval process.

Code

Install the following dependencies:

%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph langchain-openai

Initialize the model, embeddings, and vector database:

from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embeddings)

Import the following dependencies:

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

Download the dataset using WebBaseLoader (replace it with your own dataset):

# Load and chunk contents of the blog
loader = WebBaseLoader(
   web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
   bs_kwargs=dict(
       parse_only=bs4.SoupStrainer(
           class_=("post-content", "post-title", "post-header")
       )
   ),
)
docs = loader.load()

Chunk the document using RecursiveCharacterTextSplitter, index the chunks, and pull the RAG prompt:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
# Index chunks
_ = vector_store.add_documents(documents=all_splits)
# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")

Define the state, nodes, and edges in LangGraph:

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

Compile the graph:

# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

Invoke the graph for RAG:

response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

Output

Task Decomposition is the process of breaking down a complicated task into
smaller, manageable steps. This can be achieved using techniques like Chain
of Thought (CoT) or Tree of Thoughts, which guide models to reason step by
step or evaluate multiple possibilities. The goal is to simplify complex
tasks and enhance understanding of the reasoning process.
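
Since graph is a compiled LangGraph object, you can also stream intermediate results instead of calling invoke, which is handy for inspecting the retrieve and generate steps as they run. A small sketch:

# Stream each node's update as it finishes
for step in graph.stream({"question": "What is Task Decomposition?"}, stream_mode="updates"):
    print(step)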

4. Haystack

Haystack is an end-to-end framework for developing applications powered by LLMs and transformer models. It excels at document search and question answering.

  • Key Features:
    • Combines document search with LLM capabilities.
    • Uses various retrieval methods for optimal results.
    • Offers pre-built pipelines for quick development.
    • Compatible with Elasticsearch and OpenSearch (see the sketch after the hands-on).

Here’s the hands-on:

Install the following dependencies:

!pip install haystack-ai
!pip install "datasets>=2.6.1"
!pip install "sentence-transformers>=3.0.0"

Import the document store and initialize it:
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()

Load the built-in dataset from the datasets library:

from datasets import load_dataset
from haystack import Document
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

Download the embedding model (you can also replace it with OpenAI embeddings):

from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

Initialize the retriever over the document store:

from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
retriever = InMemoryEmbeddingRetriever(document_store)

Define the prompt for RAG:

from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
template = [
   ChatMessage.from_user(
       """
Given the following information, answer the question.
Context:
{% for document in documents %}
   {{ document.content }}
{% endfor %}
Question: {{question}}
Answer:
"""
   )
]
prompt_builder = ChatPromptBuilder(template=template)

Initialize the text embedder (used to embed the query) and the LLM:

from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators.chat import OpenAIChatGenerator
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")

Define the pipeline components:

from haystack import Pipeline
basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

Connect the components to one another:

# Now, connect the components to one another
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

Invoke the pipeline:

query = "What does Rhodes Statue seem like?"
response = basic_rag_pipeline.run({"text_embedder": {"textual content": query}, "prompt_builder": {"query": query}})
print(response["llm"]["replies"][0].textual content)

Output

Batches: 100%

 1/1 [00:00<00:00, 17.91it/s]

‘The Colossus of Rhodes, a statue of the Greek sun-god Helios, is believed to
have stood approximately 33 meters (108 feet) tall and was constructed with
iron tie bars and brass plates forming its skin, filled with stone blocks.
Although the exact details of its appearance are not definitively known,
contemporary accounts suggest that it had curly hair with bronze or silver
spikes radiating like flames on the head. The statue likely depicted Helios
in a powerful, commanding pose, possibly with one hand shielding his eyes,
similar to other representations of the sun god from the time. Overall, it
was designed to project strength and radiance, celebrating Rhodes' victory
over its enemies.’
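
The key features above mention compatibility with Elasticsearch and OpenSearch. As a rough sketch (assuming the separate elasticsearch-haystack integration package and an Elasticsearch instance running locally; the URL is a placeholder), the in-memory store and retriever could be swapped out while the rest of the pipeline stays the same:

!pip install elasticsearch-haystack

from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchEmbeddingRetriever

# Point the store at your own Elasticsearch cluster
document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
document_store.write_documents(docs_with_embeddings["documents"])
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)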

5. RAGFlow

RAGFlow focuses on integrating retrieval and generation processes. It streamlines the development of RAG applications.

  • Key Features:
    • Simplifies the connection between retrieval and generation.
    • Allows for tailored workflows to meet project needs.
    • Integrates easily with various databases and document formats.

Here’s the hands-on:

Sign up on RAGFlow and then click on Try RAGFlow.

Then click on Create Knowledge Base.

Then go to Model Providers and select the LLM model that you want to use. We are using Groq here; paste its API key.

Then go to System Model Settings and select the chat model from there.

System Model settings

Now go to Datasets and upload the PDF you want, then click on the Play button near the Parsing Status column and wait for the PDF to get parsed.

Dataset

Now go to the Chat section and create an assistant there. Give it a name and select the knowledge base that you created.

Chat Configurations

Then create a new chat and ask a question; it will perform RAG over your knowledge base and answer accordingly.
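
The same workflow can also be scripted instead of clicked through. The sketch below is only an approximation based on RAGFlow's Python SDK (ragflow-sdk); the method names, parameters, and URLs shown are assumptions and may differ in your version, so check the official API reference before relying on them.

!pip install ragflow-sdk

from ragflow_sdk import RAGFlow

# API key comes from the RAGFlow web UI; base_url points at your RAGFlow server (both are placeholders)
rag = RAGFlow(api_key="<YOUR_RAGFLOW_API_KEY>", base_url="http://localhost:9380")

# Create a knowledge base and upload a PDF to it (approximate SDK calls)
dataset = rag.create_dataset(name="my_knowledge_base")
with open("my_file.pdf", "rb") as f:
    dataset.upload_documents([{"display_name": "my_file.pdf", "blob": f.read()}])

# Create a chat assistant over the knowledge base and ask a question
assistant = rag.create_chat(name="my_assistant", dataset_ids=[dataset.id])
session = assistant.create_session()
for chunk in session.ask("What is this document about?", stream=True):
    print(chunk.content)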

Conclusion

RAG has become an important technology for custom enterprise datasets in recent times, so the need for RAG frameworks has increased drastically. Frameworks like LangChain, LlamaIndex, LangGraph, Haystack, and RAGFlow represent significant advances in AI applications. By using these frameworks, developers can create systems that provide accurate and relevant information. As AI continues to evolve, these tools will play an important role in shaping intelligent applications.

Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee consumption. 🚀☕
