Tuesday, March 25, 2025

A Coding Implementation to Build a Conversational Research Assistant with FAISS, Langchain, Pypdf, and TinyLlama-1.1B-Chat-v1.0


RAG-powered conversational research assistants address the limitations of traditional language models by combining them with information retrieval systems. The system searches through specific knowledge bases, retrieves relevant information, and presents it conversationally with proper citations. This approach reduces hallucinations, handles domain-specific knowledge, and grounds responses in retrieved text. In this tutorial, we'll demonstrate building such an assistant using the open-source model TinyLlama-1.1B-Chat-v1.0 from Hugging Face, FAISS from Meta, and the LangChain framework to answer questions about scientific papers.

First, let's install the necessary libraries:

!pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops

Now, let’s import the required libraries: 

import os
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import pandas as pd 
from IPython.display import display, Markdown

We'll mount Google Drive to save the paper in a later step:

from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted")

For our knowledge base, we'll use PDF documents of scientific papers. Let's create a function to load and process these documents:

def load_documents(pdf_folder_path):
    documents = []

    if not pdf_folder_path:
        print("Downloading a sample paper...")
        !wget -q https://arxiv.org/pdf/1706.03762.pdf -O attention.pdf
        pdf_docs = ["attention.pdf"]
    else:
        pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                    if f.endswith('.pdf')]

    print(f"Found {len(pdf_docs)} PDF documents")

    for pdf_path in pdf_docs:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
            print(f"Loaded: {pdf_path}")
        except Exception as e:
            print(f"Error loading {pdf_path}: {e}")

    return documents




documents = load_documents("")

Next, we need to split these documents into smaller chunks for efficient retrieval:

def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks")
    return chunks


chunks = split_documents(documents)
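
As an optional sanity check, you can peek at one chunk's text and metadata (the attributes below are the standard LangChain `Document` fields, not anything specific to this tutorial):

# Optional: inspect the first chunk's text and its source metadata
print(chunks[0].page_content[:300])
print(chunks[0].metadata)  # e.g. {'source': 'attention.pdf', 'page': 0}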

We'll use sentence-transformers to create vector embeddings for our document chunks:

def create_vector_store(chunks):
    print("Loading embedding model...")
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
    )

    print("Creating vector store...")
    vector_store = FAISS.from_documents(chunks, embedding_model)
    print("Vector store created successfully!")
    return vector_store


vector_store = create_vector_store(chunks)
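
Before wiring up the language model, you can verify retrieval directly against the FAISS index. This check is not part of the original walkthrough; the sample question and k=3 are just illustrative values:

# Optional: retrieve the 3 chunks most similar to a sample question
sample_query = "What is multi-head attention?"
for doc in vector_store.similarity_search(sample_query, k=3):
    print(doc.metadata.get("page"), doc.page_content[:120], "...")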

Now, let's load an open-source language model to generate responses. We'll use TinyLlama, which is small enough to run on Colab but still powerful enough for our task:

def load_language_model():
    print("Loading language model...")
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

    try:
        import subprocess
        print("Installing/updating bitsandbytes...")
        subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
        print("Successfully installed/updated bitsandbytes")
    except:
        print("Could not update bitsandbytes, will proceed without 8-bit quantization")

    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    if torch.cuda.is_available():
        try:
            quantization_config = BitsAndBytesConfig(
                load_in_8bit=True,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False
            )

            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                quantization_config=quantization_config
            )
            print("Model loaded with 8-bit quantization")
        except Exception as e:
            print(f"Error with quantization: {e}")
            print("Falling back to standard model loading without quantization")
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto"
            )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0.2,
        top_p=0.95,
        repetition_penalty=1.2,
        return_full_text=False
    )

    from langchain_community.llms import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)
    print("Language model loaded successfully!")
    return llm


llm = load_language_model()
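
If you want to confirm the model responds before building the full retrieval chain, a quick smoke test looks like the snippet below. This step is optional and the prompt is just an example:

# Optional smoke test of the raw LLM (no retrieval involved yet)
print(llm.invoke("Briefly explain what a transformer model is."))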

Now, let's build our assistant by combining the vector store and the language model:
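
The create_research_assistant helper called later is not shown in the listing above, so here is a minimal sketch of how it could be implemented, assuming it wraps the ConversationalRetrievalChain imported earlier around a FAISS retriever and keeps the chat history in a closure; the k=3 retriever setting is an illustrative choice rather than the article's own value:

def create_research_assistant(vector_store, llm):
    # Assumed implementation: combine the LLM with a retriever over the
    # FAISS index and return source documents alongside each answer.
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True
    )
    chat_history = []

    def ask(query, return_sources=False):
        # Run the chain with the accumulated chat history for follow-up questions
        result = chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        chat_history.append((query, answer))
        if return_sources:
            return answer, result["source_documents"]
        return answer

    return ask

Returning a closure keeps the example self-contained, and its call signature matches the usage further down (response, sources = research_assistant(query, return_sources=True)).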

def format_research_assistant_output(query, response, sources):
    output = f"\n{'=' * 50}\n"
    output += f"USER QUERY: {query}\n"
    output += f"{'-' * 50}\n\n"
    output += f"ASSISTANT RESPONSE:\n{response}\n\n"
    output += f"{'-' * 50}\n"
    output += f"SOURCES REFERENCED:\n\n"

    for i, doc in enumerate(sources):
        output += f"Source #{i+1}:\n"
        content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
        wrapped_content = textwrap.fill(content_preview, width=80)
        output += f"{wrapped_content}\n\n"

    output += f"{'=' * 50}\n"
    return output


import textwrap


research_assistant = create_research_assistant(vector_store, llm)


test_queries = [
    "What is the key idea behind the Transformer model?",
    "Explain self-attention mechanism in simple terms.",
    "Who are the authors of the paper?",
    "What are the main advantages of using attention mechanisms?"
]


for query in test_queries:
    response, sources = research_assistant(query, return_sources=True)
    formatted_output = format_research_assistant_output(query, response, sources)
    print(formatted_output)

In this tutorial, we built a conversational research assistant using Retrieval-Augmented Generation with open-source models. RAG enhances language models by integrating document retrieval, reducing hallucination, and grounding answers in domain-specific sources. The guide walks through setting up the environment, processing scientific papers, creating vector embeddings with FAISS and sentence transformers, and integrating an open-source language model like TinyLlama. The assistant retrieves relevant document chunks and generates responses with citations. This implementation lets users query a knowledge base, making AI-powered research more reliable and efficient for answering domain-specific questions.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
