
Retrieval Augmented Generation using Llama Index

Part 1: Improving Fine-tuned Model using RAG

1. Download the PDF

PDFs are downloaded once and saved in the "pdfs" folder. To download from another URL, uncomment the code below.

In [1]:
# import os
# import requests
# from bs4 import BeautifulSoup

# # URL of the page to scrape (your provided URL)
# url = 'https://www.emaanlibrary.com/book/tafseer-ibn-kathir-in-english-114-surahs-complete/?ebook-category=ruqya&latest=1'

# # Send HTTP request to get the page content
# response = requests.get(url)

# # Parse the HTML content with BeautifulSoup
# soup = BeautifulSoup(response.content, 'html.parser')

# # Find all <a> tags with href links ending in .pdf
# pdf_links = soup.find_all('a', href=True)
# pdf_urls = []

# # Loop through all links and filter out the ones that are PDFs
# for link in pdf_links:
#     href = link['href']
#     if ...
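The "download once, save in the pdfs folder" behaviour described above could be sketched roughly as follows. This is a minimal sketch and not the notebook's own code: it swaps requests and BeautifulSoup for the Python standard library so it runs without extra dependencies, and the names PdfLinkParser, find_pdf_urls, and download_missing are hypothetical helpers introduced only for illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href=...> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def find_pdf_urls(html, base_url):
    """Return the de-duplicated PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


def download_missing(pdf_urls, folder="pdfs", fetch=urlretrieve):
    """Download each PDF only if it is not already in `folder`.

    Returns the paths of the files saved during this call.
    `fetch` is an assumption of this sketch: any callable taking
    (url, target_path), which makes the function testable offline.
    """
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in pdf_urls:
        filename = os.path.basename(urlparse(url).path)
        target = os.path.join(folder, filename)
        if os.path.exists(target):
            continue  # already downloaded on an earlier run
        fetch(url, target)
        saved.append(target)
    return saved
```

Calling download_missing a second time with the same list returns an empty list, because every target file already exists in the folder. That is the property that lets the scraping cell stay commented out after the first run.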