Retrieval-Augmented Generation: Easy to use but hard to master

Introduction

The Retrieval-Augmented Generation (RAG) framework combines the benefits of information retrieval systems with the generative capability of large language models. RAG is particularly useful in tasks that require a deep understanding of the query to generate contextually relevant responses.

RAG workflow

RAG involves two main components: a document retriever and a large language model (LLM) that acts as the generator. The retriever finds documents relevant to the input query, and the generator uses the retrieved documents together with the original query to generate a response. ...
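The retriever-then-generator flow described in this excerpt can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names (`tokenize`, `retrieve`, `build_prompt`, `generate`) are hypothetical, the retriever ranks by simple word overlap rather than the embeddings a real system would likely use, and `generate` is a placeholder standing in for an actual LLM call.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Toy retriever (assumption): rank documents by words shared with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, retrieved):
    """Combine the retrieved documents and the original query into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt):
    """Placeholder for the LLM call (hypothetical); a real system would query a model here."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

documents = [
    "RAG combines a document retriever with a large language model.",
    "Pandas provides Series and DataFrame data structures.",
    "Git records every change to the code.",
]
query = "What does RAG combine?"

top_docs = retrieve(query, documents, k=1)   # step 1: retrieval
prompt = build_prompt(query, top_docs)       # step 2: augment the query with context
answer = generate(prompt)                    # step 3: generation
print(prompt)
```

The point of the sketch is the ordering of the steps: the generator never sees the whole document collection, only the query plus whatever the retriever selected.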

April 29, 2024 · 6 min · Pravi Devineni, PhD

Data Exploration with Python using Pandas

Install and import Pandas

    pip install pandas

    import numpy as np
    import pandas as pd

Pandas Data Structures

The core value of Pandas comes through the data structure options it provides, primarily:

Series (labeled, homogeneously-typed, one-dimensional arrays)
DataFrames (labeled, potentially heterogeneously-typed, two-dimensional arrays)

Pandas Series

Create Series

Create empty Series

    s = pd.Series(dtype='float64')

Create Series from dictionary

    d = {'a': 1, 'b': 2, 'c': 3}
    s = pd.Series(d)

Create Series from Numpy array ...
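To make the distinction between the two structures concrete, here is a minimal sketch, assuming pandas is installed. The variable names and the sample data in the DataFrame are illustrative only and do not come from the original post.

```python
import pandas as pd

# Series: one-dimensional, labeled, with a single dtype for all values
s = pd.Series({'a': 1, 'b': 2, 'c': 3})
label_lookup = s['b']            # values are accessed by their label

# DataFrame: two-dimensional, labeled, and each column may have its own dtype
df = pd.DataFrame({
    'name': ['x', 'y', 'z'],      # text column
    'score': [1.5, 2.0, 3.5],     # floating-point column
})
column_dtypes = df.dtypes.astype(str).to_dict()

print(s.dtype)         # the one dtype shared by the whole Series
print(column_dtypes)   # one dtype per DataFrame column
```

A Series reports one dtype for all of its values, while a DataFrame reports one per column, which is what "potentially heterogeneously-typed" refers to.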

April 30, 2023 · 2 min · Pravi Devineni, PhD

GitHub Cheatsheet for Data Scientists

Git is an open-source tool for code management, and it is very helpful for code development and collaboration. Git applies version control to code, which means every change to the code is recorded in a kind of database. If a mistake is made, version control lets us go back in time, compare the current code with prior versions, and fix the error while causing the least amount of interruption to the people working on that code. ...

February 28, 2023 · 4 min · Pravi Devineni, PhD