Note: this documentation is a work in progress, so we highly recommend you join the conversation on our Discord!

Intro to retrieval-augmented generation

What is RAG?

One of the biggest limitations of large language models (LLMs) and the applications built on top of them (like ChatGPT) is that they don’t know anything about you or your company. To give them access to that information, there are two main options: 1) fine-tuning, and 2) retrieval. For most use cases, retrieval works better.

When trying to customize an LLM application with private data, many people jump straight to fine-tuning. This is usually not the best solution. Fine-tuning works well for adjusting the style and behavior of an LLM’s responses, but it isn’t very good at injecting new knowledge into the system. For that, retrieval is a much more effective approach.

The process of combining a knowledge retrieval system with an LLM is called retrieval-augmented generation (RAG). Conceptually, it’s a pretty simple process. When the user sends a message, you first run a search for relevant information. Then you include that information in the prompt you send to the LLM, thereby giving the LLM access to that information.
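
In code, that flow looks something like the sketch below. The `search_knowledge_base` and `call_llm` functions are hypothetical placeholders for whatever retrieval system and LLM API you use.

```python
# A minimal sketch of the RAG flow. `search_knowledge_base` and
# `call_llm` are hypothetical placeholders, not a real API.

def answer_with_rag(user_message: str) -> str:
    # 1. Run a search for information relevant to the user's message
    relevant_chunks = search_knowledge_base(user_message, top_k=5)

    # 2. Include that information in the prompt
    context = "\n\n".join(relevant_chunks)
    prompt = (
        "Use the following information to answer the question.\n\n"
        f"Information:\n{context}\n\n"
        f"Question: {user_message}"
    )

    # 3. Send the augmented prompt to the LLM
    return call_llm(prompt)
```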

Superpowered AI is a platform for building retrieval-augmented language model applications. We offer a simple end-to-end API that includes everything you need to build production-ready LLM applications that are connected to the data they need.

Benefits of RAG

The most obvious benefit of RAG is that it lets you give an LLM access to proprietary data. For example, if you’re building a customer service bot, you can use RAG to give the LLM access to your company’s customer service manual and product documentation.

An additional benefit of RAG is that it can give LLMs access to up-to-date data. Most LLMs, like ChatGPT and GPT-4, were trained on datasets that only go up to 2021, which means those models are completely unaware of anything that has happened in the world since then. Some newer models have training data cutoffs in 2023, but that still leaves them months out of date.

With RAG, the source documents used in generating each answer can be logged and displayed to the user. That way the user doesn’t have to take the AI’s word for anything and can verify the information themselves. Clickable citations are one of the biggest benefits of RAG, and Superpowered offers this functionality natively.
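
A sketch of what that can look like in practice (the response shape here is illustrative, not Superpowered’s actual API format):

```python
# Illustrative response shape -- not Superpowered's actual API format.
# Each retrieved chunk keeps metadata about where it came from, so the
# answer can be displayed alongside clickable citations.
response = {
    "answer": "The premium plan includes 24/7 phone support.",
    "sources": [
        {"title": "Customer Service Manual", "url": "https://example.com/manual#support"},
        {"title": "Product Documentation", "url": "https://example.com/docs/plans"},
    ],
}

for source in response["sources"]:
    print(f"[{source['title']}]({source['url']})")  # render as a clickable link
```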

By putting factual information directly into the prompt, we can also dramatically reduce the likelihood of hallucinations, one of LLMs’ greatest weaknesses.
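
One common way to do this is a prompt template that explicitly tells the model to stay grounded in the retrieved text. The wording below is just a sketch:

```python
# A sketch of a "grounded" prompt template. The exact wording is an
# assumption; the key idea is instructing the model to answer only
# from the retrieved context rather than from memory.
PROMPT_TEMPLATE = """Answer the question using only the information provided below.
If the information is not sufficient, say you don't know instead of guessing.

Information:
{context}

Question: {question}"""

context = "The premium plan includes 24/7 phone support."  # retrieved chunk
question = "Does the premium plan include phone support?"
prompt = PROMPT_TEMPLATE.format(context=context, question=question)
```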

Retrieval pipeline

Building production-quality knowledge retrieval systems is challenging because the process involves many steps.

Data ingestion

The first step is to pull in your data and get it into the proper format. This usually means taking a file, like a PDF, and extracting the raw text from it. It could also involve converting an audio file to text using a speech-to-text AI model. Once you have the raw text from your file, the next step is to break it into small chunks of roughly 1-2 paragraphs each. This step is known as text splitting.
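
Here’s a naive text splitter as an illustration; production splitters are usually smarter about sentence boundaries, token counts, and overlap between chunks:

```python
# A naive text splitter: groups paragraphs into chunks of at most
# `max_chars` characters. Production splitters also account for
# sentence boundaries, token counts, and overlap between chunks.

def split_text(text: str, max_chars: int = 1500) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```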

Embeddings

Once you have your text split up into chunks, the next step is to create embeddings. Embeddings are vectors that represent the meaning of a piece of text. These vectors are what we’ll use for the search step later on.
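
For example, using the open-source sentence-transformers library (one of many ways to generate embeddings):

```python
# Generate embeddings with the sentence-transformers library (one of
# many options). Each chunk becomes a fixed-length vector; texts with
# similar meanings end up with similar vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "The premium plan includes 24/7 phone support.",
    "Refunds are available within 30 days of purchase.",
]
embeddings = model.encode(chunks)
print(embeddings.shape)  # (2, 384) -- one 384-dimensional vector per chunk
```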

Vector databases

Once you have your embeddings, you need somewhere to store them. This is where vector databases come in. These are specialized databases that are designed to enable efficient search based on vector similarity.
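
The core operation a vector database performs is nearest-neighbor search. A brute-force version is easy to sketch with NumPy; real vector databases use approximate nearest-neighbor indexes to stay fast over millions of vectors:

```python
# Brute-force cosine-similarity search with NumPy. Real vector
# databases use approximate nearest-neighbor indexes (e.g. HNSW)
# to stay fast at scale.
import numpy as np

def top_k_similar(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    # Normalize so that the dot product equals cosine similarity
    query = query / np.linalg.norm(query)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = vectors @ query
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]

# Example: search 1,000 random 384-dimensional vectors
db = np.random.rand(1000, 384).astype("float32")
indices, scores = top_k_similar(np.random.rand(384).astype("float32"), db, k=3)
print(indices, scores)
```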