Retrieval-Augmented Generation (RAG)

RAG is a deployment pattern that pairs a generative model with a retrieval system. At query time, the user's question is used to search a knowledge base, typically documents indexed as embeddings in a vector database. The most relevant documents are then passed to the model as context, alongside the original question, before it generates its answer.
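
The query-time flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the names `retrieve`, `build_prompt`, and `KNOWLEDGE_BASE` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Place the retrieved documents before the question as context."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
    "Support is available by email on weekdays.",
]

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE, k=1))
```

The resulting `prompt` string is what would be sent to the generative model. The model itself is never retrained; it only sees whatever the retrieval step selects.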

The pattern matters for privacy and governance because it keeps proprietary or personal data out of the model weights. The model itself remains general-purpose; the company-specific or sensitive content lives in the retrieval store and is supplied only when needed. That separation makes RAG far easier to govern than fine-tuning: data can be added, updated, or deleted from the retrieval store without retraining the model, and the retrieval store can enforce its own access controls.
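
As a rough sketch of why this separation helps governance, the following assumes a hypothetical in-memory `RetrievalStore` in which each document carries the set of groups allowed to read it. Real vector databases expose their own mechanisms for updates, deletes, and metadata filtering; the class and field names here are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to retrieve this document


@dataclass
class RetrievalStore:
    """Hypothetical store: content changes here never touch model weights."""
    _docs: dict[str, Document] = field(default_factory=dict)

    def upsert(self, doc: Document) -> None:
        """Add a new document or replace an existing one with the same id."""
        self._docs[doc.doc_id] = doc

    def delete(self, doc_id: str) -> None:
        """Remove a document; it can no longer be retrieved afterwards."""
        self._docs.pop(doc_id, None)

    def visible_to(self, user_groups: set[str]) -> list[Document]:
        """Return only documents that share at least one group with the user."""
        return [d for d in self._docs.values() if d.allowed_groups & user_groups]


store = RetrievalStore()
store.upsert(Document("hr-1", "Salary bands for the current year", frozenset({"hr"})))
store.upsert(Document("kb-1", "How to reset your password", frozenset({"hr", "staff"})))

staff_view = [d.doc_id for d in store.visible_to({"staff"})]
```

Deleting `hr-1` from the store takes effect immediately for every later query, whereas removing the same content from a fine-tuned model would require retraining.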

But RAG is not a get-out-of-jail-free card. The retrieved documents still pass through the model's context window: they become part of the prompt, and the inference provider may log them, depending on its terms. Classifying the data in the retrieval store, controlling which sources can be retrieved per user, redacting sensitive fields before retrieval, and agreeing clear data-flow contracts with the model provider all remain necessary.
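
Two of these controls, classification and redaction, can be sketched as a filter applied before any text reaches the prompt. The classification labels, the `prepare_context` helper, and the regular expressions below are assumptions made for illustration; the patterns are deliberately simple and would miss many real-world formats.

```python
import re

# Simplistic illustrative patterns; real redaction needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical classification scheme, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def prepare_context(docs: list[tuple[str, str]], max_level: str) -> list[str]:
    """Drop documents above the allowed classification, then redact the rest.

    Each item in docs is a (text, classification_label) pair.
    """
    ceiling = LEVELS[max_level]
    return [redact(text) for text, level in docs if LEVELS[level] <= ceiling]


docs = [
    ("Email bob@corp.example for access.", "internal"),
    ("Merger plan draft", "confidential"),
]
safe_context = prepare_context(docs, max_level="internal")
```

Only the contents of `safe_context` would be placed in the prompt, so anything the provider logs has already been filtered and redacted.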