Retrieval-augmented generation (RAG) is a technique that retrieves relevant passages from a knowledge base and supplies them to a large language model (LLM) as context, so the model answers from trusted, current information instead of only its training data.

Why it matters

A general-purpose LLM knows only what was in its training data, which stops at a cutoff date and includes nothing private. RAG lets a model answer from your documents, policies, and products, and cite its sources, which reduces hallucinations and makes answers traceable. It is one of the most common patterns for reliable enterprise generative AI.

How it works (in brief)

Documents are ingested, split into chunks, and converted into vector embeddings stored in a vector database. At query time, the system retrieves the most relevant chunks and passes them to the LLM to generate a grounded answer.
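
As a rough illustration of the steps above, here is a minimal sketch in Python. It is a toy under stated assumptions, not a production design: the word-count "embedding", the in-memory VectorStore class, the file names, and the sample policy text are all hypothetical stand-ins for a real embedding model, a real vector database, and your own documents. The final call to an LLM is left out; the sketch stops at the grounded prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a sparse word-count vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (source, chunk_text, vector)

    def ingest(self, source, text, max_words=40):
        # Ingest: split the document into chunks and store each chunk with its vector.
        for piece in chunk(text, max_words):
            self.items.append((source, piece, embed(piece)))

    def retrieve(self, query, k=2):
        # Query time: rank stored chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(source, piece) for source, piece, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the grounded prompt that would be passed to the LLM."""
    context = "\n".join(
        f"[{i}] ({source}) {piece}" for i, (source, piece) in enumerate(passages, 1)
    )
    return (
        "Answer using only the context below and cite passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


store = VectorStore()
store.ingest("returns-policy.txt", "Customers may return unused items within 30 days of delivery for a full refund.")
store.ingest("shipping-faq.txt", "Standard shipping takes three to five business days within the country.")

question = "How many days do customers have to return items?"
passages = store.retrieve(question, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

Keeping the source name alongside each chunk is what makes the answer traceable: the prompt carries the passage number and its origin, so the model can cite them. In a real system the embedding function and the store would typically be swapped for a trained embedding model and a vector database, while the ingest, retrieve, and prompt steps keep the same shape.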

See RAG development, the RAG Implementation Guide, and when to choose RAG vs. fine-tuning.
