Retrieval-Augmented Generation
AI technique grounding language model responses in specific retrieved documents to improve accuracy.
FAQs
How does RAG reduce hallucination in financial AI applications?
RAG reduces hallucination by placing specific, verifiable source documents in the LLM's context window. The model is then guided to generate responses grounded in those documents rather than relying on potentially outdated or incorrect parametric knowledge (information encoded in the model's weights during training). When the system prompt instructs the model to answer only from the provided documents and to say when the retrieved context does not contain the answer, hallucinations are substantially reduced. Responses can include source citations (document name, page number, passage), enabling human reviewers to verify each claim. RAG does not eliminate hallucination entirely, because models can still misinterpret retrieved text, but it provides a basis for verification that responses drawn purely from a model's training lack.
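The grounding instructions and citation format described above can be sketched in Python. This is a minimal, hypothetical sketch. `RetrievedChunk`, `build_grounded_prompt`, `cited_sources`, and the instruction wording are illustrative names and text, not the API of any particular framework. The code only assembles the prompt and checks citations. It does not call a model.

```python
import re
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """A passage returned by the retriever, with metadata needed for a citation."""
    document: str
    page: int
    text: str


# Hypothetical instruction wording; adapt it to the model and use case.
SYSTEM_INSTRUCTIONS = (
    "Answer using ONLY the numbered sources below. "
    "Cite every claim with its source number in square brackets, e.g. [1]. "
    "If the sources do not contain the answer, reply exactly: "
    "\"The provided documents do not contain this information.\""
)


def build_grounded_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    source_lines = []
    for i, chunk in enumerate(chunks, start=1):
        # Number each source and keep document name and page so a reviewer can check it.
        source_lines.append(f"[{i}] {chunk.document}, p. {chunk.page}: {chunk.text}")
    sources = "\n".join(source_lines) if source_lines else "(no sources retrieved)"
    return f"{SYSTEM_INSTRUCTIONS}\n\nSources:\n{sources}\n\nQuestion: {question}\nAnswer:"


def cited_sources(answer: str, chunks: list[RetrievedChunk]) -> list[RetrievedChunk]:
    """Map [n] markers in a model's answer back to the chunks they cite.

    Numbers outside 1..len(chunks) are dropped; they point at no real source.
    """
    numbers = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [chunks[n - 1] for n in sorted(numbers) if 1 <= n <= len(chunks)]
```

Comparing the citation markers in an answer against the retrieved chunks is a cheap way to flag responses whose citations cannot be traced to a real source. Such responses can then be routed to a human reviewer.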
What is a vector database and why is it essential for RAG?
A vector database stores and indexes high-dimensional numerical vectors (embeddings) that represent text chunks, images, or other data. It is optimized for nearest-neighbor search: finding the stored vectors most similar to a query vector, typically using approximate indexes that trade a small amount of accuracy for speed at scale. RAG systems convert source documents to embeddings offline and store them in the vector database. At query time, they convert the user's question to an embedding and search the database for the most semantically similar document chunks. This semantic search can find relevant documents even when exact keyword matches don't exist: asking 'what is the policy on expense reimbursement' can surface relevant documents even if they don't use those exact words. Popular vector databases include Pinecone, Weaviate, Chroma, Qdrant, and PostgreSQL with the pgvector extension.
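The offline indexing and query-time search steps described above can be sketched as a tiny in-memory store. This sketch is hypothetical. `toy_embed` is a stand-in for a real embedding model: it hashes words into a fixed-size vector. Because of that, it only matches documents that share words with the query, and it cannot find the paraphrases that real semantic embeddings handle. The store also scores every vector exhaustively. Production vector databases instead use approximate nearest-neighbor indexes (HNSW, for example) to scale to millions of chunks.

```python
import math
import re
import zlib
from collections import Counter


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized.

    Captures word overlap only, not meaning.
    """
    vec = [0.0] * dim
    for word, count in Counter(re.findall(r"[a-z]+", text.lower())).items():
        # crc32 gives a hash that is stable across runs (unlike built-in hash()).
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Exact (brute-force) nearest-neighbor search over unit-length vectors."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Offline step: embed each chunk once and store its vector.
        self.ids.append(doc_id)
        self.vectors.append(self.embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Query time: embed the question, score every stored vector, keep the top k.
        q = self.embed(query)
        scored = [
            # Dot product equals cosine similarity because vectors are unit length.
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

For example, after `store.add("expenses", "employees submit expense receipts for reimbursement")`, a call to `store.search("expense reimbursement", k=1)` returns that chunk's id with its similarity score. A real deployment would swap `toy_embed` for an embedding model and the list scan for a vector database's index.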
What are the limitations of RAG for financial document applications?
RAG limitations in financial contexts include: retrieval failure (if relevant documents are missing from the knowledge base or are not retrieved, the model may hallucinate or say it doesn't know); chunking challenges (splitting long financial documents at arbitrary points can separate related context, such as a covenant threshold from its definition or a table from its header); cross-document reasoning difficulty (questions that require synthesizing information from several documents are hard to answer when each chunk is retrieved independently); table and figure handling (standard RAG struggles with complex financial tables, which usually require specialized table extraction and formatting); update latency (the knowledge base must be re-indexed when source documents change); and precision-recall tradeoffs (retrieving too few chunks risks missing relevant content, while retrieving too many fills the model's context window with noise).
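Two of the limitations above, chunking and the precision-recall tradeoff, can be illustrated in a short Python sketch. Overlapping chunks are one common way to reduce the risk of splitting related context at a boundary. They lower that risk but do not remove it, and a table separated from its header may still need structure-aware splitting. Budgeted selection shows the precision-recall tradeoff: it keeps the highest-scoring chunks that fit a context limit and drops low-relevance ones. The function names, default sizes, and score threshold are hypothetical, and word counts stand in as a rough proxy for model tokens.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def select_within_budget(
    scored_chunks: list[tuple[str, float]], max_words: int, min_score: float = 0.0
) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit within a word budget.

    Chunks scoring below min_score are treated as noise and never selected.
    """
    selected, used = [], 0
    for text, score in sorted(scored_chunks, key=lambda p: p[1], reverse=True):
        if score < min_score:
            break  # all remaining chunks score even lower
        n = len(text.split())
        if used + n > max_words:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        selected.append(text)
        used += n
    return selected
```

Raising `max_words` or lowering `min_score` favors recall at the cost of more noise in the context window. Tightening either one favors precision but risks omitting a passage the answer depends on.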
Related Terms
Large Language Model
AI system trained on vast text data to understand and generate human language across many tasks.
Prompt Engineering
Craft of designing and optimizing inputs to AI language models to reliably produce desired outputs.
Generative AI
AI systems capable of creating new content—text, images, code, or data—based on patterns learned from training.
Fine-Tuning
Further training a pre-trained AI model on domain-specific data to improve performance on specialized tasks.