Few-Shot Learning
AI technique providing a small number of task examples in the prompt to guide model performance.
FAQs
How many examples are needed for effective few-shot prompting?
The optimal number of examples depends on task complexity, model capability, and available context window. Simple, well-defined tasks (straightforward classification, standard format extraction) often work well with 2–3 examples. More complex tasks (multi-step reasoning, nuanced classification, formats with many fields) benefit from 5–10 examples. Long input documents reduce the practical number of examples, because each example consumes a significant share of the context window. Research shows diminishing returns beyond 8–10 examples in most cases: additional examples rarely improve performance proportionally and may introduce conflicting patterns. The quality and representativeness of examples matter far more than the exact number; a few diverse, precisely specified examples outperform many similar or ambiguous ones.
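The trade-off between example count and context window can be sketched as a simple budget check. This is a minimal illustration, not a definitive method: the helper names (estimate_tokens, select_examples), the cap of 8 examples, and the one-token-per-word estimate are all assumptions made for the example; a real system would use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    # Swap in the target model's tokenizer for real budgeting.
    return len(text.split())


def select_examples(examples, budget_tokens, max_examples=8):
    """Keep examples in their given order until the cap or the budget is hit."""
    chosen, used = [], 0
    for example in examples:
        cost = estimate_tokens(example)
        if len(chosen) >= max_examples or used + cost > budget_tokens:
            break
        chosen.append(example)
        used += cost
    return chosen


if __name__ == "__main__":
    pool = ["short example one", "another short example", "a much longer example " * 50]
    # With a small budget, only the examples that fit are kept.
    print(len(select_examples(pool, budget_tokens=10)))
```

Because the function keeps examples in order, the most diverse and representative examples should be placed first in the pool so they survive when the budget is tight.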
What makes a good few-shot example for financial document extraction?
Effective few-shot examples for financial extraction share five properties: input text that matches the document structure the model will encounter; precisely formatted output (the exact JSON structure, field names, and value formats expected); handling of ambiguous cases (showing what to do when a field is missing or unclear: null vs. 'N/A' vs. omitting the field); numerical formatting specifications (number of decimal places, handling of thousands separators, currency symbols); and coverage of edge cases likely to appear in production (non-standard date formats, negative values, scaled values where '2.5B' means 2,500,000,000). Examples that omit real edge cases create false confidence, and the extraction fails in deployment when those cases appear.
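A minimal sketch of these properties follows, assuming a two-field schema (revenue, net_income) and invented sample sentences; none of these names or values come from a real filing. The examples show an explicit null for a missing field and a scaled value written out in full. The parse_scaled helper is a hypothetical post-processing step for normalizing values such as '2.5B'.

```python
import json

# Hypothetical scale suffixes; extend to match the documents being processed.
SCALE = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_scaled(value: str) -> float:
    """Convert strings like '2.5B', '$1,200' or '(1,200)' to a plain number.

    Parentheses around the whole value are treated as a negative amount.
    """
    text = value.strip().replace("$", "").replace(",", "")
    negative = text.startswith("(") and text.endswith(")")
    text = text.strip("()")
    factor = SCALE.get(text[-1].upper(), 1) if text else 1
    if factor != 1:
        text = text[:-1]
    number = float(text) * factor
    return -number if negative else number


# Invented examples: one scaled value, one missing field shown as null.
EXAMPLES = [
    {"input": "Total revenue was $2.5B for fiscal 2023.",
     "output": {"revenue": 2500000000, "net_income": None}},
    {"input": "Net income was $410M; revenue was not disclosed.",
     "output": {"revenue": None, "net_income": 410000000}},
]


def build_prompt(examples, document: str) -> str:
    """Assemble an instruction, the worked examples, and the new document."""
    parts = ["Extract revenue and net_income as JSON. Use null when a field is absent."]
    for example in examples:
        parts.append(f"Input: {example['input']}\nOutput: {json.dumps(example['output'])}")
    parts.append(f"Input: {document}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(EXAMPLES, "Revenue reached $3.1B and net income was $0.4B."))
```

Serializing the example outputs with json.dumps keeps every example in the same exact format, including null for absent fields, so the model sees one consistent convention rather than a mix of null, 'N/A', and omitted keys.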
Can few-shot learning work for classifying financial risk factors?
Yes. Few-shot learning is effective for financial risk factor classification, particularly when categories are well-defined and examples clearly distinguish between them. For a model classifying 10-K risk factor paragraphs by type (market risk, credit risk, operational risk, regulatory risk, etc.), providing 2–3 labeled examples per category, each with clear, representative text showing that category's defining characteristics, improves classification accuracy substantially. The key challenges are that categories may have fuzzy boundaries (a paragraph can involve both market and credit risk), real examples may not be cleanly representative, and the model's training data may not align well with the company's specific taxonomy. Human-reviewed 'gold standard' examples that clearly represent each category are worth the investment for production systems.
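A minimal sketch of such a classification prompt is shown here. The example paragraphs are invented for illustration (a production system would use human-reviewed text with 2–3 examples per category), and the function names are hypothetical. The normalize_label helper returns None for any answer outside the taxonomy, so out-of-taxonomy responses can be routed to human review.

```python
# Invented, single-example-per-category data to keep the sketch short.
LABELED = {
    "market risk": [
        "Fluctuations in interest rates could reduce the value of our investment portfolio.",
    ],
    "credit risk": [
        "Our customers may fail to pay amounts owed to us when due.",
    ],
    "operational risk": [
        "A failure of our information systems could disrupt our operations.",
    ],
    "regulatory risk": [
        "Changes in laws and regulations could increase our compliance costs.",
    ],
}


def build_classification_prompt(labeled, paragraph: str) -> str:
    """List the allowed categories, then the labeled examples, then the query."""
    parts = ["Classify the risk factor paragraph into one of: " + ", ".join(labeled) + "."]
    for label, texts in labeled.items():
        for text in texts:
            parts.append(f"Paragraph: {text}\nCategory: {label}")
    parts.append(f"Paragraph: {paragraph}\nCategory:")
    return "\n\n".join(parts)


def normalize_label(raw: str, labeled):
    """Map a raw model answer to a known category, or None if it is not one."""
    label = raw.strip().lower().rstrip(".")
    return label if label in labeled else None


if __name__ == "__main__":
    print(build_classification_prompt(LABELED, "A counterparty default could cause losses."))
    print(normalize_label(" Credit risk.\n", LABELED))
```

This sketch assumes one label per paragraph. Paragraphs that span categories, such as combined market and credit risk, would need either a multi-label output format or an explicit tie-breaking rule stated in the prompt.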
Related Terms
Zero-Shot Learning
AI capability to perform tasks on categories or domains not seen during training, using semantic knowledge.
Prompt Engineering
Craft of designing and optimizing inputs to AI language models to reliably produce desired outputs.
Fine-Tuning
Further training a pre-trained AI model on domain-specific data to improve performance on specialized tasks.
Large Language Model
AI system trained on vast text data to understand and generate human language across many tasks.