
Citations

Citations provide transparency by linking answers to their sources. Every RAG query can include citations that reference the original documents.

Basic Usage

from praisonaiagents import Knowledge
from praisonaiagents.rag import RAG

knowledge = Knowledge()
knowledge.add("research_paper.pdf")

rag = RAG(knowledge=knowledge)
result = rag.query("What methodology was used?")

# Access citations
for citation in result.citations:
    print(f"[{citation.id}] {citation.source}")
    print(f"  Score: {citation.score:.2f}")
    print(f"  Text: {citation.text[:100]}...")
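The loop above always appends `...`, even when a snippet is shorter than 100 characters. The hypothetical `snippet()` helper below (it is not part of praisonaiagents) truncates only when needed and prefers to break at a word boundary:

```python
def snippet(text: str, limit: int = 100) -> str:
    """Return text unchanged if it fits, else cut before `limit` at a space and add '...'."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Prefer breaking at the last space inside the window, if there is one
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut.rstrip() + "..."


print(snippet("Short text."))
print(snippet("The results show a 25% improvement over the baseline", limit=30))
```

Inside the loop, the last line would become `print(f"  Text: {snippet(citation.text)}")`.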

Citation Structure

Each citation contains:
from dataclasses import dataclass

@dataclass
class Citation:
    id: str           # Citation identifier (e.g., "1", "2")
    source: str       # Source document name/path
    text: str         # Relevant text snippet
    score: float      # Relevance score (0-1)
    doc_id: str       # Document identifier
    chunk_id: str     # Chunk identifier
    offset: int       # Character offset in source
    metadata: dict    # Additional metadata
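When unit-testing code that consumes citations, it can help to build citation objects without running retrieval. The sketch below defines a local stand-in, `FakeCitation`, with the same fields as the structure above. Its default values and the score check are assumptions made for convenience, not necessarily the behavior of `praisonaiagents.rag.Citation`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class FakeCitation:
    """Test double mirroring the documented Citation fields (defaults are assumed)."""
    id: str
    source: str
    text: str
    score: float = 0.0
    doc_id: str = ""
    chunk_id: str = ""
    offset: int = 0
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The docs describe scores as 0-1, so reject anything outside that range
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")


c = FakeCitation(id="1", source="paper.pdf", text="The results show...", score=0.87)
print(asdict(c))
```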

Formatting Citations

Inline References

result = rag.query("What are the findings?")

# Format answer with citations appended
formatted = result.format_answer_with_citations()
print(formatted)
Output:
The study found significant improvements in performance [1].
The methodology was validated across multiple datasets [2].

Sources:
  [1] paper.pdf: The results show a 25% improvement...
  [2] paper.pdf: We validated our approach using...
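The actual formatting happens inside the library. As a rough illustration only, a hypothetical `append_sources()` function could build a `Sources:` block of the same shape from an answer and a list of citation-like objects (here `SimpleNamespace` stand-ins with `id`, `source` and `text`):

```python
from types import SimpleNamespace
from typing import Iterable


def append_sources(answer: str, citations: Iterable, snippet_len: int = 40) -> str:
    """Append a 'Sources:' section listing each citation's id, source and a short snippet."""
    citations = list(citations)
    if not citations:
        return answer  # nothing to cite, return the bare answer
    lines = [answer, "", "Sources:"]
    for c in citations:
        text = c.text if len(c.text) <= snippet_len else c.text[:snippet_len] + "..."
        lines.append(f"  [{c.id}] {c.source}: {text}")
    return "\n".join(lines)


cites = [
    SimpleNamespace(id="1", source="paper.pdf", text="The results show a 25% improvement."),
    SimpleNamespace(id="2", source="paper.pdf", text="We validated our approach."),
]
out = append_sources("The study found improvements [1].", cites)
print(out)
```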

Custom Formatting

def format_academic_style(result):
    """Format citations in academic style."""
    answer = result.answer
    
    # Build reference list
    refs = "\n\nReferences:\n"
    for c in result.citations:
        refs += f"[{c.id}] {c.metadata.get('author', 'Unknown')}. "
        refs += f"\"{c.source}\". {c.metadata.get('year', 'n.d.')}\n"
    
    return answer + refs

formatted = format_academic_style(result)
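When several retrieved chunks come from the same document, the function above lists that document once per chunk. Academic reference lists usually name each work once, so a variant can group citation ids by source. This is a sketch: `author` and `year` are the same assumed metadata keys used above, and the sample citations are made up.

```python
from types import SimpleNamespace
from typing import Dict, List


def academic_references(citations) -> str:
    """Build a reference list with one entry per source, listing every id that points to it."""
    ids_by_source: Dict[str, List[str]] = {}
    meta_by_source: Dict[str, dict] = {}
    for c in citations:
        ids_by_source.setdefault(c.source, []).append(c.id)
        # Keep the metadata of the first chunk seen for each source
        meta_by_source.setdefault(c.source, c.metadata)
    lines = ["References:"]
    for source, ids in ids_by_source.items():
        meta = meta_by_source[source]
        author = meta.get("author", "Unknown")
        year = meta.get("year", "n.d.")
        lines.append(f"[{', '.join(ids)}] {author}. \"{source}\". {year}")
    return "\n".join(lines)


cites = [
    SimpleNamespace(id="1", source="paper.pdf", metadata={"author": "Smith", "year": 2023}),
    SimpleNamespace(id="2", source="paper.pdf", metadata={"author": "Smith", "year": 2023}),
    SimpleNamespace(id="3", source="notes.md", metadata={}),
]
refs = academic_references(cites)
print(refs)
```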

Controlling Citations

Enable/Disable

from praisonaiagents.rag import RAGConfig

# With citations (default)
config = RAGConfig(include_citations=True)

# Without citations
config = RAGConfig(include_citations=False)
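With citations disabled (or when nothing clears the filters), code that renders results should not assume the citation list is populated. A defensive rendering helper might look like the sketch below; it relies only on `answer` and `citations` attributes, and the `SimpleNamespace` result objects are hypothetical stand-ins.

```python
from types import SimpleNamespace


def render(result) -> str:
    """Return the answer, adding a source line only when citations are present."""
    citations = getattr(result, "citations", None) or []
    if not citations:
        return result.answer
    sources = ", ".join(f"[{c.id}] {c.source}" for c in citations)
    return f"{result.answer}\n\nSources: {sources}"


with_cites = SimpleNamespace(
    answer="Answer text.",
    citations=[SimpleNamespace(id="1", source="doc.pdf")],
)
without_cites = SimpleNamespace(answer="Answer text.", citations=[])
print(render(with_cites))
print(render(without_cites))
```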

Citation Count

Control how many sources are cited:
config = RAGConfig(
    top_k=10,        # Retrieve 10 chunks
    rerank=True,     # Rerank for relevance
    rerank_top_k=3,  # Keep top 3 for citations
)
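Conceptually, `top_k` sets the size of the candidate pool and `rerank_top_k` sets how many candidates survive reranking. The sketch below imitates that two-stage selection with plain lists. The `rerank_score` function is a hypothetical toy (it counts query words found in the chunk), not the reranker praisonaiagents uses.

```python
from typing import Dict, List


def rerank_score(query: str, text: str) -> int:
    """Toy relevance signal: number of distinct query words that appear in the text."""
    words = set(query.lower().split())
    return len(words & set(text.lower().split()))


def select_for_citations(query: str, chunks: List[Dict], top_k: int, rerank_top_k: int) -> List[Dict]:
    # Stage 1: keep the top_k chunks by retrieval score
    candidates = sorted(chunks, key=lambda c: c["score"], reverse=True)[:top_k]
    # Stage 2: reorder the candidates with the (toy) reranker and keep rerank_top_k
    reranked = sorted(candidates, key=lambda c: rerank_score(query, c["text"]), reverse=True)
    return reranked[:rerank_top_k]


chunks = [
    {"id": "a", "score": 0.9, "text": "unrelated background section"},
    {"id": "b", "score": 0.8, "text": "the methodology used a survey"},
    {"id": "c", "score": 0.7, "text": "methodology details and survey design"},
    {"id": "d", "score": 0.2, "text": "methodology appendix survey used"},
]
picked = select_for_citations("methodology used survey", chunks, top_k=3, rerank_top_k=2)
print([c["id"] for c in picked])
```

Chunk `d` would rerank well but never reaches stage 2, which is why `top_k` should be large enough to include plausible candidates.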

Minimum Score

Filter low-relevance citations:
config = RAGConfig(
    min_score=0.5,  # Only include if score >= 0.5
)
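The threshold is inclusive (a score of exactly 0.5 is kept). Applied by hand to citations you already have, the same rule is a one-line filter; `Hit` is a hypothetical stand-in type. A strict threshold can leave no citations at all, so downstream code should handle an empty list.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    source: str
    score: float


def filter_by_score(hits: List[Hit], min_score: float) -> List[Hit]:
    """Keep hits whose score meets or exceeds min_score."""
    return [h for h in hits if h.score >= min_score]


hits = [Hit("a.pdf", 0.82), Hit("b.pdf", 0.5), Hit("c.pdf", 0.31)]
print(filter_by_score(hits, 0.5))
```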

Citations Without Generation

Get citations without generating an answer:
# Just retrieve and format citations
citations = rag.get_citations("What sources discuss X?")

for c in citations:
    print(f"Source: {c.source}")
    print(f"Relevance: {c.score:.2%}")
    print(f"Snippet: {c.text[:200]}")
    print()
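Several returned chunks often come from the same document. To answer "which sources discuss X?" with one line per document, you can collapse the list and keep each source's best score. A sketch over citation-like objects with `source` and `score` attributes (the sample data is made up):

```python
from types import SimpleNamespace
from typing import Dict, List, Tuple


def best_score_per_source(citations) -> List[Tuple[str, float]]:
    """Return (source, best score) pairs, highest score first."""
    best: Dict[str, float] = {}
    for c in citations:
        best[c.source] = max(c.score, best.get(c.source, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


cites = [
    SimpleNamespace(source="a.pdf", score=0.61),
    SimpleNamespace(source="b.pdf", score=0.74),
    SimpleNamespace(source="a.pdf", score=0.88),
]
for source, score in best_score_per_source(cites):
    print(f"{source}: {score:.2%}")
```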

Custom Citation Formatter

Implement your own citation formatting:
from praisonaiagents.rag import RAG, Citation
from typing import List, Dict, Any

class NumberedCitationFormatter:
    """Format citations with numbered references."""
    
    def format(
        self,
        results: List[Dict[str, Any]],
        start_id: int = 1,
    ) -> List[Citation]:
        citations = []
        for i, result in enumerate(results):
            citation = Citation(
                id=str(start_id + i),  # plain number ("1", "2"); brackets are added when rendering
                source=result.get("metadata", {}).get("filename", "Unknown"),
                text=result.get("text", "")[:500],
                score=result.get("score", 0.0),
                metadata=result.get("metadata", {}),
            )
            citations.append(citation)
        return citations

# Use custom formatter
rag = RAG(
    knowledge=knowledge,
    citation_formatter=NumberedCitationFormatter(),
)
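To check a formatter's mapping before wiring it into `RAG`, you can run the same logic against hand-written retrieval results. The sketch below re-expresses the formatter's mapping as a plain function (with plain numeric ids) and uses a local `Citation` stand-in with assumed defaults, so it runs without praisonaiagents installed. The sample dicts mimic the `text`, `score` and `metadata` keys the formatter reads.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Citation:
    """Local stand-in; only the fields the formatter sets are included."""
    id: str
    source: str
    text: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_results(results: List[Dict[str, Any]], start_id: int = 1) -> List[Citation]:
    # Mirrors NumberedCitationFormatter.format: filename as source, text capped at 500 chars
    return [
        Citation(
            id=str(start_id + i),
            source=r.get("metadata", {}).get("filename", "Unknown"),
            text=r.get("text", "")[:500],
            score=r.get("score", 0.0),
            metadata=r.get("metadata", {}),
        )
        for i, r in enumerate(results)
    ]


sample = [
    {"text": "x" * 600, "score": 0.9, "metadata": {"filename": "paper.pdf"}},
    {"text": "short", "score": 0.4},
]
cites = format_results(sample, start_id=3)
print([(c.id, c.source, len(c.text)) for c in cites])
```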

Serialization

Citations can be serialized for storage or API responses:
result = rag.query("Question")

# Convert to dict
data = result.to_dict()
# {
#     "answer": "...",
#     "citations": [
#         {"id": "1", "source": "doc.pdf", "text": "...", ...},
#         ...
#     ],
#     "metadata": {...}
# }

# Convert back
from praisonaiagents.rag import RAGResult
restored = RAGResult.from_dict(data)
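Because the dictionary holds plain values, it can be stored or sent over an API as JSON with the standard library, assuming any `metadata` values are themselves JSON-serializable. A sketch with a made-up dict in the shape shown above:

```python
import json

data = {
    "answer": "The study found improvements [1].",
    "citations": [
        {"id": "1", "source": "doc.pdf", "text": "The results show...", "score": 0.87},
    ],
    "metadata": {"query": "Question"},
}

payload = json.dumps(data)        # e.g. store in a database or return as an HTTP response body
restored = json.loads(payload)    # back to a plain dict, which can then go to RAGResult.from_dict
print(restored == data)
```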

Best Practices

  1. Always show sources: this builds trust with users.
  2. Include snippets: they let users verify that a source is relevant.
  3. Link to originals: when possible, link to the source documents.
  4. Be transparent about scores: show relevance scores to power users.
  5. Limit citations: three to five citations are usually sufficient.
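Several of these practices can be combined in one rendering step. The sketch below caps the list (practice 5), shows a snippet (2), shows the score (4), and adds a Markdown-style link when one is available (3). The `url` metadata key is a hypothetical convention, not something the library is known to populate.

```python
from types import SimpleNamespace
from typing import List


def render_sources(citations, max_citations: int = 3, snippet_len: int = 80) -> str:
    """Render a source list: best-scoring citations first, capped at max_citations."""
    top = sorted(citations, key=lambda c: c.score, reverse=True)[:max_citations]
    lines: List[str] = []
    for c in top:
        url = c.metadata.get("url")  # hypothetical key; fall back to the plain source name
        label = f"[{c.source}]({url})" if url else c.source
        text = c.text if len(c.text) <= snippet_len else c.text[:snippet_len] + "..."
        lines.append(f"- [{c.id}] {label} (score {c.score:.2f}): {text}")
    return "\n".join(lines)


cites = [
    SimpleNamespace(id="1", source="a.pdf", text="alpha", score=0.5, metadata={}),
    SimpleNamespace(id="2", source="b.pdf", text="beta", score=0.9,
                    metadata={"url": "https://example.com/b.pdf"}),
]
print(render_sources(cites))
```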