Authority & Recency Scoring

Why Document Ranking Matters

Not all documents are equally reliable or current. A prospectus from 2024 should take precedence over a marketing draft from 2022. Authority and recency scoring helps the system prioritize the most trustworthy and up-to-date information when answering your questions.

Authority Scoring

Authority scoring assigns different weights to documents based on their type and source. Official documents like prospectuses and fact sheets receive higher scores than informal documents like meeting notes or drafts.

Document Authority Tiers

Document Type              Authority      Weight
Prospectus, SAI            Highest        1.0
Fact Sheet, Product Doc    High           0.9
Research Report            Medium-High    0.8
Investment Memo            Medium         0.7
Presentation               Medium         0.6
Internal Memo              Lower          0.5
Draft, Notes               Low            0.3

How Authority is Determined

Authority is assigned during document ingestion based on:

  • File path: Documents in certain folders receive higher authority
  • File name: Keywords like “prospectus” or “fact_sheet” boost authority
  • Classification: The document type assigned by the AI document classifier contributes to authority
  • Metadata: Explicit authority tags in document metadata
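The four signals above could be combined in many ways. The following is a minimal sketch, not the system's actual logic: it assumes a fixed precedence (explicit metadata tag, then classifier label, then filename keyword, then folder), and the names Document, authority_weight, FILENAME_KEYWORDS, PATH_FRAGMENTS, the folder names, and the 0.5 fallback are all hypothetical. Only the per-type weights come from the tier table.

```python
from dataclasses import dataclass, field
from typing import Optional

# Weights per document type, copied from the tier table in this article.
DOC_TYPE_WEIGHTS = {
    "prospectus": 1.0, "sai": 1.0,
    "fact_sheet": 0.9, "product_doc": 0.9,
    "research_report": 0.8,
    "investment_memo": 0.7,
    "presentation": 0.6,
    "internal_memo": 0.5,
    "draft": 0.3, "notes": 0.3,
}

# Hypothetical rules: the real keywords and folders are not documented here.
FILENAME_KEYWORDS = {"prospectus": "prospectus", "fact_sheet": "fact_sheet", "draft": "draft"}
PATH_FRAGMENTS = {"/official/": 1.0, "/drafts/": 0.3}
DEFAULT_WEIGHT = 0.5  # assumed fallback when no signal matches


@dataclass
class Document:
    path: str
    classified_type: Optional[str] = None  # label from the document classifier
    metadata: dict = field(default_factory=dict)


def authority_weight(doc: Document) -> float:
    """Return an authority weight using an assumed precedence of signals."""
    # 1. An explicit metadata tag overrides everything else (assumption).
    tag = doc.metadata.get("authority")
    if tag is not None:
        return float(tag)

    # 2. Classifier label, if it maps to a known document type.
    if doc.classified_type in DOC_TYPE_WEIGHTS:
        return DOC_TYPE_WEIGHTS[doc.classified_type]

    # 3. Keyword in the file name.
    lowered = doc.path.lower()
    filename = lowered.rsplit("/", 1)[-1]
    for keyword, doc_type in FILENAME_KEYWORDS.items():
        if keyword in filename:
            return DOC_TYPE_WEIGHTS[doc_type]

    # 4. Folder the document lives in.
    for fragment, weight in PATH_FRAGMENTS.items():
        if fragment in lowered:
            return weight

    return DEFAULT_WEIGHT
```

A fixed precedence keeps the result easy to explain for any single document; a real implementation might instead blend the signals or take the highest one.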

Recency Scoring

Recency scoring boosts newer documents over older ones. This is particularly important for time-sensitive information like market data, performance figures, and regulatory updates.

Recency Decay Model

Document scores decay over time using an exponential model. The decay rate is configurable but defaults to a half-life of approximately 180 days.

Today: 1.0 (full score)
6 months ago: ~0.5 (half score)
1 year ago: ~0.25 (quarter score)
2+ years ago: ~0.1 (minimum floor)
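The values above are consistent with a half-life decay clamped at a floor. The sketch below assumes the form max(floor, 0.5 ^ (age_days / half_life_days)) with the stated defaults of 180 days and 0.1; the function name recency_weight and the exact formula are illustrative assumptions rather than the system's confirmed implementation.

```python
from datetime import date, timedelta

HALF_LIFE_DAYS = 180.0  # default half-life stated in this article
MIN_RECENCY = 0.1       # minimum floor stated in this article


def recency_weight(doc_date: date, today: date,
                   half_life_days: float = HALF_LIFE_DAYS,
                   floor: float = MIN_RECENCY) -> float:
    """Exponential decay by document age, never dropping below the floor."""
    # Documents dated in the future are treated as age zero (assumption).
    age_days = max((today - doc_date).days, 0)
    decayed = 0.5 ** (age_days / half_life_days)
    return max(decayed, floor)


today = date(2024, 6, 30)
for days in (0, 180, 360, 730):
    weight = recency_weight(today - timedelta(days=days), today)
    print(f"{days:>4} days old: {weight:.2f}")
```

Without the floor, a two-year-old document would decay to roughly 0.06, so the floor is what keeps very old documents at 0.1.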

When Recency Matters Most

  • Performance data: Returns, prices, market values
  • AUM and flows: Fund sizes change frequently
  • Regulatory status: SEC approvals, compliance updates
  • Market commentary: Analysis tied to current conditions

When Recency Matters Less

  • Product structure: ETF mechanics don't change often
  • Educational content: Fundamental explanations stay valid
  • Historical analysis: Past events remain accurate

How Scores Combine

The final relevance score for a document combines three factors: its semantic similarity to the query, its authority weight, and its recency weight.

Combined Score Formula

final_score = semantic_score × authority_weight × recency_weight

For example, a highly relevant prospectus (semantic score 0.8, authority 1.0) from last month (recency 0.95) would score 0.8 × 1.0 × 0.95 = 0.76.

Example Comparison

Document                Semantic    Authority    Recency    Final
2024 Prospectus         0.75        1.0          0.9        0.68
2022 Research Report    0.85        0.8          0.3        0.20
Recent Draft Memo       0.90        0.3          0.95       0.26

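In this example, the 2024 Prospectus ranks first despite having the lowest semantic similarity of the three, because its high authority and recent date outweigh the better semantic matches of the other two documents. The Recent Draft Memo has the best semantic match but is held back by its low authority, and the 2022 Research Report is held back by its age.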
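The comparison can be reproduced with a few lines of Python. This is a sketch of the multiplication described in the formula, using the three example documents; the function name final_score and the tuple layout are illustrative assumptions.

```python
def final_score(semantic: float, authority: float, recency: float) -> float:
    """Multiply the three factors, as in the combined score formula."""
    return semantic * authority * recency


# (name, semantic, authority, recency) taken from the example comparison.
candidates = [
    ("2024 Prospectus", 0.75, 1.0, 0.9),
    ("2022 Research Report", 0.85, 0.8, 0.3),
    ("Recent Draft Memo", 0.90, 0.3, 0.95),
]

# Sort from highest to lowest final score.
ranked = sorted(
    ((name, final_score(s, a, r)) for name, s, a, r in candidates),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
# Order: 2024 Prospectus, Recent Draft Memo, 2022 Research Report
```

Because the factors are multiplied, a low value in any one of them pulls the whole score down, which is why the draft memo cannot win on semantic similarity alone.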

Authority Configuration

Authority weights are configured in the assets/authority_config.json file. Administrators can adjust weights for different document types, paths, and keywords.

Configuration Options

  • path_patterns: Boost documents in specific folders
  • filename_patterns: Boost by filename keywords
  • doc_type_weights: Set weights by classification type
  • recency_half_life: Control how fast recency decays
  • min_recency_score: Floor for very old documents
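The exact schema of authority_config.json is not shown in this article. The sketch below assumes one plausible shape, where pattern options map a substring to a weight, and shows how such a file might be parsed and sanity-checked in Python. Only the five option names come from the list above; the value shapes, the sample folder names, and the load_config helper are hypothetical.

```python
import json

# Hypothetical file contents: only the five key names are from this article.
EXAMPLE_CONFIG = """
{
  "path_patterns": {"official/": 1.0, "drafts/": 0.3},
  "filename_patterns": {"prospectus": 1.0, "fact_sheet": 0.9},
  "doc_type_weights": {"prospectus": 1.0, "research_report": 0.8, "draft": 0.3},
  "recency_half_life": 180,
  "min_recency_score": 0.1
}
"""

REQUIRED_KEYS = {
    "path_patterns",
    "filename_patterns",
    "doc_type_weights",
    "recency_half_life",
    "min_recency_score",
}


def load_config(text: str) -> dict:
    """Parse the JSON text and reject obviously invalid settings."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    if config["recency_half_life"] <= 0:
        raise ValueError("recency_half_life must be positive")
    if not 0 <= config["min_recency_score"] <= 1:
        raise ValueError("min_recency_score must be between 0 and 1")
    return config


config = load_config(EXAMPLE_CONFIG)
print(config["recency_half_life"], config["min_recency_score"])
```

Validating the file at load time means a typo in a key name or a negative half-life is caught before it silently distorts rankings.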

Impact on Search Results

Authority and recency scoring affects which documents appear in your results and in what order. When multiple documents discuss the same topic:

  • Official documents rank higher than drafts
  • Recent documents rank higher than old ones
  • A very relevant old draft may still rank below a moderately relevant new prospectus
