AI Web Feeds: Open web AI reader

Search Architecture

How the web app splits source search and post search across the checked-in catalog and generated article library.

Source: apps/web/content/docs/development/search-architecture.mdx

The public web app uses two local data sources for search:

  • the checked-in catalog for source discovery
  • the generated article library for recent-post browsing and post search

Public Search Surfaces

| Route | Backing data | Purpose |
|---|---|---|
| /api/articles | data/articles.generated.json | Filtered post browsing with cursor pagination |
| /api/search?scope=articles | data/articles.generated.json | Ranked post search across the exported corpus |
| /api/search?scope=sources | Checked-in catalog files under data/ | Source discovery and filtering |
| /api/search/autocomplete | Shared index over the catalog and the generated article library | Feed, article, and topic suggestions |
| POST /api/search | Optional backend proxy | Analytics logging only |
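The cursor pagination on /api/articles can be illustrated with a minimal keyset sketch. It assumes rows are ordered newest first and that the cursor encodes the last returned row's (publishedAt, id) pair; the Article shape, the cursor format, and pageArticles are hypothetical names, and the real route may encode its cursor differently.

```typescript
// Hypothetical row shape; the real fields in data/articles.generated.json may differ.
interface Article {
  id: string;
  publishedAt: string; // ISO 8601 UTC timestamps compare correctly as strings
  title: string;
}

interface Page {
  items: Article[];
  nextCursor: string | null;
}

// Plain string comparison returning -1, 0, or 1.
function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Newest first; ties broken by id so the order is total and repeatable.
function compareLatest(a: Article, b: Article): number {
  return cmp(b.publishedAt, a.publishedAt) || cmp(a.id, b.id);
}

// Encode the last row's sort key as a URL-safe cursor (hypothetical format).
function encodeCursor(a: Article): string {
  return encodeURIComponent(JSON.stringify([a.publishedAt, a.id]));
}

function decodeCursor(cursor: string): Pick<Article, "publishedAt" | "id"> {
  const [publishedAt, id] = JSON.parse(decodeURIComponent(cursor)) as [string, string];
  return { publishedAt, id };
}

// Return up to `limit` rows that sort strictly after the cursor position.
function pageArticles(rows: Article[], limit: number, cursor?: string): Page {
  const sorted = [...rows].sort(compareLatest);
  let start = 0;
  if (cursor !== undefined) {
    const anchor: Article = { ...decodeCursor(cursor), title: "" };
    const idx = sorted.findIndex((a) => compareLatest(a, anchor) > 0);
    start = idx === -1 ? sorted.length : idx;
  }
  const items = sorted.slice(start, start + limit);
  const hasMore = start + limit < sorted.length;
  const last = items[items.length - 1];
  return { items, nextCursor: hasMore && last ? encodeCursor(last) : null };
}
```

Keying the cursor on the sort fields rather than on a numeric offset means a page boundary stays anchored to a specific row, so rows added ahead of it do not shift later pages.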

Why the Split Exists

Source discovery and post search read different data because they answer different questions. Which feeds exist is authored data, so source search reads the checked-in catalog files under data/. Posts come from the runtime corpus and are exported to data/articles.generated.json, so post browsing and post search read that generated library, while autocomplete draws on both. Keeping the two apart means source discovery still works when the article corpus has not been refreshed.

Data Flow

Shared article corpus behavior

apps/web/lib/article-corpus.ts is the shared server-only adapter for:

  • loading data/articles.generated.json
  • normalizing article filters (feed, topics, source_type, verified)
  • applying browse sort order (latest, oldest, source)
  • serving article search and autocomplete from the same normalized article rows

That means article browse, article search, and article autocomplete suggestions now share one authoritative dataset instead of mixing bounded live fetches and catalog-only matching.
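The adapter's responsibilities can be sketched as follows, assuming a normalized row with hypothetical field names (feed, sourceType, verified, topics, publishedAt) and hypothetical helper names. The scoring in searchRows is a simple term-hit count chosen for illustration, not the app's actual ranking; the point is that browse, search, and suggestions all read the same rows.

```typescript
// Hypothetical normalized row; field names are assumptions modeled on the filters above.
interface ArticleRow {
  id: string;
  title: string;
  feed: string;
  sourceType: string;
  verified: boolean;
  topics: string[];
  publishedAt: string; // ISO 8601 UTC
}

type SortOrder = "latest" | "oldest" | "source";

interface ArticleFilters {
  feed?: string;
  topics: string[];
  sourceType?: string;
  verified?: boolean;
}

const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Normalize raw query parameters (feed, topics, source_type, verified) into one shape.
function normalizeFilters(params: URLSearchParams): ArticleFilters {
  const clean = (v: string | null) => v?.trim().toLowerCase() || undefined;
  const topics = (params.get("topics") ?? "")
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
  const v = params.get("verified");
  const verified = v === "true" ? true : v === "false" ? false : undefined;
  return { feed: clean(params.get("feed")), topics, sourceType: clean(params.get("source_type")), verified };
}

// A row must match every filter that is set; all requested topics must be present.
function applyFilters(rows: ArticleRow[], f: ArticleFilters): ArticleRow[] {
  return rows.filter((r) => {
    const rowTopics = r.topics.map((t) => t.toLowerCase());
    return (
      (f.feed === undefined || r.feed.toLowerCase() === f.feed) &&
      (f.sourceType === undefined || r.sourceType.toLowerCase() === f.sourceType) &&
      (f.verified === undefined || r.verified === f.verified) &&
      f.topics.every((t) => rowTopics.includes(t))
    );
  });
}

// "source" groups by feed name, newest first within each feed (an assumed tie-break).
function sortRows(rows: ArticleRow[], order: SortOrder): ArticleRow[] {
  const copy = [...rows];
  if (order === "latest") return copy.sort((a, b) => cmp(b.publishedAt, a.publishedAt));
  if (order === "oldest") return copy.sort((a, b) => cmp(a.publishedAt, b.publishedAt));
  return copy.sort((a, b) => cmp(a.feed, b.feed) || cmp(b.publishedAt, a.publishedAt));
}

// Illustrative ranking: each query term scores 2 for a title hit and 1 for a topic hit.
function searchRows(rows: ArticleRow[], query: string): ArticleRow[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return rows
    .map((r) => {
      const title = r.title.toLowerCase();
      const topics = r.topics.map((t) => t.toLowerCase());
      const score = terms.reduce(
        (s, t) => s + (title.includes(t) ? 2 : 0) + (topics.some((x) => x.includes(t)) ? 1 : 0),
        0,
      );
      return { r, score };
    })
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.r);
}

// Prefix suggestions drawn from the same rows: titles, feed names, and topics.
function suggest(rows: ArticleRow[], prefix: string, limit = 5): string[] {
  const p = prefix.toLowerCase();
  const out = new Set<string>();
  for (const r of rows) {
    for (const candidate of [r.title, r.feed, ...r.topics]) {
      if (candidate.toLowerCase().startsWith(p)) out.add(candidate);
    }
  }
  return Array.from(out).slice(0, limit);
}
```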

Source catalog behavior

Source search still uses the repository catalog because feed discovery is an authored-data concern.

The web loader prefers these files in order:

  1. data/feeds.enriched.yaml
  2. data/feeds.yaml
  3. data/feeds.json

This keeps source discovery available even when the runtime database or corpus has not been refreshed yet.
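The preference order can be sketched as a first-existing-file lookup, assuming the loader simply takes the first candidate present under data/. resolveCatalogPath is a hypothetical name, and the existence check is injectable so the ordering can be tested without touching the filesystem.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Catalog files in the loader's documented order of preference.
const CATALOG_CANDIDATES = ["feeds.enriched.yaml", "feeds.yaml", "feeds.json"];

// Hypothetical helper: return the first candidate that exists under dataDir, or null.
// The existence check is injectable so the ordering can be tested in isolation.
function resolveCatalogPath(
  dataDir = "data",
  exists: (path: string) => boolean = existsSync,
): string | null {
  for (const name of CATALOG_CANDIDATES) {
    const candidate = join(dataDir, name);
    if (exists(candidate)) return candidate;
  }
  return null;
}
```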

Operational distinction

If source search looks correct but article browse/search is empty, check the generated corpus path before changing catalog files. The reader workspace depends on data/articles.generated.json, not on direct SQLite access.
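A quick diagnostic along these lines can separate a missing or empty corpus from a catalog problem. The helpers are hypothetical and assume the file holds either a JSON array of articles or an object with an articles array, which may not match the real export format.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Classify raw corpus text; `raw` is null when the file does not exist.
// Assumes either a JSON array of articles or an object with an `articles` array.
function describeCorpus(raw: string | null, path: string): string {
  if (raw === null) return `missing: ${path}`;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return `invalid JSON: ${path}`;
  }
  const rows = Array.isArray(parsed)
    ? parsed
    : typeof parsed === "object" && parsed !== null
      ? (parsed as { articles?: unknown }).articles
      : undefined;
  if (!Array.isArray(rows)) return `unexpected shape: ${path}`;
  return rows.length === 0 ? `empty: ${path}` : `ok: ${rows.length} articles in ${path}`;
}

// Hypothetical helper: inspect the generated corpus the reader workspace depends on.
function checkArticleCorpus(path = "data/articles.generated.json"): string {
  return describeCorpus(existsSync(path) ? readFileSync(path, "utf8") : null, path);
}
```

A "missing" or "empty" result points at the corpus export step rather than at the catalog files.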

Runtime Search Features

The Python runtime still has its own search stack for CLI and backend work, including SQLite FTS5 full-text search, trie-based autocomplete, and optional embedding-based semantic search. These features remain useful for CLI workflows and backend operations, but the public web experience at / does not depend on them.

| Layer | Implementation | Purpose |
|---|---|---|
| Full-text | SQLite FTS5 virtual table + triggers | Source search over stored runtime data |
| Autocomplete | Trie index built from titles and topics | Fast prefix suggestions |
| Semantic | Embeddings + cosine similarity | Meaning-based retrieval |
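The runtime implements these layers in Python. The sketch below shows the two less familiar techniques, a prefix trie for autocomplete and cosine similarity for comparing embeddings, in TypeScript for consistency with the web examples. PrefixIndex and cosineSimilarity are illustrative names, not the runtime's API.

```typescript
// Autocomplete layer: a prefix trie over titles and topics (illustrative, not the runtime's code).
class TrieNode {
  readonly children = new Map<string, TrieNode>();
  // Every indexed term whose lowercased form passes through this node, in insertion order.
  readonly terms = new Set<string>();
}

class PrefixIndex {
  private readonly root = new TrieNode();

  add(term: string): void {
    let node = this.root;
    for (const ch of term.toLowerCase()) {
      let next = node.children.get(ch);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      next.terms.add(term);
      node = next;
    }
  }

  // Walk the prefix once, then read the precomputed matches stored at that node.
  suggest(prefix: string, limit = 5): string[] {
    let node = this.root;
    for (const ch of prefix.toLowerCase()) {
      const next = node.children.get(ch);
      if (next === undefined) return [];
      node = next;
    }
    return Array.from(node.terms).slice(0, limit);
  }
}

// Semantic layer: cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything rather than dividing by zero.
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Storing matching terms at every node keeps lookups proportional to the prefix length at the cost of extra memory; a more compact trie would mark word ends and walk the subtree at query time.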


Search operations checklist

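```bash
# Refresh the runtime article corpus
uv run ai-web-feeds corpus refresh
# Export the corpus to the generated article library used by the web app
uv run ai-web-feeds corpus export
# Initialize the runtime search indexes (CLI/backend only; the public web app does not need them)
uv run ai-web-feeds search init
# Spot-check full-text search against the runtime indexes
uv run ai-web-feeds search query "llm agents" --type full_text --limit 10
```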