AI Web Feeds: Open web AI reader

    Search Architecture

    How the web app splits source search and post search across the checked-in catalog and generated article library.

    Source: apps/web/content/docs/development/search-architecture.mdx

    The public web app uses two local data sources for search:

    • the checked-in catalog for source discovery
    • the generated article library for recent-post browsing and post search

    Public Search Surfaces

    Route | Backing data | Purpose
    --- | --- | ---
    /api/articles | data/articles.generated.json | Filtered post browsing with cursor pagination
    /api/search?scope=articles | data/articles.generated.json | Ranked post search
    /api/search?scope=sources | Catalog files under data/ | Source discovery
    /api/search/autocomplete | Catalog plus generated article library | Shared suggestions
    POST /api/search | Optional backend proxy | Analytics logging only
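    The /api/articles row pairs filtering with cursor pagination. The sketch below shows keyset-style cursors over rows that are already filtered and sorted. It assumes the hypothetical names `paginate` and `Page` and uses the last row's id as the cursor. The real route may encode its cursor differently.

```typescript
// Hypothetical response shape; the real /api/articles payload may use different field names.
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null once the last row has been returned
}

// Keyset-style cursor: the cursor is the id of the last row the client received,
// so the next page starts right after it in the already filtered and sorted rows.
function paginate<T extends { id: string }>(
  rows: readonly T[],
  limit: number,
  cursor?: string | null,
): Page<T> {
  if (!(limit >= 1) || Math.floor(limit) !== limit) {
    throw new RangeError("limit must be a positive integer");
  }
  let start = 0;
  if (cursor) {
    let index = -1;
    for (let i = 0; i < rows.length; i++) {
      if (rows[i].id === cursor) {
        index = i;
        break;
      }
    }
    if (index === -1) throw new Error(`unknown cursor: ${cursor}`);
    start = index + 1;
  }
  const items = rows.slice(start, start + limit);
  const hasMore = start + limit < rows.length;
  return { items, nextCursor: hasMore ? items[items.length - 1].id : null };
}

const demo = [{ id: "a" }, { id: "b" }, { id: "c" }];
const first = paginate(demo, 2); // items a, b; nextCursor "b"
const second = paginate(demo, 2, first.nextCursor); // items c; nextCursor null
```

    An unknown cursor throws here. A real handler would more likely turn that into a 400 response.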

    Why the Split Exists

    The split follows where each kind of data lives. Sources are authored data: the catalog files are checked into the repository, so /api/search?scope=sources reads them directly. Posts are generated: /api/articles (filtered browse with cursor pagination) and /api/search?scope=articles (ranked search across the exported corpus) both read data/articles.generated.json. /api/search/autocomplete draws on a shared index over the catalog and the article corpus to suggest feeds, articles, and topics. POST /api/search only logs analytics, through an optional backend proxy.
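    Because GET /api/search serves both kinds of results, the handler has to branch on the scope parameter before it reads any data. The sketch below shows that branch and assumes a standard WHATWG URL. `resolveScope` and `SCOPE_BACKING` are hypothetical names. This page does not specify what happens with a missing or unknown scope, so returning null is an illustrative choice.

```typescript
// The two GET scopes from the route table.
type SearchScope = "articles" | "sources";

// Backing data per scope, as listed in the route table.
const SCOPE_BACKING: Record<SearchScope, string> = {
  articles: "data/articles.generated.json",
  sources: "catalog files under data/",
};

// Read `scope` from a request URL; anything else returns null so the caller can reject it.
// Assumption: rejecting a missing or unknown scope is illustrative, not documented behavior.
function resolveScope(requestUrl: string): SearchScope | null {
  const scope = new URL(requestUrl).searchParams.get("scope");
  return scope === "articles" || scope === "sources" ? scope : null;
}
```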

    Data Flow

    Shared article corpus behavior

    apps/web/lib/article-corpus.ts is the shared server-only adapter for:

    • loading data/articles.generated.json
    • normalizing article filters (feed, topics, source_type, verified)
    • applying browse sort order (latest, oldest, source)
    • serving article search and autocomplete from the same normalized article rows
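    The filter and sort steps listed above can be sketched as pure functions over in-memory rows. Several details are hypothetical and are not the adapter's confirmed behavior:

    • the `ArticleRow` field names
    • reading filters from query parameters of the same name
    • comma-separated `topics`
    • any-topic matching
    • treating `source` order as "group by feed, newest first"

```typescript
// Hypothetical normalized row; the real adapter's field names may differ.
interface ArticleRow {
  id: string;
  title: string;
  feed: string;
  topics: string[]; // assumed already lowercased during normalization
  source_type: string;
  verified: boolean;
  published_at: string; // ISO 8601 timestamp
}

interface ArticleFilters {
  feed?: string;
  topics?: string[];
  source_type?: string;
  verified?: boolean;
}

type SortOrder = "latest" | "oldest" | "source";

// Turn raw query parameters into typed filters; empty or malformed values are dropped.
function normalizeFilters(params: URLSearchParams): ArticleFilters {
  const filters: ArticleFilters = {};
  const feed = (params.get("feed") || "").trim();
  if (feed) filters.feed = feed;
  const topics: string[] = [];
  for (const raw of params.getAll("topics")) {
    for (const part of raw.split(",")) {
      const topic = part.trim().toLowerCase();
      if (topic && topics.indexOf(topic) === -1) topics.push(topic);
    }
  }
  if (topics.length > 0) filters.topics = topics;
  const sourceType = (params.get("source_type") || "").trim();
  if (sourceType) filters.source_type = sourceType;
  const verified = params.get("verified");
  if (verified === "true" || verified === "false") filters.verified = verified === "true";
  return filters;
}

// Keep rows that match every provided filter; a row matches `topics` if it has any listed topic.
function applyFilters(rows: ArticleRow[], f: ArticleFilters): ArticleRow[] {
  return rows.filter((row) => {
    if (f.feed !== undefined && row.feed !== f.feed) return false;
    if (f.source_type !== undefined && row.source_type !== f.source_type) return false;
    if (f.verified !== undefined && row.verified !== f.verified) return false;
    if (f.topics !== undefined && !f.topics.some((t) => row.topics.indexOf(t) !== -1)) return false;
    return true;
  });
}

// Return a sorted copy; "source" groups by feed name, newest first within a feed (assumed).
function sortRows(rows: ArticleRow[], order: SortOrder): ArticleRow[] {
  const byTime = (a: ArticleRow, b: ArticleRow) =>
    Date.parse(a.published_at) - Date.parse(b.published_at);
  const copy = rows.slice();
  if (order === "oldest") return copy.sort(byTime);
  if (order === "latest") return copy.sort((a, b) => byTime(b, a));
  return copy.sort((a, b) => a.feed.localeCompare(b.feed) || byTime(b, a));
}
```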

    That means article browse, article search, and article autocomplete suggestions now share one authoritative dataset instead of mixing bounded live fetches and catalog-only matching.
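    With a single dataset, ranking and suggestions read the same rows. The sketch below illustrates that idea. The `Row` shape, the tokenizer, and the 2-to-1 title-versus-topic weights are illustrative assumptions, not the scoring in apps/web/lib/article-corpus.ts.

```typescript
// Hypothetical minimal row; in practice this would be the adapter's normalized article row.
interface Row {
  id: string;
  title: string;
  topics: string[];
}

// Lowercase alphanumeric tokens; a stand-in for whatever tokenizer the adapter uses.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Rank rows by query-term hits: a title hit adds 2, a topic hit adds 1 (arbitrary weights).
function searchRows(rows: Row[], query: string, limit = 10): Row[] {
  const terms = tokenize(query);
  const scored = rows.map((row) => {
    const titleTokens = tokenize(row.title);
    const topics = row.topics.map((t) => t.toLowerCase());
    let score = 0;
    for (const term of terms) {
      if (titleTokens.indexOf(term) !== -1) score += 2;
      if (topics.indexOf(term) !== -1) score += 1;
    }
    return { row, score };
  });
  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((s) => s.row);
}

// Suggest titles and topics that start with the typed prefix, drawn from the same rows.
function suggest(rows: Row[], prefix: string, limit = 5): string[] {
  const p = prefix.trim().toLowerCase();
  if (p.length === 0) return [];
  const out: string[] = [];
  for (const row of rows) {
    for (const candidate of [row.title].concat(row.topics)) {
      if (candidate.toLowerCase().indexOf(p) === 0 && out.indexOf(candidate) === -1) out.push(candidate);
      if (out.length >= limit) return out;
    }
  }
  return out;
}
```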

    Source catalog behavior

    Source search still uses the repository catalog because feed discovery is an authored-data concern.

    The web loader prefers these files in order:

    1. data/feeds.enriched.yaml
    2. data/feeds.yaml
    3. data/feeds.json

    This keeps source discovery available even when the runtime database or corpus has not been refreshed yet.
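    The preference order above reads as "first existing file wins". The sketch below follows that reading. `pickCatalogFile` is a hypothetical name, paths are assumed to be relative to the repository root, and parsing the chosen YAML or JSON file is left out.

```typescript
// Preference order from the docs: enriched YAML, then plain YAML, then JSON.
const CATALOG_CANDIDATES = [
  "data/feeds.enriched.yaml",
  "data/feeds.yaml",
  "data/feeds.json",
] as const;

// Return the first catalog file that exists, or null when none is present.
// `exists` is injected (e.g. fs.existsSync in Node) so the choice is testable without disk access.
function pickCatalogFile(exists: (path: string) => boolean): string | null {
  for (const candidate of CATALOG_CANDIDATES) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

    None of the candidates is data/articles.generated.json, so the choice never depends on whether the corpus has been exported.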

    Operational distinction

    If source search looks correct but article browse or search is empty, check the generated corpus (data/articles.generated.json) before changing catalog files. The reader workspace reads that file, not SQLite directly.
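    When article browse or search comes up empty, it helps to classify the corpus file before touching the catalog. The sketch below assumes the export is either a JSON array of articles or an object with an `articles` array; the actual export shape is not documented here. `diagnoseCorpus` is hypothetical. It takes the file text, or null when the file is missing, so it makes no filesystem calls.

```typescript
// Result of inspecting the generated corpus file.
type CorpusStatus =
  | { ok: true; count: number }
  | { ok: false; reason: string };

// `raw` is the text of data/articles.generated.json, or null when the file is missing.
// Assumption: the export is a JSON array or an object with an `articles` array.
function diagnoseCorpus(raw: string | null): CorpusStatus {
  if (raw === null) return { ok: false, reason: "file not found" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  let rows: unknown[] | null = null;
  if (Array.isArray(parsed)) {
    rows = parsed;
  } else if (typeof parsed === "object" && parsed !== null) {
    const articles = (parsed as { articles?: unknown }).articles;
    if (Array.isArray(articles)) rows = articles;
  }
  if (rows === null) return { ok: false, reason: "unexpected shape" };
  if (rows.length === 0) return { ok: false, reason: "empty corpus" };
  return { ok: true, count: rows.length };
}
```

    In Node, pass the result of fs.readFileSync(path, "utf8") when fs.existsSync(path) is true, and null otherwise.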

    Runtime Search Features

    The Python runtime still has its own search stack for CLI and backend work, including SQLite FTS, autocomplete indexes, and optional embedding-based search.

    Layer | Implementation | Purpose
    --- | --- | ---
    Full-text | SQLite FTS5 virtual table + triggers | Source search over stored runtime data
    Autocomplete | Trie index built from titles and topics | Fast prefix suggestions
    Semantic | Embeddings + cosine similarity | Meaning-based retrieval
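    The autocomplete layer is described as a trie built from titles and topics. The runtime implements it in Python, so this TypeScript sketch only illustrates the data structure. The `Trie` class, case-insensitive matching, and alphabetical result order are assumptions.

```typescript
// One trie node: children keyed by a single lowercased character.
class TrieNode {
  children: { [ch: string]: TrieNode } = {};
  terms: string[] = []; // original-cased terms whose lowercased form ends here
}

class Trie {
  private root = new TrieNode();

  insert(term: string): void {
    const key = term.toLowerCase();
    let node = this.root;
    for (let i = 0; i < key.length; i++) {
      const ch = key.charAt(i);
      if (node.children[ch] === undefined) node.children[ch] = new TrieNode();
      node = node.children[ch];
    }
    if (node.terms.indexOf(term) === -1) node.terms.push(term);
  }

  // Walk to the prefix node, then collect terms depth-first in alphabetical order.
  complete(prefix: string, limit = 5): string[] {
    const key = prefix.toLowerCase();
    let node = this.root;
    for (let i = 0; i < key.length; i++) {
      const next: TrieNode | undefined = node.children[key.charAt(i)];
      if (next === undefined) return [];
      node = next;
    }
    const out: string[] = [];
    const stack: TrieNode[] = [node];
    while (stack.length > 0 && out.length < limit) {
      const current = stack.pop() as TrieNode;
      for (const term of current.terms) {
        if (out.length < limit) out.push(term);
      }
      // Push children in reverse order so the alphabetically smallest is popped first.
      const keys = Object.keys(current.children).sort().reverse();
      for (const k of keys) stack.push(current.children[k]);
    }
    return out;
  }
}
```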

    These runtime capabilities remain useful for CLI workflows, but the public site at the / route no longer depends on them.
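    The semantic layer ranks stored embeddings by cosine similarity, cos(a, b) = a·b / (|a||b|). The brute-force sketch below assumes the query and document vectors have already been computed by some embedding model; this page does not say which model the runtime uses. `semanticSearch` and the `Embedded` shape are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0;
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical stored document embedding.
interface Embedded {
  id: string;
  vector: number[];
}

// Score every document against the query and return the best matches, highest first.
function semanticSearch(query: number[], docs: Embedded[], limit = 10): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

    Scoring every document costs one pass over all vectors per query. That is manageable for a small corpus, but larger ones usually call for an approximate nearest-neighbor index.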

    Search operations checklist

    uv run ai-web-feeds corpus refresh
    uv run ai-web-feeds corpus export
    uv run ai-web-feeds search init
    uv run ai-web-feeds search query "llm agents" --type full_text --limit 10