Data Sources and Ownership
Which files and runtime stores are authoritative for the catalog, generated article library, browser state, and optional backend features.
Source: apps/web/content/docs/development/runtime-authority.mdx
This page explains which data source owns each part of the product. Not every file in the repository has the same authority.
Authority Tiers
| Tier | Canonical assets | Used for |
|---|---|---|
| Authored repository data | data/feeds.yaml, data/topics.yaml, JSON schemas | Source and taxonomy edits made by contributors |
| Derived repository data | data/feeds.enriched.yaml, exported JSON, OPML files | Generated outputs derived from the authored catalog |
| Runtime database | data/ai-web-feeds.db | Fetched entries, validation history, analytics, recommendations, and operational state |
| Generated article library | data/articles.generated.json | Standalone web browsing and post search |
| Web APIs and pages | /api/articles, /api/search, / | Reading and searching recent posts plus browsing the source catalog |
| Browser-local state | localStorage, IndexedDB | Reading state, local preferences, and device-scoped client state |
| Optional backend proxy | BACKEND_URL | Server-backed analytics and recommendations |
What Each Layer Owns
| Tier | Canonical assets | Used for |
|---|---|---|
| Authored repository truth | data/feeds.yaml, data/topics.yaml, JSON schemas | contributor-edited catalog and taxonomy |
| Derived repository assets | data/feeds.enriched.yaml, exported JSON/OPML | checked-in outputs generated from authored inputs |
| Runtime article store | feed_entries in data/ai-web-feeds.db (canonical); data/aiwebfeeds.db is used as a fallback only when the canonical file is missing | polled article rows and feed refresh state |
| Generated web corpus | data/articles.generated.json | self-contained article browse/search/read payload for the Next.js app |
| Web APIs and workspace | /api/articles, /api/search, /api/search/autocomplete, / | corpus-backed reading/search plus catalog-backed source discovery |
| Browser-local runtime | IndexedDB, localStorage, reader-local state | anonymous on-device reader preferences and article state |
| External backend proxy | BACKEND_URL-backed Python service | optional analytics logging, saved searches, and backend-proxied runtime routes |
Use the YAML catalog and topic files when you are changing the source list itself.
Code ownership areas
| Context | Primary responsibility | Typical paths |
|---|---|---|
| Docs shell | static docs, content rendering, and navigation | app/docs, content/docs, docs UI components |
| Public reader workspace | catalog, article search, saved state, and reader UX | app/(home), app/feeds components, browser state helpers |
| Admin | operational dashboards, telemetry inspection, and admin session enforcement | app/admin, components/admin, lib/auth.ts, lib/admin-auth-new.ts |
| Backend proxy | route handlers that normalize browser requests and forward them to Python services | app/api/**, lib/backend.ts, anonymous identity helpers |
Treat these as bounded code ownership areas inside one deployment, not as separate deployables or packages.
What each layer owns
Authored repository truth
- data/feeds.yaml is the minimal curated source registry.
- data/topics.yaml is the topic taxonomy.
- Schemas in data/*.schema.json define the validation contract for those files.
These files are the starting point for contributor changes and workflow intake.
Derived repository assets
- data/feeds.enriched.yaml is derived from the authored feed catalog.
- Exported JSON and OPML files are downstream artifacts, not the canonical input.
Treat these as generated outputs, and regenerate them whenever the authored catalog changes.
Runtime database
Use the runtime database for operational behavior such as fetched entries, validation history, search state, and recommendation inputs.
Generated article library
Use data/articles.generated.json when you want to validate the standalone web experience. That file is the main post-browsing dataset for the web app.
Browser-local state
Read, saved, starred, and archived state stays in the browser. That keeps the core reading flow available without forcing end-user accounts into the product.
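As a rough illustration of how this per-article state could live entirely on the device, the sketch below keeps flags in a localStorage-style key-value store. The storage key, the JSON shape, and the function names are assumptions made for this example, not the app's actual helpers.

```typescript
// Hypothetical sketch: per-article reader flags kept in a localStorage-like store.
type ArticleFlag = "read" | "saved" | "starred" | "archived";
type ArticleState = Partial<Record<ArticleFlag, boolean>>;

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed key name; the real app may use a different key or IndexedDB instead.
const STORAGE_KEY = "reader-article-state";

function loadStates(store: KeyValueStore): Record<string, ArticleState> {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    const isPlainObject =
      typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
    return isPlainObject ? (parsed as Record<string, ArticleState>) : {};
  } catch {
    // Corrupt or non-JSON payload: fall back to empty state rather than crash.
    return {};
  }
}

function setFlag(
  store: KeyValueStore,
  articleId: string,
  flag: ArticleFlag,
  value: boolean,
): void {
  const states = loadStates(store);
  // Merge the new flag into whatever flags the article already has.
  states[articleId] = { ...states[articleId], [flag]: value };
  store.setItem(STORAGE_KEY, JSON.stringify(states));
}
```

In a browser, window.localStorage satisfies the KeyValueStore interface, so it can be passed in directly. No account or server call is involved at any point.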
Optional backend proxy
The backend remains optional. It is only needed for features that require live server-side state, such as backend analytics exports.
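One way to express "optional" in code is to treat a missing BACKEND_URL as a signal to skip server-backed features. The sketch below assumes the value arrives through an environment-like record. The helper name and the /analytics path in the usage note are hypothetical.

```typescript
// Hypothetical sketch: build a backend URL only when BACKEND_URL is configured.
type EnvLike = Record<string, string | undefined>;

function backendUrlFor(path: string, env: EnvLike): string | null {
  const base = env["BACKEND_URL"]?.trim();
  // Unset or blank: the backend is not configured, so callers skip the feature.
  if (!base) return null;
  // Normalize slashes so base and path join with exactly one "/".
  const normalizedBase = base.replace(/\/+$/, "");
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
  return normalizedBase + normalizedPath;
}
```

A route handler could call backendUrlFor("/analytics", process.env) and return an empty or "not available" response when the result is null, which keeps the core reading flow independent of the backend.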
Practical Precedence Rules
The web app prefers repository catalog files in this order for source discovery:
1. data/feeds.enriched.yaml
2. data/feeds.yaml
3. data/feeds.json
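The catalog order above amounts to a first-match lookup. The following is a minimal sketch of that rule, assuming the caller supplies its own file-existence check. The function name is hypothetical and is not taken from the codebase.

```typescript
// Hypothetical sketch: pick the first catalog file that exists, in the documented order.
const CATALOG_PRECEDENCE = [
  "data/feeds.enriched.yaml",
  "data/feeds.yaml",
  "data/feeds.json",
] as const;

type CatalogPath = (typeof CATALOG_PRECEDENCE)[number];

// `exists` is injected so the rule can be exercised without touching the file system.
function resolveCatalogPath(
  exists: (path: string) => boolean,
): CatalogPath | null {
  for (const candidate of CATALOG_PRECEDENCE) {
    if (exists(candidate)) return candidate;
  }
  // None of the known catalog files are present.
  return null;
}
```

In a Node.js context the check could be fs.existsSync. Injecting it keeps the precedence logic separate from I/O.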
The web app prefers data/articles.generated.json for article browse/search.
- If the corpus exists, /, /api/articles, article search, and autocomplete article suggestions use it.
- If the corpus is missing or empty, the feeds workspace uses /api/feeds/posts/aggregate as a live fallback across the current source slice.
- Live fetches from /api/feeds/posts/aggregate also remain available as a freshness overlay when a generated corpus is present.
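The corpus rules above reduce to one decision: use the generated corpus when it has content, otherwise fall back to the live aggregate endpoint. The sketch below assumes the corpus has already been parsed into an array. The type and function names are illustrative only, and the real shape of data/articles.generated.json may differ.

```typescript
// Hypothetical sketch of the corpus-first rule with a live fallback.
const LIVE_AGGREGATE_ENDPOINT = "/api/feeds/posts/aggregate";

type ArticleSource =
  // Corpus present: live fetches stay available as a freshness overlay.
  | { kind: "corpus"; liveOverlayEndpoint: string }
  // Corpus missing or empty: the live endpoint is the only source.
  | { kind: "live"; endpoint: string };

function chooseArticleSource(
  corpus: readonly unknown[] | null | undefined,
): ArticleSource {
  if (corpus && corpus.length > 0) {
    return { kind: "corpus", liveOverlayEndpoint: LIVE_AGGREGATE_ENDPOINT };
  }
  return { kind: "live", endpoint: LIVE_AGGREGATE_ENDPOINT };
}
```

Modeling the result as a tagged union makes callers handle both branches explicitly, so a missing corpus cannot silently render an empty article list.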
Practical rule
Edit YAML when you are changing the catalog itself. Use the runtime database when you are inspecting operational behavior or populating feed_entries. Use the generated corpus when you are validating the reader/search experience that ships in the standalone web app.