Data Sources and Ownership

This page explains which data source owns each part of the product. Not every file in the repository has the same authority.

Authority Tiers

Tier	Canonical assets	Used for
Authored repository data	`data/feeds.yaml`, `data/topics.yaml`, JSON schemas	Source and taxonomy edits made by contributors
Derived repository data	`data/feeds.enriched.yaml`, exported JSON, OPML files	Generated outputs derived from the authored catalog
Runtime database	`data/ai-web-feeds.db`	Fetched entries, validation history, analytics, recommendations, and operational state
Generated article library	`data/articles.generated.json`	Standalone web browsing and post search
Web APIs and pages	`/api/articles`, `/api/search`, `/`	Reading and searching recent posts plus browsing the source catalog
Browser-local state	`localStorage`, `IndexedDB`	Reading state, local preferences, and device-scoped client state
Optional backend proxy	`BACKEND_URL`	Server-backed analytics and recommendations

What Each Layer Owns

Tier	Canonical assets	Used for
Authored repository truth	`data/feeds.yaml`, `data/topics.yaml`, JSON schemas	contributor-edited catalog and taxonomy
Derived repository assets	`data/feeds.enriched.yaml`, exported JSON/OPML	checked-in outputs generated from authored inputs
Runtime article store	`articles` in `data/ai-web-feeds.db`	polled article rows and feed refresh state
Generated web corpus	`data/articles.generated.json`	self-contained article browse/search/read payload for the Next.js app
Web APIs and workspace	`/api/articles`, `/api/search`, `/api/search/autocomplete`, `/`	corpus-backed reading/search plus catalog-backed source discovery
Browser-local runtime	IndexedDB, localStorage, reader-local state	anonymous on-device reader preferences and article state
External backend proxy	`BACKEND_URL`-backed Python service	optional analytics logging, saved searches, and backend-proxied runtime routes

Use the YAML catalog and topic files when you are changing the source list itself.

Code Ownership Areas

Context	Primary responsibility	Typical paths
Docs shell	static docs, content rendering, and navigation	`app/docs`, `content/docs`, docs UI components
Public reader workspace	catalog, article search, saved state, and reader UX	`app/(home)`, `app/feeds` components, browser state helpers
Admin	operational dashboards, telemetry inspection, and admin session enforcement	`app/admin`, `components/admin`, `lib/auth.ts`, `lib/admin-auth-new.ts`
Backend proxy	route handlers that normalize browser requests and forward them to Python services	`app/api/**`, `lib/backend.ts`, anonymous identity helpers

Treat these as bounded code ownership areas inside one deployment, not as separate deployables or packages.

What each layer owns

Authored repository truth

`data/feeds.yaml` is the minimal curated source registry.
`data/topics.yaml` is the topic taxonomy.
Schemas in `data/*.schema.json` define the validation contract for those files.

These files are the starting point for contributor changes and workflow intake.

Derived repository assets

`data/feeds.enriched.yaml` is derived from the authored feed catalog.
Exported JSON and OPML files are downstream artifacts, not the canonical input.

Treat these as generated outputs, and regenerate them whenever the authored catalog changes.

Runtime database

Use the runtime database (`data/ai-web-feeds.db`) to inspect operational data such as fetched entries, validation history, search state, and recommendation inputs.

Generated article library

Use `data/articles.generated.json` when you want to validate the standalone web experience. That file is the main dataset for browsing and searching posts in the web app.

Browser-local state

Read, saved, starred, and archived state stays in the browser. That keeps the core reading flow available without forcing end-user accounts into the product.
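As a rough illustration of device-scoped reading state, the sketch below stores per-article flags behind a small interface that `localStorage` happens to satisfy. The key format, the function names, and the JSON layout are assumptions made for this example; the actual app may organize this state differently or keep it in IndexedDB.

```typescript
type ArticleFlag = "read" | "saved" | "starred" | "archived";
type ArticleState = Partial<Record<ArticleFlag, boolean>>;

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key format, chosen only for this example.
const stateKey = (articleId: string): string => `article-state:${articleId}`;

// Reads the stored flags for one article; unknown or corrupt entries yield {}.
function getArticleState(store: KeyValueStore, articleId: string): ArticleState {
  const raw = store.getItem(stateKey(articleId));
  if (raw === null) return {};
  try {
    return JSON.parse(raw) as ArticleState;
  } catch {
    return {};
  }
}

// Sets one flag and writes the merged state back to the store.
function setArticleFlag(
  store: KeyValueStore,
  articleId: string,
  flag: ArticleFlag,
  value: boolean,
): ArticleState {
  const next: ArticleState = { ...getArticleState(store, articleId) };
  next[flag] = value;
  store.setItem(stateKey(articleId), JSON.stringify(next));
  return next;
}
```

In a browser, `window.localStorage` can be passed as `store`, which keeps the state on the device and requires no account.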

Optional backend proxy

The backend remains optional. It is only needed for features that require live server-side state, such as backend analytics exports.
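One way to express "the backend is optional" in code is a guard that enables server-backed features only when `BACKEND_URL` is set. This is a minimal sketch; the function name is hypothetical, and the source does not describe how the app actually detects the backend.

```typescript
// Hypothetical guard: treats a missing or blank BACKEND_URL as "no backend".
// The environment is passed in so the check does not depend on any runtime global.
function isBackendConfigured(env: Record<string, string | undefined>): boolean {
  const url = env["BACKEND_URL"];
  return typeof url === "string" && url.trim().length > 0;
}
```

In a Next.js route handler, `process.env` could be passed as `env`, and routes that need live server-side state could return early when the guard is false.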

Practical Precedence Rules

The web app prefers repository catalog files in this order for source discovery:

`data/feeds.enriched.yaml`
`data/feeds.yaml`
`data/feeds.json`
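The precedence order above can be sketched as a first-match lookup. The helper name and the injected `exists` predicate are assumptions for illustration, not the app's actual loader.

```typescript
// Catalog files in the order the web app prefers them for source discovery.
const CATALOG_PRECEDENCE: readonly string[] = [
  "data/feeds.enriched.yaml",
  "data/feeds.yaml",
  "data/feeds.json",
];

// Hypothetical helper: returns the first catalog path for which `exists` is true,
// or undefined when none of the files is present.
// `exists` is injected so the sketch stays independent of any file-system API.
function pickCatalogFile(exists: (path: string) => boolean): string | undefined {
  return CATALOG_PRECEDENCE.find((path) => exists(path));
}
```

In Node.js, `exists` could wrap `fs.existsSync` with each path resolved against the repository root.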

The web app prefers `data/articles.generated.json` for article browse/search.

If the corpus exists, `/`, `/api/articles`, article search, and autocomplete article suggestions use it.
If the corpus is missing or empty, the feeds workspace shows corpus health and can load a bounded live sample for the current source slice.
Live fetches from `/api/feeds/posts/aggregate` also remain available as a freshness overlay when a generated corpus is present.
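The corpus rules above amount to a small decision function. The sketch below assumes the parsed corpus is an array of articles and uses `null` for a file that could not be loaded; the type and function names are hypothetical.

```typescript
type ArticleSource =
  | { mode: "corpus"; liveOverlayAvailable: true }
  | { mode: "live-sample"; reason: "missing" | "empty" };

// `corpus` stands for the parsed contents of data/articles.generated.json,
// or null when the file is absent. The array shape is an assumption.
function resolveArticleSource(corpus: readonly unknown[] | null): ArticleSource {
  if (corpus === null) return { mode: "live-sample", reason: "missing" };
  if (corpus.length === 0) return { mode: "live-sample", reason: "empty" };
  // A usable corpus is the primary source; live fetches remain an optional overlay.
  return { mode: "corpus", liveOverlayAvailable: true };
}
```

Keeping the reason on the fallback branch lets the workspace report corpus health (missing versus empty) before it loads a bounded live sample.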

Practical rule

Edit YAML when you are changing the catalog itself. Use the runtime database when you are inspecting operational behavior or populating articles. Use the generated corpus when you are validating the reader/search experience that ships in the standalone web app.
