    Database Setup

    Database architecture, models, and operations

    Source: apps/web/content/docs/development/database.mdx

    AI Web Feeds uses SQLModel (SQLAlchemy + Pydantic) for database operations with Alembic for migrations.

    Database Schema

    feed_sources Table

    Core feed metadata and configuration:

    • Core fields: id, feed, site, title
    • Classification: source_type, mediums, tags
    • Topics: topics, topic_weights
    • Metadata: language, format, updated, last_validated, verified, contributor
    • Curation: curation_status, curation_since, curation_by, quality_score, curation_notes
    • Provenance: provenance_source, provenance_from, provenance_license
    • Discovery: discover_enabled, discover_config
    • Relations: relations, mappings (JSON fields)
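The JSON-typed fields above can be pictured with a small round trip through SQLite. This is only a sketch using the standard library: the real table is created by the SQLModel models and Alembic, the column types shown here are assumptions, and so is the shape of topic_weights (a mapping from topic id to weight).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical subset of the feed_sources columns.
conn.execute(
    "CREATE TABLE feed_sources ("
    "id TEXT PRIMARY KEY, feed TEXT NOT NULL, title TEXT, "
    "topics TEXT, topic_weights TEXT)"
)

# List and mapping values are serialized to JSON text before storage.
conn.execute(
    "INSERT INTO feed_sources VALUES (?, ?, ?, ?, ?)",
    (
        "example-blog",
        "https://example.com/feed.xml",
        "Example Blog",
        json.dumps(["ml", "nlp"]),
        json.dumps({"ml": 0.7, "nlp": 0.3}),
    ),
)

row = conn.execute(
    "SELECT topics, topic_weights FROM feed_sources WHERE id = ?",
    ("example-blog",),
).fetchone()

# Decode the JSON text back into Python structures.
topics = json.loads(row[0])
topic_weights = json.loads(row[1])
print(topics, topic_weights)
```

With SQLModel the serialization step is normally handled by the column type, so application code works with lists and dicts directly.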

    articles Table

    Individual articles:

    • Identifiers: id, feed_id (foreign key), guid_hash, link_hash, canonical_url
    • Content: title, link, summary, content, author
    • Timestamps: pub_date, updated_at, first_seen_at, last_seen_at, created_at
    • Taxonomy and ingress metadata: topics, raw_categories, source_topics, extra_data
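The guid_hash and link_hash columns suggest that articles are matched across fetches by a hash of their identifiers. The source does not state the algorithm or any normalization, so the sketch below assumes SHA-256 over a whitespace-trimmed string; stable_hash is a hypothetical helper, not part of the package.

```python
import hashlib


def stable_hash(value: str) -> str:
    """Return the hex SHA-256 digest of a whitespace-trimmed string.

    Assumption: the project may use a different algorithm or normalization.
    """
    return hashlib.sha256(value.strip().encode("utf-8")).hexdigest()


# Hypothetical fields, as they might arrive from a parsed feed entry.
entry = {
    "guid": "tag:example.com,2024:post-1",
    "link": "https://example.com/post-1 ",
}

guid_hash = stable_hash(entry["guid"])
link_hash = stable_hash(entry["link"])

# The same link with different surrounding whitespace yields the same hash,
# so a repeated fetch can be matched against an existing row.
assert link_hash == stable_hash("https://example.com/post-1")
```

Storing a fixed-length digest rather than the raw GUID or URL keeps a unique index compact even when the identifiers are long.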

    feed_fetch_logs Table

    Fetch attempt tracking:

    • Fetch info: fetched_at, fetch_url, success
    • Response: status_code, content_type, content_length, etag, last_modified
    • Errors: error_message, error_type
    • Stats: items_found, items_new, items_updated, fetch_duration_ms
    • Data: response_headers, extra_data (JSON fields)
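The etag and last_modified values are the validators an HTTP conditional request needs. The source does not say whether the fetcher sends such requests, so the following is a sketch of how a previous log row could be turned into request headers; conditional_headers and previous_log are hypothetical names.

```python
from typing import Dict, Optional


def conditional_headers(
    etag: Optional[str], last_modified: Optional[str]
) -> Dict[str, str]:
    """Build HTTP validators for the next fetch from the previous log row."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


# Hypothetical values copied from the most recent successful fetch log.
previous_log = {
    "etag": '"abc123"',
    "last_modified": "Wed, 01 Jan 2025 00:00:00 GMT",
}

headers = conditional_headers(previous_log["etag"], previous_log["last_modified"])
print(headers)
```

A server that honors these headers answers 304 Not Modified when the feed is unchanged, which a log row could record through status_code with items_new set to 0.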

    topics Table

    Topic definitions:

    • Core: id, name, description, parent_id
    • Metadata: aliases, related_topics
    • Timestamps: created_at, updated_at
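Because each topic carries a parent_id, the table describes a hierarchy. A minimal sketch of walking that hierarchy from a topic up to its root follows; the topic ids and the topic_path helper are illustrative assumptions, not part of the package.

```python
from typing import Dict, List, Optional


def topic_path(topic_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of ids from the root topic down to topic_id."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = topic_id
    while current is not None:
        # Guard against malformed data where a topic is its own ancestor.
        if current in seen:
            raise ValueError(f"cycle detected at topic {current!r}")
        seen.add(current)
        path.append(current)
        current = parents.get(current)
    path.reverse()
    return path


# Hypothetical id -> parent_id pairs, as they might be read from the topics table.
parents = {"ai": None, "ml": "ai", "nlp": "ml"}

print(topic_path("nlp", parents))  # → ['ai', 'ml', 'nlp']
```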

    Python API

    Initialize Database

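    # Assumption: upgrade_database_to_head is exported alongside DatabaseManager;
    # adjust the import if it lives in another module of the package.
    from ai_web_feeds.storage import DatabaseManager, upgrade_database_to_head

    # Bring the schema up to the latest migration, then open the database
    database_url = "sqlite:///data/ai-web-feeds.db"
    upgrade_database_to_head(database_url)
    db = DatabaseManager(database_url)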

    Add Feed Sources

    from ai_web_feeds.models import FeedSource, SourceType
    
    feed = FeedSource(
        id="example-blog",
        feed="https://example.com/feed.xml",
        site="https://example.com",
        title="Example Blog",
        source_type=SourceType.BLOG,
        topics=["ml", "nlp"],
        verified=True,
    )
    
    db.add_feed_source(feed)

    Query Feed Sources

    # Get all feeds
    all_feeds = db.get_all_feed_sources()
    
    # Get specific feed
    feed = db.get_feed_source("example-blog")
    
    # Get all topics
    topics = db.get_all_topics()

    Bulk Operations

    # Bulk insert feed sources (feed_sources: a list of FeedSource objects)
    db.bulk_insert_feed_sources(feed_sources)

    # Bulk insert topics (topics: a list of topic model objects)
    db.bulk_insert_topics(topics)

    Database Migrations

    Initialize Alembic

    # Run initialization script
    uv run python packages/ai_web_feeds/scripts/init_alembic.py

    Create Migration

    cd packages/ai_web_feeds
    alembic revision --autogenerate -m "Initial schema"

    Apply Migrations

    # Upgrade to latest
    alembic upgrade head
    
    # Downgrade one version
    alembic downgrade -1
    
    # Show current version
    alembic current

    Configuration

    Environment Variables

    # Database URL
    export AIWF_DATABASE_URL=sqlite:///data/ai-web-feeds.db
    
    # For PostgreSQL
    export AIWF_DATABASE_URL=postgresql://user:pass@localhost/aiwebfeeds
    
    # For MySQL
    export AIWF_DATABASE_URL=mysql://user:pass@localhost/aiwebfeeds
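Application code can pick up the variable at startup. The sketch below reads AIWF_DATABASE_URL and falls back to the SQLite URL used elsewhere on this page; whether the package itself applies this fallback is an assumption, and resolve_database_url is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed default, taken from the SQLite example on this page.
DEFAULT_DATABASE_URL = "sqlite:///data/ai-web-feeds.db"


def resolve_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return AIWF_DATABASE_URL if set and non-empty, else the SQLite default."""
    env = os.environ if env is None else env
    return env.get("AIWF_DATABASE_URL") or DEFAULT_DATABASE_URL


# Passing a mapping explicitly keeps the examples independent of the real environment.
print(resolve_database_url({}))
print(resolve_database_url({"AIWF_DATABASE_URL": "postgresql://user:pass@localhost/aiwebfeeds"}))
```

The resolved string can then be passed to DatabaseManager in place of a hard-coded URL.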

    Database Manager Options

    from ai_web_feeds.storage import DatabaseManager
    from sqlalchemy import create_engine

    # Custom database URL
    db = DatabaseManager("postgresql://localhost/aiwebfeeds")

    # Enable SQL echo for debugging (standalone SQLAlchemy engine)
    engine = create_engine(
        "sqlite:///data/ai-web-feeds.db",
        echo=True,  # Log every SQL statement the engine emits
    )

    Models Reference

    All models are defined using SQLModel, which combines SQLAlchemy and Pydantic for type-safe database operations with automatic validation.

    Core Models (models.py):

    • FeedSource - Feed metadata and configuration
    • ArticleEntry - Individual articles from feed polling
    • FeedFetchLog - Fetch attempt history
    • TopicNode - v3 topic taxonomy

    Advanced Models (models_advanced.py):

    • FeedValidationHistory - Validation tracking over time
    • FeedHealthMetric - Health scores and metrics
    • DataQualityMetric - Multi-dimensional quality tracking
    • ContentEmbedding - Semantic search embeddings
    • TopicRelationship - Computed topic associations
    • UserFeedPreference - User interactions and preferences
    • AnalyticsCacheEntry - Computed analytics caching
