    Sentiment Analysis

    Transformer-based sentiment classification and trend tracking

    Source: apps/web/content/docs/features/sentiment-analysis.mdx

    Sentiment Analysis classifies article sentiment using transformer models (DistilBERT) and tracks sentiment trends over time by topic.

    Overview

    The sentiment analyzer:

    1. Classifies article sentiment: positive, neutral, or negative
    2. Computes sentiment scores (-1.0 to +1.0)
    3. Aggregates daily sentiment by topic
    4. Detects sentiment shifts using moving averages

    Architecture

    Sentiment Classification

    Model

    Uses Hugging Face's distilbert-base-uncased-finetuned-sst-2-english:

    • Model Size: ~268MB download (67M parameters)
    • Accuracy: ~91% on the SST-2 dev set
    • Inference Time: ~50ms per article (CPU)
    • Context Window: 512 tokens (truncates longer articles)
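A minimal sketch of how an article might be handed to the model while staying within the input limit. The `classify_text` helper, the `MAX_CHARS` cut-off, and the stand-in classifier are assumptions made for illustration, not the project's actual code. The `classifier` argument is any callable that returns results in the shape produced by a Hugging Face `pipeline("sentiment-analysis")`, that is, a list of dicts with `label` and `score` keys.

```python
from typing import Callable

MAX_CHARS = 2000  # assumed character cut-off applied before tokenization

# Any callable returning [{"label": ..., "score": ...}], like a transformers pipeline.
Classifier = Callable[[str], list[dict]]


def classify_text(text: str, classifier: Classifier, max_chars: int = MAX_CHARS) -> tuple[str, float]:
    """Truncate the text, run the classifier, and return (label, confidence)."""
    truncated = text[:max_chars]
    result = classifier(truncated)[0]
    return result["label"], float(result["score"])


def fake_classifier(text: str) -> list[dict]:
    """Stand-in for the real model so the sketch runs without downloading weights."""
    label = "NEGATIVE" if "concern" in text.lower() else "POSITIVE"
    return [{"label": label, "score": 0.9}]


label, confidence = classify_text(
    "Critics have raised serious concerns about RLHF...", fake_classifier
)
print(label, confidence)
```

With `transformers` installed, the stand-in could be swapped for `pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")`. Passing `truncation=True` when calling the pipeline should let the tokenizer enforce the 512-token limit itself.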

    Sentiment Score Mapping

    # Model output → Sentiment score
    "POSITIVE" (confidence 0.85) → +0.85
    "NEGATIVE" (confidence 0.92) → -0.92
    "NEUTRAL" → 0.0

    Classification Thresholds

    if sentiment_score > 0.3:
        classification = "positive"
    elif sentiment_score < -0.3:
        classification = "negative"
    else:
        classification = "neutral"
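The score mapping and the thresholds can be combined into two small functions. This is a sketch under the rules stated above; the function names `to_sentiment_score` and `classify` are hypothetical and may not match the project's internals.

```python
def to_sentiment_score(label: str, confidence: float) -> float:
    """Map a model label and its confidence to a signed score in [-1.0, 1.0]."""
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    return 0.0  # any other label is treated as neutral


def classify(sentiment_score: float, threshold: float = 0.3) -> str:
    """Apply the documented +/-0.3 thresholds to a signed score."""
    if sentiment_score > threshold:
        return "positive"
    if sentiment_score < -threshold:
        return "negative"
    return "neutral"


for label, confidence in [("POSITIVE", 0.85), ("NEGATIVE", 0.92), ("NEUTRAL", 1.0)]:
    score = to_sentiment_score(label, confidence)
    print(f"{label:8} {confidence:.2f} -> {score:+.2f} ({classify(score)})")
```

Because both comparisons are strict, a score of exactly +0.3 or -0.3 falls in the neutral band.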

    Usage

    CLI Commands

    Analyze Sentiment

    ai-web-feeds nlp sentiment

    Options:

    • --batch-size: Number of articles to process per run (default: 100)
    • --force: Reprocess all articles

    # Process 50 articles
    ai-web-feeds nlp sentiment --batch-size 50

    # 30-day sentiment trend for "AI Safety"
    ai-web-feeds nlp sentiment-trend "AI Safety" --days 30

    Output:

    AI Safety - Sentiment Trend (30 days)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Date       Avg Sentiment  Articles  Positive  Neutral  Negative
    2023-10-01    +0.45         24        18        4         2
    2023-10-02    +0.32         19        12        5         2
    2023-10-03    -0.15         28         8       12         8  ⚠️  Shift

    Detect Sentiment Shifts

    # Show topics with sentiment shifts (>0.3 change in 7-day MA)
    ai-web-feeds nlp sentiment-shifts

    Output:

    Recent Sentiment Shifts
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Topic          Previous  Current  Change  Status
    AI Safety        +0.25    -0.18    -0.43   🔴 Major shift
    AI Regulation    -0.10    +0.35    +0.45   🟢 Improving

    Compare Topics

    ai-web-feeds nlp sentiment-compare "AI Safety" "AI Capabilities"

    Shows side-by-side sentiment trends for two topics.

    Python API

    from ai_web_feeds.nlp import SentimentAnalyzer
    from ai_web_feeds.config import Settings
    
    analyzer = SentimentAnalyzer(Settings())
    
    article = {
        "id": 1,
        "title": "RLHF Concerns",
        "content": "Critics have raised serious concerns about RLHF..."
    }
    
    sentiment = analyzer.analyze_sentiment(article)
    # Returns: {
    #     "sentiment_score": -0.65,
    #     "classification": "negative",
    #     "confidence": 0.89,
    #     "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
    # }

    Batch Processing

    Sentiment analysis runs hourly:

    from ai_web_feeds.nlp.scheduler import NLPScheduler
    
    # `scheduler` is the application's existing scheduler instance, created elsewhere
    nlp_scheduler = NLPScheduler(scheduler)
    nlp_scheduler.register_jobs()
    # Registers:
    # - Sentiment analysis (every hour)
    # - Sentiment aggregation (15 min after analysis)

    Database Schema

    article_sentiment Table

    CREATE TABLE article_sentiment (
        article_id INTEGER PRIMARY KEY,
        sentiment_score REAL NOT NULL CHECK(sentiment_score BETWEEN -1.0 AND 1.0),
        classification TEXT NOT NULL CHECK(classification IN ('positive', 'neutral', 'negative')),
        model_name TEXT NOT NULL,
        confidence REAL NOT NULL CHECK(confidence BETWEEN 0 AND 1),
        computed_at DATETIME DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (article_id) REFERENCES articles(id)
    );

    topic_sentiment_daily Table

    Aggregated daily sentiment by topic:

    CREATE TABLE topic_sentiment_daily (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        date DATE NOT NULL,
        avg_sentiment REAL NOT NULL,
        article_count INTEGER NOT NULL,
        positive_count INTEGER DEFAULT 0,
        neutral_count INTEGER DEFAULT 0,
        negative_count INTEGER DEFAULT 0,
        UNIQUE(topic, date)
    );
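The `UNIQUE(topic, date)` constraint is what makes repeated aggregation runs safe: a second run for the same topic and day can overwrite the earlier row instead of inserting a duplicate. The sketch below shows one way to do that with Python's built-in `sqlite3` module and SQLite's `ON CONFLICT ... DO UPDATE` clause (available in SQLite 3.24 and later). The `UPSERT` statement is an assumption about how such a write could look, not the project's actual storage code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE topic_sentiment_daily (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    date DATE NOT NULL,
    avg_sentiment REAL NOT NULL,
    article_count INTEGER NOT NULL,
    positive_count INTEGER DEFAULT 0,
    neutral_count INTEGER DEFAULT 0,
    negative_count INTEGER DEFAULT 0,
    UNIQUE(topic, date)
);
"""

# Hypothetical upsert: insert a new (topic, date) row or overwrite the existing one.
UPSERT = """
INSERT INTO topic_sentiment_daily
    (topic, date, avg_sentiment, article_count, positive_count, neutral_count, negative_count)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(topic, date) DO UPDATE SET
    avg_sentiment = excluded.avg_sentiment,
    article_count = excluded.article_count,
    positive_count = excluded.positive_count,
    neutral_count = excluded.neutral_count,
    negative_count = excluded.negative_count;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# An early run of the day, then a later run that sees more articles for the same topic.
conn.execute(UPSERT, ("AI Safety", "2023-10-01", 0.50, 10, 8, 1, 1))
conn.execute(UPSERT, ("AI Safety", "2023-10-01", 0.45, 24, 18, 4, 2))

rows = conn.execute(
    "SELECT topic, date, avg_sentiment, article_count FROM topic_sentiment_daily"
).fetchall()
print(rows)
```

After both writes the table still holds a single row for the topic and day, carrying the values from the later run.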

    Sentiment Aggregation

    Daily Aggregation

    Runs 15 minutes after sentiment analysis:

    from collections import defaultdict

    # Group sentiment scores by (topic, date); each bucket starts with empty counters
    aggregates = defaultdict(
        lambda: {"scores": [], "positive": 0, "neutral": 0, "negative": 0}
    )
    for article in recent_articles:
        for topic in article.topics:
            key = (topic, article.date)
            aggregates[key]["scores"].append(article.sentiment_score)
            aggregates[key][article.classification] += 1

    # Compute the daily average per topic and store it
    for (topic, date), data in aggregates.items():
        avg_sentiment = sum(data["scores"]) / len(data["scores"])
        storage.upsert_topic_sentiment_daily(
            topic=topic,
            date=date,
            avg_sentiment=avg_sentiment,
            article_count=len(data["scores"]),
            positive_count=data["positive"],
            neutral_count=data["neutral"],
            negative_count=data["negative"]
        )

    Shift Detection

    7-day moving average:

    from statistics import mean

    def detect_shift(topic: str, threshold: float = 0.3) -> bool:
        """Detect a sentiment shift by comparing two consecutive 7-day moving averages."""
        # Assumes the trend is ordered newest first
        trend = storage.get_topic_sentiment_trend(topic, days=14)
        if len(trend) < 14:
            return False  # Not enough history to compare two full weeks

        # Compute the 7-day MA for each of the last 2 weeks
        ma_recent = mean(day.avg_sentiment for day in trend[:7])
        ma_previous = mean(day.avg_sentiment for day in trend[7:14])

        shift = abs(ma_recent - ma_previous)
        return shift > threshold
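To see the two-window comparison on concrete numbers, the following sketch applies the same logic to a plain list of daily averages instead of the storage layer. The function name `detect_shift_in_series` and the newest-first ordering are assumptions made for this example.

```python
from statistics import mean


def detect_shift_in_series(daily_avgs: list[float], threshold: float = 0.3) -> tuple[bool, float]:
    """Compare the mean of the latest 7 days with the mean of the 7 days before.

    `daily_avgs` is ordered newest first and must hold at least 14 values.
    Returns (shift_detected, signed_change).
    """
    if len(daily_avgs) < 14:
        raise ValueError("need at least 14 daily averages")
    ma_recent = mean(daily_avgs[:7])
    ma_previous = mean(daily_avgs[7:14])
    change = ma_recent - ma_previous
    return abs(change) > threshold, change


# Newest first: a week averaging -0.18 preceded by a week averaging +0.25.
series = [-0.18] * 7 + [0.25] * 7
shifted, change = detect_shift_in_series(series)
print(shifted, round(change, 2))
```

Returning the signed change alongside the boolean makes it possible to tell a drop from an improvement, which the absolute difference alone does not reveal.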

    Configuration

    class Phase5Settings(BaseSettings):
        sentiment_batch_size: int = 100
        sentiment_cron: str = "0 * * * *"  # Every hour
        sentiment_model: str = "distilbert-base-uncased-finetuned-sst-2-english"
        sentiment_shift_threshold: float = 0.3

    Environment Variables:

    PHASE5_SENTIMENT_BATCH_SIZE=100
    PHASE5_SENTIMENT_SHIFT_THRESHOLD=0.3
    PHASE5_SENTIMENT_MODEL=distilbert-base-uncased-finetuned-sst-2-english
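The environment variable names follow the field names with a `PHASE5_` prefix. The sketch below imitates that mapping using only the standard library, so it runs without the project installed. The `SentimentSettings` dataclass and `load_settings` function are hypothetical stand-ins for the settings class shown above, and the type coercion is an assumption about how string values would be converted.

```python
import os
from dataclasses import dataclass

ENV_PREFIX = "PHASE5_"  # prefix taken from the variable names above


@dataclass(frozen=True)
class SentimentSettings:
    """Stdlib stand-in for the documented settings class (hypothetical name)."""
    sentiment_batch_size: int = 100
    sentiment_cron: str = "0 * * * *"
    sentiment_model: str = "distilbert-base-uncased-finetuned-sst-2-english"
    sentiment_shift_threshold: float = 0.3


def load_settings(env=None) -> SentimentSettings:
    """Build settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = SentimentSettings()
    overrides = {}
    for name, default in vars(defaults).items():
        raw = env.get(ENV_PREFIX + name.upper())
        if raw is not None:
            # Coerce the string to the field's type, e.g. "0.2" -> 0.2, "50" -> 50
            overrides[name] = type(default)(raw)
    return SentimentSettings(**overrides)


settings = load_settings({"PHASE5_SENTIMENT_SHIFT_THRESHOLD": "0.2"})
print(settings.sentiment_shift_threshold, settings.sentiment_batch_size)
```

Fields without a matching variable keep their defaults, so only the values that need to change have to be exported.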

    Performance

    • Throughput: ~100 articles/hour (CPU) with the default batch size of 100 and the hourly schedule
    • Memory: ~500MB (model loaded)
    • Storage: ~50 bytes per sentiment record

    Use Cases

    Monitor Topic Sentiment

    Track sentiment for specific topics:

    # Daily check for "AI Safety" sentiment
    ai-web-feeds nlp sentiment-trend "AI Safety" --days 7

    Detect Controversies

    Identify topics with negative sentiment spikes:

    # Topics with sentiment < -0.5 in last 7 days
    ai-web-feeds nlp sentiment-shifts --threshold -0.5

    Compare Competing Approaches

    # Compare sentiment for competing techniques
    ai-web-feeds nlp sentiment-compare "RLHF" "Constitutional AI"

    Model Details

    DistilBERT Architecture

    • Base Model: BERT distilled to 66M parameters (40% smaller)
    • Training: Fine-tuned on SST-2 (Stanford Sentiment Treebank)
    • Input: Max 512 tokens (articles truncated to ~2000 chars)
    • Output: Binary classification (positive/negative) with confidence

    Limitations

    1. Context Window: Only the first 512 tokens are considered
    2. Binary Classification: The model is trained for binary sentiment (positive/negative); neutral is inferred from the score thresholds
    3. Domain Shift: SST-2 consists of movie reviews; AI articles may differ in vocabulary and tone
    4. No Fine-tuning: The pre-trained model is used as-is (no domain adaptation)

    Troubleshooting

    Low Confidence Scores

    Symptom: All sentiment predictions have low confidence (<0.6).

    Cause: The articles are too long, so the model sees only the truncated beginning.

    Solution: Increase the truncation window (up to the model's 512-token limit) or apply extractive summarization before analysis.
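One naive way to keep more of a long article's signal within the input limit is to keep its opening and closing sentences and drop the middle. This is only an illustrative sketch: `shorten_article`, the sentence-splitting regex, and the 2000-character budget are assumptions, and a real extractive summarizer would rank sentences by relevance rather than pick them by position.

```python
import re


def shorten_article(text: str, max_chars: int = 2000) -> str:
    """Keep leading and trailing sentences so the result fits in max_chars."""
    if len(text) <= max_chars:
        return text
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    head, tail = [], []
    used = 0
    left, right = 0, len(sentences) - 1
    take_head = True
    # Alternate between the start and the end of the article until the budget is spent.
    while left <= right:
        sentence = sentences[left] if take_head else sentences[right]
        if used + len(sentence) + 1 > max_chars:
            break
        if take_head:
            head.append(sentence)
            left += 1
        else:
            tail.append(sentence)
            right -= 1
        used += len(sentence) + 1
        take_head = not take_head
    if not head and not tail:
        return text[:max_chars]  # a single oversized sentence: fall back to a hard cut
    return " ".join(head + tail[::-1])


article = " ".join(f"Sentence number {i}." for i in range(500))
short = shorten_article(article, max_chars=200)
print(len(article), len(short))
```

The shortened text would then be passed to the analyzer in place of the full article.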

    Model Download Fails

    Symptom: OSError: Can't find model

    Solution:

    # Models auto-download to ~/.cache/huggingface/hub
    # Ensure internet connection and disk space (~268MB)
    
    # Manual download:
    uv run python -c "from transformers import pipeline; pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')"

    Sentiment Shifts Not Detected

    Symptom: No shifts reported despite obvious sentiment changes.

    Cause: Threshold too high.

    Solution:

    # Lower threshold to 0.2
    export PHASE5_SENTIMENT_SHIFT_THRESHOLD=0.2

    Future Enhancements

    1. Domain-Specific Fine-tuning: Train on AI article sentiment labels
    2. Aspect-Based Sentiment: Sentiment for specific entities/topics within articles
    3. Multilingual Support: Add models for non-English content
    4. Real-Time Alerts: Webhook notifications for sentiment shifts
