---
name: api-cost-monitoring
description: API service cost monitoring with automated alerting and reporting
tags: [monitoring, alerts, costs, automation, api]
complexity: intermediate
prerequisites: [python, sqlite, cron]
platforms: [linux, macos]
last_validated: 2026-06-14
---

# API Cost Monitoring

Automated monitoring system for API service costs with historical tracking, alerting, and reporting. It polls service APIs, stores each reading in SQLite, and raises deduplicated alerts based on thresholds and usage trends.

## When to Use

- Monitor API service costs (OpenRouter, AWS, Azure, etc.)
- Track usage trends and forecast spend
- Alert automatically when approaching budget limits
- Generate regular usage reports
- Prevent service interruption from depleted credits

## Core Pattern

### 1. API Integration
```python
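import requests
from datetime import datetime

# `endpoint` and `api_key` are module-level settings loaded from configuration

def check_service_credits():
    """Poll service API for current usage/credits"""
    headers = {'Authorization': f'Bearer {api_key}'}
    response = requests.get(endpoint, headers=headers, timeout=20)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
    data = response.json()  # adjust if your service nests these fields under another key
    total_credits = float(data['credits_total'])
    usage = float(data['usage'])
    return {
        'total_credits': total_credits,
        'usage': usage,
        'remaining': total_credits - usage,
        'timestamp': datetime.now().isoformat()
    }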
```

### 2. Historical Storage
```sql
-- SQLite schema for cost tracking
CREATE TABLE credit_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    total_credits REAL NOT NULL,
    usage REAL NOT NULL,
    remaining REAL NOT NULL,
    cost_usd REAL NOT NULL
);

CREATE TABLE alerts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    alert_type TEXT NOT NULL,
    message TEXT NOT NULL,
    credits_remaining REAL NOT NULL
);
```
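
The schema can be created on first run from Python, which also covers the "don't assume the database exists" pitfall. The following is a minimal sketch, not the original implementation: `init_db` and `save_reading` are hypothetical helper names, and defaulting `cost_usd` to the usage figure is an assumption made only because the reading dict has no separate cost field.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS credit_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    total_credits REAL NOT NULL,
    usage REAL NOT NULL,
    remaining REAL NOT NULL,
    cost_usd REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS alerts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    alert_type TEXT NOT NULL,
    message TEXT NOT NULL,
    credits_remaining REAL NOT NULL
);
"""

def init_db(db_path):
    """Open the database and create both tables if they are missing."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn

def save_reading(conn, reading, cost_usd=None):
    """Insert one reading; cost_usd falls back to the usage figure (assumption)."""
    if cost_usd is None:
        cost_usd = reading['usage']
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO credit_history "
            "(timestamp, total_credits, usage, remaining, cost_usd) "
            "VALUES (?, ?, ?, ?, ?)",
            (reading['timestamp'], reading['total_credits'],
             reading['usage'], reading['remaining'], cost_usd),
        )
```

Because every statement uses `IF NOT EXISTS`, calling `init_db` at the start of each cron run is safe and keeps the wrapper scripts free of setup logic.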

### 3. Cron Job Automation
Use Hermes cron jobs with `no_agent=true` for efficient execution.

**CRITICAL**: Hermes cron jobs don't support script parameters. Create wrapper scripts instead:

```python
#!/usr/bin/env python3
# emma_token_daily.py - wrapper script (the shebang must be the first line to take effect)
import subprocess
import sys
result = subprocess.run([sys.executable, '/path/to/openrouter_monitor.py', 'daily'])
sys.exit(result.returncode)
```

```bash
# Working approach - wrapper scripts
hermes cron create --no-agent --script="emma_token_daily.py" --schedule="0 8 * * *"
hermes cron create --no-agent --script="emma_token_weekly.py" --schedule="0 8 * * 1"
hermes cron create --no-agent --script="emma_token_alert.py" --schedule="every 2h"

# BROKEN approach - parameters fail
hermes cron create --no-agent --script="cost_monitor.py daily" --schedule="0 8 * * *"  # ❌ FAILS

# Also BROKEN approach - updating existing jobs with parameters
hermes cron update [job_id] --script="script.py daily"  # ❌ FAILS SAME WAY
```

**WRAPPER SCRIPT DISCOVERY**: This was discovered when cron jobs kept failing silently. The Hermes scheduler cannot pass arguments to scripts, only execute them directly. The wrapper pattern is the ONLY reliable way to run scripts with different modes from cron.

### Wrapper Script Pattern
The complete wrappers, one per mode, resolve the main script relative to their own directory so they work regardless of the scheduler's working directory:

```python
#!/usr/bin/env python3
# ~/.hermes/scripts/emma_token_daily.py
import subprocess
import sys
import os

script_dir = os.path.dirname(os.path.abspath(__file__))
main_script = os.path.join(script_dir, 'openrouter_monitor.py')

result = subprocess.run([sys.executable, main_script, 'daily'], 
                       cwd=script_dir, capture_output=False)
sys.exit(result.returncode)
```

```python  
#!/usr/bin/env python3
# ~/.hermes/scripts/emma_token_weekly.py
import subprocess
import sys
import os

script_dir = os.path.dirname(os.path.abspath(__file__))
main_script = os.path.join(script_dir, 'openrouter_monitor.py')

result = subprocess.run([sys.executable, main_script, 'weekly'],
                       cwd=script_dir, capture_output=False)
sys.exit(result.returncode)
```

```python
#!/usr/bin/env python3
# ~/.hermes/scripts/emma_token_alert.py
import subprocess
import sys
import os

script_dir = os.path.dirname(os.path.abspath(__file__))
main_script = os.path.join(script_dir, 'openrouter_monitor.py')

result = subprocess.run([sys.executable, main_script, 'alert_check'],
                       cwd=script_dir, capture_output=False)
sys.exit(result.returncode)
```

### 4. Alert Logic
```python
def check_and_send_alert(credit_info, threshold=5.0):
    """Smart alerting - avoid spam with daily deduplication"""
    remaining = credit_info['remaining']
    if remaining > threshold:
        return

    # Check if alert already sent today
    today = datetime.now().date().isoformat()
    conn = sqlite3.connect(DB_PATH)
    try:
        already_sent = conn.execute('''
            SELECT COUNT(*) FROM alerts 
            WHERE DATE(timestamp) = ? AND alert_type = 'low_credits'
        ''', (today,)).fetchone()[0] > 0
    finally:
        conn.close()  # release the file handle even if the query fails

    if not already_sent:  # No alert sent today
        send_alert(remaining)
        log_alert_to_db(remaining)

### 5. Trend Analysis & Forecasting
```python
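def get_weekly_stats():
    """Calculate usage trends and forecast remaining days"""
    week_ago = (datetime.now() - timedelta(days=7)).isoformat()

    conn = sqlite3.connect(DB_PATH)
    try:
        rows = conn.execute('''
            SELECT timestamp, usage, remaining FROM credit_history 
            WHERE timestamp >= ? ORDER BY timestamp ASC
        ''', (week_ago,)).fetchall()
    finally:
        conn.close()

    if len(rows) < 2:
        return None

    weekly_usage = rows[-1][1] - rows[0][1]  # Usage delta
    # Divide by the time span the rows actually cover; it is shorter than
    # 7 days until a full week of history exists.
    span = datetime.fromisoformat(rows[-1][0]) - datetime.fromisoformat(rows[0][0])
    days_covered = span.total_seconds() / 86400
    daily_avg = weekly_usage / days_covered if days_covered > 0 else 0
    days_remaining = rows[-1][2] / daily_avg if daily_avg > 0 else float('inf')

    return {
        'weekly_usage': weekly_usage,
        'daily_average': daily_avg,
        'days_remaining': days_remaining,
        'monthly_projection': daily_avg * 30
    }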

## Implementation Steps

### Setup Phase
1. **API Key Configuration**: Store securely in Hermes config.yaml
2. **Database Initialization**: Create SQLite schema in ~/.hermes/data/
3. **Test Connection**: Verify API access and data parsing
4. **Setup Script**: Create guided setup for API key entry

### Core Script Development
1. **Modular Functions**: Separate API calls, DB operations, reporting
2. **Error Handling**: Graceful failures with descriptive error messages
3. **Command Line Interface**: Support daily/weekly/alert_check modes
4. **Configuration**: Easily adjustable thresholds and settings
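
The command line interface in step 3 could be sketched with `argparse` as below. This is an illustration under assumptions: the three mode names come from this document, while `build_parser`, `run`, the `--threshold` option, and the stub handlers are hypothetical placeholders for the real report and alert functions.

```python
import argparse

MODES = ("daily", "weekly", "alert_check")

def build_parser():
    """Build a parser that accepts exactly one of the supported modes."""
    parser = argparse.ArgumentParser(description="API cost monitor")
    parser.add_argument("mode", choices=MODES, help="which job to run")
    parser.add_argument("--threshold", type=float, default=5.0,
                        help="low-credit alert threshold in USD")
    return parser

def run(mode, threshold):
    """Dispatch to the handler for the requested mode (handlers are stubs here)."""
    handlers = {
        "daily": lambda: "daily report generated",
        "weekly": lambda: "weekly report generated",
        "alert_check": lambda: f"alert check run with threshold ${threshold:.2f}",
    }
    return handlers[mode]()

def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and run the selected mode."""
    args = build_parser().parse_args(argv)
    print(run(args.mode, args.threshold))
    return 0
```

In the real script, ending the file with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard lets each wrapper propagate the exit code, and an unknown mode makes `argparse` exit with a non-zero status rather than failing silently.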

### Automation Setup
1. **Cron Jobs**: Set up automated schedules (daily, weekly, alerts)
2. **Message Delivery**: Route to appropriate chat platform
3. **Alert Deduplication**: Prevent spam with date-based checks
4. **Monitoring**: Verify cron jobs are running correctly

### Report Templates
```python
# Daily report template
def generate_daily_report(credit_info):
    remaining = credit_info['remaining']
    usage = credit_info['usage'] 
    total = credit_info['total_credits']
    
    status = "🔴 CRITICAL" if remaining <= 5 else \
             "🟠 WARNING" if remaining <= 15 else \
             "🟡 MODERATE" if remaining <= 30 else "🟢 GOOD"
             
    return f"""📊 **DAILY REPORT**
💰 **Credits remaining**: ${remaining:.2f}
📈 **Total used**: ${usage:.2f} ({usage/total*100:.1f}%)
🚦 **Status**: {status}
⏰ **Updated**: {datetime.now().strftime('%d/%m/%Y %H:%M')}"""
```

## Service-Specific Configurations

### OpenRouter
- **API Endpoint**: `https://openrouter.ai/api/v1/auth/key`
- **Auth**: Bearer token in Authorization header
- **Rate Limits**: Reasonable for monitoring (not real-time polling)
- **Data Structure**: `credits_total`, `usage` fields

### Other Services
- Adapt the API integration section for different service endpoints
- Modify data parsing based on API response structure
- Adjust threshold values based on service cost structure

## Pitfalls

### API Key Security
- ❌ **Never hardcode API keys** in scripts
- ✅ Store in Hermes config.yaml under providers section
- ✅ Test API key validity during setup
- ✅ Handle API authentication failures gracefully

### Database Management
- ❌ Don't assume database exists - initialize on first run
- ✅ Use transactions for data integrity
- ✅ Handle SQLite file permissions correctly
- ✅ Consider database backup for long-term monitoring
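
For the backup point, Python's standard `sqlite3` module provides `Connection.backup`, which copies a live database safely. The sketch below assumes plain file paths; `backup_database` is a hypothetical helper name, and where the copy is stored and how often it runs are left to the deployment.

```python
import sqlite3

def backup_database(src_path, dest_path):
    """Copy the SQLite database at src_path into dest_path."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # page-by-page copy, safe while src is in use
    finally:
        dest.close()
        src.close()
```

Using the backup API rather than copying the file avoids capturing a half-written database if a monitoring run is inserting at the same moment.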

### Alert Fatigue
- ❌ Don't send duplicate alerts for same threshold breach
- ✅ Use date-based deduplication
- ✅ Escalate alerts only when status worsens
- ✅ Provide actionable information in alert messages

### Cron Job Configuration
- ❌ Don't use agent-based cron jobs for simple monitoring
- ❌ **Don't use script parameters in Hermes cron jobs** - `script="monitor.py daily"` will fail
- ✅ Use `no_agent=true` for efficiency
- ✅ **Create wrapper scripts** for different modes instead of parameters
- ✅ Test cron scripts manually before automation
- ✅ Monitor cron job execution with `hermes cron list`

### Forecasting Accuracy
- ❌ Don't make predictions with insufficient data (< 7 days)
- ✅ Use realistic time windows for trend analysis
- ✅ Handle edge cases (zero usage, irregular patterns)
- ✅ Present forecasts as estimates, not guarantees

## Success Metrics

- **Reliability**: Consistent daily/weekly reports without failures
- **Accuracy**: Forecasting within reasonable margin (±20% monthly)
- **Timeliness**: Alerts arrive before service interruption
- **Clarity**: Reports provide actionable insights
- **Automation**: Minimal manual intervention required

## Extensions

- **Multi-Service**: Monitor multiple APIs in single system
- **Dashboard**: Web interface for historical data visualization  
- **Slack/Teams**: Integration with team communication platforms
- **Budget Planning**: Integration with expense tracking systems
- **Webhook Alerts**: Real-time notifications for critical thresholds

## Multi-LLM Analysis Strategy

### Cost-Aware Intelligence
When budget allows, use LLM analysis for richer insights:

```python
def get_smart_analysis(credit_info, budget_mode="auto"):
    """Adaptive analysis based on account status and budget"""
    if budget_mode == "free_only" or credit_info['remaining'] < 0:
        return get_offline_analysis(credit_info)
    elif budget_mode == "auto":
        return get_llm_analysis_free(credit_info)  # Use free models
    else:
        return get_llm_analysis_premium(credit_info)  # Use paid models
```

### Free LLM Integration
```python
def get_llm_analysis_free(credit_info, mode="daily"):
    """Analysis using free/cheap LLMs like DeepSeek"""
    total = credit_info['total_credits']
    usage_percent = credit_info['usage'] / total * 100 if total else 0.0
    prompt = f"""Analyze API credits for executive report:
    Remaining: ${credit_info['remaining']:.2f}
    Usage: {usage_percent:.1f}%
    Provide concise analysis (2-3 sentences) with status and recommendations."""
    
    payload = {
        'model': 'deepseek/deepseek-chat',  # Verify pricing first: this model may be billed
        'messages': [{'role': 'user', 'content': prompt}],
        'max_tokens': 150,
        'temperature': 0.3
    }
    
    # Always fall back to offline analysis on failure
    try:
        response = requests.post(api_endpoint, json=payload, timeout=20)
        if response.status_code == 200:
            return f"🤖 {response.json()['choices'][0]['message']['content']}"
    except (requests.RequestException, KeyError, IndexError, TypeError, ValueError):
        pass  # network failure or unexpected response shape
    
    return get_offline_analysis(credit_info, mode)

### Agent Specialization
For automated monitoring agents that should minimize costs:
- **Agent Branding**: Use different persona/name for monitoring agent (e.g., "Emma Token" vs "Emma") 
- **Model Restriction**: Configure monitoring agents to use only free models
- **Fallback Chain**: Free LLM → Offline analysis → Basic status
- **Cost Tracking**: Monitor the monitor's own API usage
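
The fallback chain can be sketched as an ordered list of analyzers, with a rule-based step that never touches the network. This is an illustration, not the original implementation: `analyze_with_fallback` is a hypothetical helper, this `get_offline_analysis` body is an assumed stand-in for the function referenced above, and the 5/15/30 cut-offs are borrowed from the daily report template.

```python
def get_offline_analysis(credit_info, mode="daily"):
    """Rule-based status text; makes no network calls, so it costs nothing."""
    remaining = credit_info['remaining']
    total = credit_info['total_credits']
    used_pct = credit_info['usage'] / total * 100 if total > 0 else 0.0
    if remaining <= 0:
        advice = "Balance exhausted: top up before the next request fails."
    elif remaining <= 5:
        advice = "Critically low: top up now."
    elif remaining <= 15:
        advice = "Low: plan a top-up this week."
    elif remaining <= 30:
        advice = "Moderate: keep an eye on daily spend."
    else:
        advice = "Healthy: no action needed."
    return f"[{mode}] ${remaining:.2f} left, {used_pct:.1f}% used. {advice}"

def analyze_with_fallback(credit_info, analyzers):
    """Try each analyzer in order and return the first non-empty result."""
    for analyzer in analyzers:
        try:
            result = analyzer(credit_info)
            if result:
                return result
        except Exception:
            continue  # move on to the next, cheaper step
    # Last resort: basic status that cannot fail
    return f"Remaining: ${credit_info['remaining']:.2f}"
```

A monitoring run would pass something like `[get_llm_analysis_free, get_offline_analysis]`; dropping the first entry turns the whole chain into zero-cost monitoring without touching the callers.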

## Emergency Response Patterns

### Cost Spike Detection
When a user reports unexpectedly high costs, run this check first:

```python
def emergency_cost_check():
    """Emergency cost check when user reports unexpected charges"""
    # 1. Get current real-time numbers
    current_data = check_service_credits()
    
    # 2. Compare with last saved reading
    last_reading = get_last_db_record()
    
    # 3. Calculate rate of change
    time_diff = (datetime.fromisoformat(current_data['timestamp'])
                 - datetime.fromisoformat(last_reading['timestamp']))
    cost_diff = current_data['usage'] - last_reading['usage']
    hours = time_diff.total_seconds() / 3600
    hourly_rate = cost_diff / hours if hours > 0 else 0.0  # avoid dividing by zero
    
    print(f"🚨 COST SPIKE ANALYSIS")
    print(f"📊 Usage jump: ${cost_diff:.2f} in {time_diff}")
    print(f"📈 Hourly rate: ${hourly_rate:.2f}/hour")
    
    # 4. Identify rapid cost increases
    if hourly_rate > 5.0:  # > $5/hour is abnormal
        return "emergency_detected"
    return "normal_usage"
```

### Emergency Pause Protocol
When unexpected costs are detected:

1. **PAUSE ALL CRON JOBS IMMEDIATELY** - Stop automated spending
```bash
hermes cron list | grep "Emma Token" | cut -d' ' -f1 | xargs -I {} hermes cron pause {}
```

2. **Audit active processes** - Check what's consuming credits
3. **Review recent deployments** - Look for new agents or scripts 
4. **Test "free" model claims** - Verify any LLM calls are actually free

### Cost Monitoring Anti-Patterns

**❌ DEADLY ANTI-PATTERN: Cost monitoring that costs money**
```python
# This creates RECURSIVE COSTS - monitoring spend consumes the budget!
def generate_daily_report():
    credits = get_credits()
    analysis = expensive_llm_call(credits)  # ❌ Costs $0.83 per report
    return analysis

# Every monitoring run reduces the budget it's supposed to protect
# Result: False alerts about "disappearing credits" that the monitor itself consumed
```

**✅ CORRECT PATTERN: Zero-cost monitoring** 
```python
def generate_daily_report():
    credits = get_credits()
    analysis = offline_analysis(credits)  # ✅ Free, rule-based analysis
    return analysis

# Optional: Use free models ONLY if verified to be actually free
def enhanced_analysis(credits):
    try:
        # Test this endpoint thoroughly before using
        return free_llm_analysis(credits)  
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        return offline_analysis(credits)  # Always fall back to free
```

### Multi-Threshold System
```python
ALERT_THRESHOLDS = {
    'critical': 0,      # Negative balance
    'urgent': 5,        # Very low
    'warning': 15,      # Low
    'info': 30          # Medium
}

def determine_alert_level(remaining):
    for level, threshold in ALERT_THRESHOLDS.items():
        if remaining <= threshold:
            return level
    return 'ok'

def check_and_send_alert(credit_info):
    alert_level = determine_alert_level(credit_info['remaining'])
    
    if alert_level != 'ok':
        # Only escalate - don't re-alert at same level
        if not alert_sent_today_at_level_or_higher(alert_level):
            send_contextual_alert(alert_level, credit_info)
```

### Smart Deduplication
Prevent alert spam while ensuring escalation:
- One alert per day per alert type (low_credits, negative_credits)
- Escalation allowed when situation worsens
- Different message styles for different urgency levels
- Include direct action links in alerts

## Pitfalls From Production

### LLM Integration Failures
- ❌ **Don't assume LLM calls always work** - network, rate limits, API changes
- ❌ **CRITICAL: "Free" models often aren't free** - `deepseek/deepseek-chat` costs money even though it was assumed to be free
- ❌ **Don't create recursive costs** - monitoring costs should never consume the budget being monitored
- ✅ Always implement offline analysis fallback
- ✅ Handle partial responses and malformed JSON gracefully  
- ✅ **TEST every "free" model with real API calls** before production deployment
- ✅ **Prefer offline analysis for cost monitoring** to eliminate recursive cost risk
- ✅ Document which models are free vs paid in your service
- ✅ Test fallback chain before deployment

### Cron Job Agent Selection
- ❌ Don't use expensive agents for monitoring tasks
- ✅ Use `no_agent=true` for script-only monitoring when possible
- ✅ When agent needed, restrict to free models only
- ✅ Monitor the monitor - track your monitoring costs
- ✅ Name monitoring agents distinctly (e.g., "Service Token Monitor")

### Database Edge Cases  
- ❌ Don't divide by zero in forecasting when daily_avg = 0
- ✅ Handle missing data gracefully (< 7 days for weekly stats)
- ✅ Consider negative usage deltas (credits added mid-week)
- ✅ Validate timestamp parsing across different timezones
- ✅ Handle SQLite file locking in high-frequency monitoring
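
Several of these edge cases can be handled in one small forecasting helper. The sketch below is an assumption-laden illustration: `estimate_daily_burn` and `days_remaining` are hypothetical names, readings are assumed to be `(iso_timestamp, cumulative_usage)` pairs with consistently naive or consistently timezone-aware timestamps, and any drop in the cumulative figure is treated as a top-up or reset rather than as negative spend.

```python
from datetime import datetime

def estimate_daily_burn(readings):
    """Average spend per day from (iso_timestamp, cumulative_usage) pairs.

    Readings must be ordered oldest first. Returns None when there is too
    little data to estimate anything.
    """
    if len(readings) < 2:
        return None
    spent = 0.0
    for (_, prev), (_, curr) in zip(readings, readings[1:]):
        delta = curr - prev
        if delta > 0:  # ignore drops caused by top-ups or counter resets
            spent += delta
    start = datetime.fromisoformat(readings[0][0])
    end = datetime.fromisoformat(readings[-1][0])
    elapsed_days = (end - start).total_seconds() / 86400
    if elapsed_days <= 0:
        return None
    return spent / elapsed_days

def days_remaining(remaining, daily_burn):
    """Days until credits run out; infinite when there is no measurable spend."""
    if not daily_burn:  # covers both None and 0
        return float('inf')
    return max(remaining, 0.0) / daily_burn
```

Dividing by the elapsed time between the first and last reading, instead of a fixed 7, keeps the estimate meaningful while less than a week of history exists.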

### API Response Variations
- ❌ Don't assume a consistent API response format over time
- ✅ Parse JSON defensively with fallbacks  
- ✅ Log unexpected response structures for debugging
- ✅ Handle partial outages (API returns 200 but incomplete data)
- ✅ Implement timeout and retry logic for transient failures
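
Defensive parsing and retries can be combined as in the sketch below. It is an illustration under assumptions: `parse_credit_payload` and `with_retries` are hypothetical helpers, the `credits_total` and `usage` field names follow this document, the optional `data` wrapper is a guess at how some services nest their fields, and `fetch` stands for any callable that performs the HTTP request and returns decoded JSON.

```python
import time

def parse_credit_payload(payload):
    """Return a normalized reading dict, or None if required fields are missing."""
    if not isinstance(payload, dict):
        return None
    data = payload.get('data', payload)  # tolerate an optional 'data' wrapper
    if not isinstance(data, dict):
        return None
    try:
        total = float(data['credits_total'])
        usage = float(data['usage'])
    except (KeyError, TypeError, ValueError):
        return None  # a 200 response with incomplete data lands here
    return {'total_credits': total, 'usage': usage, 'remaining': total - usage}

def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it yields a parseable reading or attempts run out."""
    for attempt in range(attempts):
        try:
            reading = parse_credit_payload(fetch())
            if reading is not None:
                return reading
            print(f"attempt {attempt + 1}: unexpected response structure")
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, ...
    return None
```

Returning `None` instead of raising lets the caller skip the database insert for that run, so a partial outage does not write a misleading zero balance into the history.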

## Support Files

- **templates/api_monitor_template.py** - Complete monitoring script template with LLM integration, ready for adaptation to any API service
- **references/openrouter_implementation.md** - Full OpenRouter production deployment with cron job IDs, business context, multi-LLM strategy, and Emma Token agent configuration
- **references/openrouter_emergency_2026-06-14.md** - Emergency response case study: recursive cost monitoring bug, "free" model that wasn't free, immediate stop protocol

---
*Built from OpenRouter monitoring implementation for comprehensive API cost tracking with intelligent analysis.*