Understanding Snowden
A technical deep-dive for developers. We walk through every piece of this autonomous prediction market trading system, from scanning Polymarket to placing Kelly-sized bets with LLM-generated probability estimates.
What Are We Building?
Snowden is an autonomous trading system for Polymarket, the world's largest prediction market, where binary contracts trade on real-world outcomes: elections, Fed rate decisions, geopolitical events, crypto price targets. Each contract settles at $1 (YES) or $0 (NO), and trades at a price between 0 and 1 that represents the market's implied probability.
Every 15 minutes, Snowden wakes up and runs a full cycle: scan hundreds of live markets, call Claude Opus to estimate true probabilities, size positions with the Kelly criterion, enforce strict risk limits through an automated Sentinel, and execute trades on the Polymarket CLOB (central limit order book). Everything is logged to TimescaleDB and visualized in Grafana.
The Thesis
Prediction markets are efficient in aggregate but have persistent micro-inefficiencies that a systematic approach can exploit. These inefficiencies cluster around a few patterns:
- Stale markets · thin liquidity, wide spreads, prices that haven't moved despite new information
- Longshot bias · small-probability events consistently overpriced (people love lottery tickets)
- Partisan bias · political markets where one side's money distorts the price away from polling data
- News latency · prices adjust slowly to new information, especially in low-volume markets
Snowden doesn't try to beat the market on every question. It runs a funnel to find the 10–15 markets per cycle where edge is most likely, then sizes bets conservatively enough to survive the inevitable losing streaks.
The Pipeline
| Metric | Value |
|---|---|
| Scan interval | 15 minutes |
| Markets scanned | 500+ per cycle |
| Funnel output | 10–15 opportunities |
| Position sizing | Quarter-Kelly (f/4) |
| Risk limits | 80% heat, 10% daily drawdown, 40% correlated |
| Starting bankroll | $2,000 USDC |
| LLM backbone | Claude Opus 4.6 (analyst) + Haiku 4.5 (triage) |
| Database | TimescaleDB (PostgreSQL 16 + hypertables) |
The Foundation
Configuration
All configuration flows through a single Pydantic Settings class that reads from
environment variables and an optional .env file. No YAML, no TOML, no
config hierarchy to debug. Docker Compose sets the infra variables; everything else
has sensible defaults.
The module-level settings = Settings() is the only global state in the
system. Every other module imports it by name.
class Settings(BaseSettings):
    model_config = {"env_prefix": "", "env_file": ".env", "extra": "ignore"}

    # Mode: "paper" or "live"
    mode: str = "paper"

    # Database
    tsdb_host: str = "localhost"
    tsdb_port: int = 5432
    tsdb_db: str = "snowden"
    tsdb_user: str = "snowden"
    tsdb_password: str = "snowden"

    # Polymarket API credentials
    poly_api_key: str = ""
    poly_api_secret: str = ""
    poly_private_key: str = ""

    # Risk parameters
    max_heat: float = 0.80             # 80% of equity deployed
    max_single_position: float = 0.25  # 25% max per market
    max_daily_drawdown: float = 0.10   # 10% triggers kill switch
    max_correlated: float = 0.40       # 40% per category
    kelly_divisor: float = 4.0         # quarter-Kelly
    edge_threshold: float = 0.05       # 5% minimum edge to trade

    @property
    def tsdb_dsn(self) -> str:
        return (
            f"postgresql://{self.tsdb_user}:{self.tsdb_password}"
            f"@{self.tsdb_host}:{self.tsdb_port}/{self.tsdb_db}"
        )

    @property
    def is_paper(self) -> bool:
        return self.mode == "paper"


settings = Settings()  # module-level singleton
Pydantic Settings gives us runtime type validation for free. If someone sets
TSDB_PORT=banana, the app fails immediately at startup with a clear
validation error, not deep in a database connection two hours later.
The "extra": "ignore" flag means unrecognized env vars are silently
dropped, so you can share a .env across services without conflicts.
The Type System
Every data shape in the system lives in a single file: types.py.
Pydantic models for structured data, StrEnum for categorical values,
and Protocol interfaces for swappable backends. This file is the
contract between every module. If you change a type here, the
type checker catches every downstream breakage.
class Regime(StrEnum):
"""Market regime drives strategy selection."""
CONSENSUS = "consensus" # market agrees, little edge
CONTESTED = "contested" # genuine disagreement
CATALYST = "catalyst_pending" # event upcoming that will move price
RESOLVING = "resolution_imminent"
STALE = "stale" # no one is paying attention
NEWS_DRIVEN = "news_driven" # recent news shifted reality
class Strategy(StrEnum):
THETA = "theta_harvest" # near-certain outcomes
LONGSHOT_FADE = "longshot_fade" # overpriced tails
NEWS_LATENCY = "news_latency" # slow price adjustment
PARTISAN_FADE = "partisan_fade" # political bias
CORRELATED_ARB = "correlated_arb" # linked market mispricing
STALE_REPRICE = "stale_reprice" # abandoned markets
@runtime_checkable
class MarketClient(Protocol):
"""Swappable backend: LiveClient for real trading, SimClient for paper."""
async def get_active_markets(self) -> pl.DataFrame: ...
async def get_book(self, token_id: str) -> dict: ...
async def get_midpoint(self, token_id: str) -> float: ...
async def execute(self, signal: TradeSignal) -> OrderResult: ...
The Protocol interfaces are the key abstraction. LiveClient
and SimClient both satisfy MarketClient through structural
typing: no inheritance, no registration, no abstract base class.
If a class has the right methods with the right signatures, it's a valid
MarketClient. This makes testing trivial: any object with the right
shape works.
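To see structural typing in action, here is a minimal sketch with the protocol cut down to a single method for brevity. The test double never mentions the protocol, yet satisfies it:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MarketClient(Protocol):
    """Cut-down version of the real protocol (one method for brevity)."""
    async def get_midpoint(self, token_id: str) -> float: ...


class FakeClient:
    """A test double: no inheritance, no registration -- just the right shape."""
    async def get_midpoint(self, token_id: str) -> float:
        return 0.42  # canned midpoint for tests


# FakeClient never names MarketClient, yet structurally satisfies it.
assert isinstance(FakeClient(), MarketClient)
```

One caveat: `isinstance` against a `runtime_checkable` protocol only verifies that the methods exist, not their signatures; signature-level verification happens statically in the type checker.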
The Scanner
The scanner is a 5-stage funnel that reduces 500+ active Polymarket markets down to 10–15 tradeable opportunities. Stages 1–4 are pure data operations (Polars and Python). Stage 5 is a single, cheap Haiku call to filter out noise before expensive Opus analysis.
The design principle: each stage is cheap enough to run on every market, and aggressive enough to cut the dataset significantly. By the time we reach the LLM, we've already eliminated 95% of markets through deterministic rules.
Stage 1: Fetch
Paginated fetch of all active markets from Polymarket's Gamma API. Events (which can contain multiple binary markets) are flattened into individual rows. Token IDs, outcome prices, volumes, and resolution dates are normalized into a Polars DataFrame.
This is the only network-heavy stage. A typical fetch takes 1–2 seconds and returns 500–800 markets depending on platform activity.
Stage 2: Liquidity Gate
A pure Polars filter with four hard requirements. Any market that fails any condition is dropped immediately.
def stage_2_liquidity_gate(df: pl.DataFrame) -> pl.DataFrame:
"""Filter markets by minimum liquidity requirements."""
return df.filter(
(pl.col("vol_24h") >= settings.min_liquidity_usd) # >= $5,000
& ((pl.col("bid_depth") + pl.col("ask_depth"))
>= settings.min_book_depth_usd) # >= $500 total
& (pl.col("spread") <= settings.max_spread) # <= 8%
& (pl.col("hours_to_resolve") >= settings.min_hours_to_resolve) # >= 24 hours
& (pl.col("hours_to_resolve") <= settings.max_days_to_resolve * 24) # <= 180 days
)
Why filter on resolution time? Markets resolving within 24 hours are usually efficiently priced: too many eyeballs, too little time for the price to drift. Markets beyond 180 days have too much uncertainty for our edge to compound meaningfully, and capital is tied up too long. The sweet spot is 1–4 weeks: enough time for information asymmetry to exist, short enough that your capital turns over.
Stage 3: Efficiency Score
A composite score estimating how "beatable" each market is. Five weighted components combine into a score between 0 and 1, where lower means less efficient (i.e., more likely to have exploitable mispricing). Markets scoring above 0.4 are dropped.
| Component | Weight | Logic |
|---|---|---|
| Spread | 25% | Wider spread = less liquidity = less efficient |
| Volume | 20% | Lower 24h volume = fewer participants = less efficient |
| Book depth | 15% | Shallow order book = easier to move the price |
| Price extremity | 15% | Prices near 0 or 1 have known tail biases |
| Time window | 25% | 1–4 week resolution = ideal sweet spot for mispricing |
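The composite might be computed as follows. The weights come from the table above; the per-component normalizations (the volume and depth saturation points, the 8% spread ceiling) are illustrative assumptions, not the production code:

```python
# Sketch of the Stage 3 composite score. Weights are from the table; the
# normalizations are assumed values for illustration.
WEIGHTS = {"spread": 0.25, "volume": 0.20, "depth": 0.15,
           "extremity": 0.15, "time_window": 0.25}

def efficiency_score(spread: float, vol_24h: float, depth_usd: float,
                     mid: float, days_to_resolve: float) -> float:
    """Composite in [0, 1]; lower = less efficient = more interesting."""
    components = {
        # Tight spread -> efficient; at an 8% spread the component hits 0
        "spread": 1.0 - min(spread / 0.08, 1.0),
        # High 24h volume -> efficient; saturates at $100k (assumed)
        "volume": min(vol_24h / 100_000, 1.0),
        # Deep book -> efficient; saturates at $10k (assumed)
        "depth": min(depth_usd / 10_000, 1.0),
        # Mid near 0.5 -> fewer tail biases -> more efficient
        "extremity": 1.0 - 2.0 * abs(mid - 0.5),
        # Outside the 1-4 week sweet spot -> treated as more efficient
        "time_window": 0.0 if 7 <= days_to_resolve <= 28 else 1.0,
    }
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
```

Under these assumptions, a thin, quiet market in the sweet spot scores well under the 0.4 cutoff and survives, while a deep, tight, 50/50 market resolving tomorrow scores near 1 and is dropped.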
Stage 4: Strategy Match
Each surviving market is classified into one or more strategy buckets based on its price, volume, category, and spread characteristics. Markets that don't match any known pattern are dropped. Priority scoring combines estimated edge, confidence modifier, liquidity, and time decay into a single sortable number.
| Strategy | Trigger Condition | Edge Estimate | Why It Works |
|---|---|---|---|
| Theta Harvest | mid ≥ 0.88 or ≤ 0.12 | distance to boundary | Near-certain outcomes trade at a discount to certainty |
| Longshot Fade | mid ≤ 0.08 or ≥ 0.92 | tail × 0.5 | People overpay for lottery-ticket outcomes |
| Stale Reprice | Low vol + wide spread | ~4% | No one is watching; reality has moved on |
| Partisan Fade | Political + mid 0.25–0.75 | ~6% | Partisan money pushes prices away from polling data |
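The trigger rules in the table translate almost directly into code. A sketch, with the "low volume / wide spread" thresholds for Stale Reprice assumed for illustration:

```python
# Illustrative re-implementation of the Stage 4 trigger table; the
# stale-reprice thresholds ($10k volume, 5% spread) are assumed values.
def match_strategies(mid: float, vol_24h: float, spread: float,
                     is_political: bool) -> list[str]:
    matched = []
    if mid >= 0.88 or mid <= 0.12:
        matched.append("theta_harvest")   # near-certain outcomes
    if mid <= 0.08 or mid >= 0.92:
        matched.append("longshot_fade")   # overpriced tails
    if vol_24h < 10_000 and spread > 0.05:
        matched.append("stale_reprice")   # no one is watching
    if is_political and 0.25 <= mid <= 0.75:
        matched.append("partisan_fade")   # contested political market
    return matched  # empty list => market is dropped in Stage 4
```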
Stage 5: Haiku Triage
One batch call to Claude Haiku with all remaining candidates. The model reads each market's question, mid price, matched strategies, and volume, then picks the 10–15 that merit deep analysis. This is a cheap filter (one Haiku call costs ~$0.002) ahead of the expensive Opus calls (~$0.05 each).
async def stage_5_haiku_triage(
candidates: list[ScanResult],
anthropic_client: anthropic.AsyncAnthropic,
) -> list[ScanResult]:
"""Haiku pre-screen: is this worth deep Analyst analysis?"""
batch_text = "\n".join(
f"[{i}] Q: {c.market.question} | Mid: {c.market.mid:.2f} | "
f"Strategies: {', '.join(s.value for s in c.matched_strategies)} | "
f"Vol24h: ${c.market.vol_24h:,.0f}"
for i, c in enumerate(candidates)
)
response = await anthropic_client.messages.create(
model=settings.triage_model, # claude-haiku-4-5
max_tokens=500,
system="Select 10-15 markets worth deep analysis. Skip obviously "
"efficient markets. Respond with comma-separated indices only.",
messages=[{"role": "user", "content": batch_text}],
)
indices = [int(x.strip()) for x in response.content[0].text.split(",")
if x.strip().isdigit()]
return [candidates[i] for i in indices if i < len(candidates)]
The Analyst
The Analyst is the most expensive and most important component: a Claude Opus call for each candidate market, asking the model to estimate the true probability of the event occurring, independent of what the market currently prices. The prompt engineering is the core intellectual property of the system.
The System Prompt
The system prompt is designed to counteract known LLM failure modes in probability estimation. Without explicit guidance, language models anchor on the market price, fall for narrative bias, exhibit recency bias, and cluster their estimates at round numbers (50%, 75%, 90%). The prompt addresses each of these directly.
ANALYST_SYSTEM_PROMPT = (
"You are a professional prediction market analyst. Estimate the TRUE "
"probability of an event, independent of the market price.\n\n"
"CALIBRATION RULES:\n"
"1. When you say 70%, the event should happen ~70% of the time.\n"
"2. Base estimates on EVIDENCE, not narrative. Weight hard data\n"
" (polls, filings, schedules) over soft signals (sentiment, vibes).\n"
"3. Political markets: weight polling aggregates over pundit takes.\n"
"4. Distinguish 'I don't know' (low confidence, near market price)\n"
" from 'the market is wrong' (high confidence, far from market).\n"
"5. Biases to AVOID:\n"
" - Anchoring on the current market price\n"
" - Narrative bias (good story != high probability)\n"
" - Recency bias (last week's news != permanent shift)\n"
" - Round number bias (don't cluster at 50%, 75%, 90%)\n"
"6. If evidence is thin, say so. Set confidence LOW.\n\n"
"IMPORTANT: p_est_raw is YOUR raw estimate BEFORE calibration.\n"
"The system applies Platt scaling separately. Give your honest best."
)
Analysis Flow
For each candidate market, the Analyst executes a four-step pipeline:
1. Fetch news via category-specific RSS feeds (politics, crypto, finance, sports, legal). Feeds are parsed with feedparser, deduplicated by title prefix, sorted by recency, and capped at 15 items. General news feeds are always included alongside the category-specific ones.
2. Build prompt with market data (mid, bid/ask, spread, volume, open interest), 7-day price history, matched strategies from Stage 4, resolution source, and formatted news context. The description is capped at 500 characters to stay focused.
3. Call Claude Opus. The model returns a JSON object with its raw probability estimate (p_est_raw), confidence level, regime classification, reasoning (capped at 3 sentences), and a strategy hint.
4. Apply Platt scaling via calibrator.correct(raw_est) to transform the raw LLM probability into a calibrated estimate. The calibrated value becomes p_est and is used for all downstream decisions.
async def analyze_market(
scan: ScanResult, calibrator: Calibrator,
client: anthropic.AsyncAnthropic | None = None,
) -> EventAnalysis | None:
# Step 1: Fetch fresh news for this market's category
news_items = await fetch_news_for_market(
scan.market.question, scan.market.category.value,
)
scan.news_headlines = [item.title for item in news_items]
# Step 2: Build the analyst prompt
prompt = build_analyst_prompt(scan)
# Step 3: Call Opus, parse JSON response
response = await client.messages.create(
model=settings.analyst_model, # claude-opus-4-6
max_tokens=settings.analyst_max_tokens,
system=ANALYST_SYSTEM_PROMPT,
messages=[{"role": "user", "content": prompt}],
)
data = json.loads(response.content[0].text.strip()
.replace("```json", "").replace("```", ""))
# Step 4: Apply calibration correction
raw_est = float(data["p_est_raw"])
calibrated = calibrator.correct(raw_est)
return EventAnalysis(
market_id=data["market_id"], question=data["question"],
p_market=float(data["p_market"]),
p_est=calibrated, p_est_raw=raw_est,
confidence=float(data["confidence"]),
regime=Regime(data["regime"]),
edge=round(calibrated - float(data["p_market"]), 4),
reasoning=data["reasoning"],
key_factors=data.get("key_factors", []),
data_quality=float(data.get("data_quality", 0.5)),
strategy_hint=Strategy(data["strategy_hint"]) if data.get("strategy_hint") else None,
)
The Analyst outputs p_est_raw, its honest best estimate
before any correction. The system then applies Platt scaling (logistic regression on
historical log-odds vs outcomes) to fix systematic bias. This separation is critical:
the LLM gives its best guess, statistics fix the systematic errors. Over time, the
calibrator learns whether Claude tends to be overconfident in the 60–80% range,
or underconfident near the tails, and corrects for it automatically.
Kelly Criterion
The Kelly criterion determines optimal bet sizing for repeated wagers with a known edge. In prediction markets, each trade is a binary bet: the contract settles at $1 (YES) or $0 (NO). If our estimated probability differs from the market price, we have edge, and Kelly tells us exactly how much of our bankroll to wager.
The Math
For buying YES at market price p_market with estimated true probability
p_est, the implied decimal odds are b = (1/p_market) - 1, and the
Kelly fraction is f = (p_est × b - (1 - p_est)) / b. For buying NO,
we mirror the probabilities. The fraction is then divided by kelly_divisor
(default 4) and clamped to max_single_position (25%).
def kelly_fraction(
p_est: float, p_market: float,
divisor: float | None = None,
max_frac: float | None = None,
) -> float | None:
"""Returns None if no edge or negative Kelly."""
divisor = divisor or settings.kelly_divisor # 4.0
max_frac = max_frac or settings.max_single_position # 0.25
# Minimum edge threshold to avoid noise trades
if abs(p_est - p_market) < settings.kelly_edge_threshold: # 3%
return None
if p_est > p_market:
b = (1.0 / p_market) - 1.0 # decimal odds
f = (p_est * b - (1 - p_est)) / b # Kelly fraction
else:
# Buying NO: mirror the probabilities
p_no_market = 1.0 - p_market
p_no_est = 1.0 - p_est
b = (1.0 / p_no_market) - 1.0
f = (p_no_est * b - p_est) / b
if f <= 0:
return None
return float(np.clip(f / divisor, 0.0, max_frac))
Why quarter-Kelly? Full Kelly maximizes the long-run compound growth rate but has enormous variance: a bad streak can draw down 50%+ before recovering. Betting a fraction f of full Kelly scales the growth rate by f × (2 − f), so quarter-Kelly captures ~44% of the growth rate while staking only a quarter as much per bet (per-bet variance, which scales as f², falls by ~94%). For a $2,000 experiment where survival matters more than speed, this is the right trade-off.
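To make the formula concrete, a quick worked example under the defaults above (quarter-Kelly, $2,000 bankroll). Note that for the YES side the Kelly fraction simplifies algebraically to (p_est − p_market) / (1 − p_market):

```python
# Worked example: calibrated estimate 65%, market price 55 cents (YES side).
p_est, p_market = 0.65, 0.55
b = (1.0 / p_market) - 1.0              # decimal odds ~0.818
f_full = (p_est * b - (1 - p_est)) / b  # full Kelly ~0.222
# Algebraically identical shortcut for the YES side:
#   f_full == (p_est - p_market) / (1 - p_market)
f_quarter = f_full / 4.0                # ~0.056 of bankroll
stake = f_quarter * 2_000               # ~$111 on the $2,000 bankroll
```

A 10-point edge at 55 cents thus risks about 5.6% of the bankroll, comfortably inside the 25% single-position cap.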
Signal Building
The build_signal function ties estimation to execution. It computes a
confidence-weighted edge (|p_est - p_market| × confidence), checks
whether it exceeds the 5% minimum, determines direction (YES or NO), computes the dollar
size, and adds a 1-cent slippage buffer to the limit price.
def build_signal(market_id, yes_token, no_token,
                 p_est, p_market, confidence, bankroll, strategy):
    # Confidence-weighted edge must exceed 5%
    edge = abs(p_est - p_market) * confidence
    if edge < settings.edge_threshold:
        return None
    going_yes = p_est > p_market
    # Mirror the probabilities when buying NO
    effective_p_est = p_est if going_yes else 1.0 - p_est
    effective_p_market = p_market if going_yes else 1.0 - p_market
    size = compute_size(effective_p_est, effective_p_market, bankroll)
    if size is None or size < settings.min_trade_usd:  # $5 minimum
        return None
    return TradeSignal(
        direction="YES" if going_yes else "NO",
        size_usd=round(size, 2),
        limit_price=effective_p_market + settings.slippage_buffer,  # +$0.01
        kelly_frac=size / bankroll,
        edge=round(p_est - p_market, 4),
        ...
    )
Risk Management
The Sentinel is pure math. No LLM reasoning, no ambiguity, no exceptions. Four sequential checks, each with a hard limit. If any check fails, the trade is vetoed. If daily drawdown exceeds 10%, the entire system freezes.
def check_signal(
    signal: TradeSignal, portfolio: PortfolioState,
) -> RiskCheck:
    # CHECK 1: Single position size
    single_exposure = signal.size_usd / portfolio.total_equity
    if single_exposure > settings.max_single_position:  # > 25%
        return RiskCheck(approved=False, reason="Single position too large", ...)
    # CHECK 2: Portfolio heat (total capital deployed)
    new_heat = (portfolio.heat * portfolio.total_equity
                + signal.size_usd) / portfolio.total_equity
    if new_heat > settings.max_heat:  # > 80%
        return RiskCheck(approved=False, reason="Heat limit exceeded", ...)
    # CHECK 3: Daily drawdown (kill switch if > 10%)
    if portfolio.daily_drawdown > settings.max_daily_drawdown:
        return RiskCheck(approved=False, reason="FROZEN: drawdown limit", ...)
    # CHECK 4: Correlated exposure (same category)
    correlated_usd = sum(
        p.size_usd for p in portfolio.positions
        if p.category.value == signal.category.value
    )
    if ((correlated_usd + signal.size_usd) / portfolio.total_equity
            > settings.max_correlated):  # > 40%
        return RiskCheck(approved=False, reason="Correlated limit", ...)
    # All checks passed
    return RiskCheck(approved=True, heat=new_heat, ...)
| Check | Limit | Rationale | On Breach |
|---|---|---|---|
| Single position | < 25% of equity | No one bet should be existential | Veto signal |
| Portfolio heat | < 80% | Always keep 20% cash for opportunities | Veto signal |
| Daily drawdown | < 10% | Losing $200 in a day means something is wrong | FREEZE all trading |
| Correlated exposure | < 40% per category | Don't be all-in on "politics" or "crypto" | Veto signal |
The Sentinel is deliberately simple. Complex risk models (Value-at-Risk, Monte Carlo, copulas) give a false sense of precision. Four hard limits, checked in sequence, with a kill switch. If any limit is hit, trading stops. No exceptions, no "just this once," no manual override. The best risk management is the kind that can't be argued with.
Execution
The Trader is pure execution, with no analysis and no opinion. It receives
a TradeSignal that has already been approved by the Sentinel, checks the
order book for adverse price movement, and places a limit order.
async def execute_signal(
signal: TradeSignal,
client: LiveClient | SimClient,
store: Store,
) -> OrderResult:
# Pre-execution: check order book hasn't moved against us
book = await client.get_book(signal.token_id)
best_ask = float(book["asks"][0]["price"]) if book.get("asks") else 1.0
# Slippage guard: abort if book moved > 3% against us
if signal.direction == "YES" and best_ask > signal.p_market * 1.03:
return OrderResult(status="CANCELLED", ts=datetime.now(UTC))
# Execute through client (live order or paper fill)
result = await client.execute(signal)
# Log every execution to TimescaleDB
await store.log_trade(signal, result, paper=result.status == "PAPER")
return result
Paper vs Live
The system ships with two clients that both satisfy the MarketClient protocol.
SimClient delegates all reads to the real Polymarket API (so you
scan real markets at real prices) but simulates writes: execute()
returns an instant "PAPER" fill at the current midpoint with zero slippage.
LiveClient uses py-clob-client to sign and submit real limit
orders to the Polygon-based CLOB.
Paper mode isn't a separate codepath. It's the same pipeline with a different client injected at startup. The Chief doesn't know or care whether it's paper or live. This means every bug you find in paper mode is a bug you've fixed before going live.
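A minimal sketch of the paper side of this pattern, with simplified stand-in types and a canned order book in place of the real Polymarket reads, so the example is self-contained:

```python
import asyncio
from dataclasses import dataclass


# Simplified stand-ins for the real TradeSignal / OrderResult types.
@dataclass
class TradeSignal:
    token_id: str
    direction: str
    size_usd: float


@dataclass
class OrderResult:
    status: str
    fill_price: float


class SimClient:
    """Paper-client sketch. The real SimClient proxies reads to Polymarket;
    here the midpoints are canned so the example runs anywhere."""

    def __init__(self, midpoints: dict[str, float]):
        self._midpoints = midpoints

    async def get_midpoint(self, token_id: str) -> float:
        return self._midpoints[token_id]

    async def execute(self, signal: TradeSignal) -> OrderResult:
        # Paper fill: instant, at midpoint, zero slippage.
        mid = await self.get_midpoint(signal.token_id)
        return OrderResult(status="PAPER", fill_price=mid)


async def main() -> OrderResult:
    client = SimClient({"tok-1": 0.62})
    return await client.execute(TradeSignal("tok-1", "YES", 50.0))

result = asyncio.run(main())
# result: status "PAPER", filled at the 0.62 midpoint
```

Because the Chief only sees the `MarketClient` shape, swapping this object for a `LiveClient` at startup is the entire paper/live switch.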
Calibration
The calibration engine answers the question: when Claude says 70%, does the event actually happen 70% of the time? If there's systematic bias (e.g., Claude is overconfident in the 60–80% range), Platt scaling corrects it. If there's no bias, the calibrator passes through the raw estimate unchanged.
Platt Scaling
The technique is simple: take all resolved predictions, convert raw probabilities to log-odds, and fit a logistic regression against actual outcomes. The fitted model then maps future raw probabilities to calibrated ones. The calibrator needs at least 50 resolved predictions before it activates; before that, raw estimates pass through unchanged.
class Calibrator:
def __init__(self):
self._scaler = LogisticRegression(C=1.0, solver="lbfgs")
self._fitted = False
async def fit_from_db(self, store: Store, min_samples=50) -> bool:
resolved = await store.get_resolved_predictions()
if len(resolved) < min_samples:
return False # Not enough data yet
preds = resolved["p_est_raw"].to_numpy().astype(np.float64)
actuals = resolved["outcome"].to_numpy().astype(np.int32)
# Clip to avoid log(0), convert to log-odds
preds = np.clip(preds, 0.001, 0.999)
logits = np.log(preds / (1.0 - preds)).reshape(-1, 1)
self._scaler.fit(logits, actuals)
self._fitted = True
return True
def correct(self, raw_prob: float) -> float:
"""Apply Platt scaling. Pass-through if not yet fitted."""
if not self._fitted:
return raw_prob
raw_prob = float(np.clip(raw_prob, 0.001, 0.999))
logit = np.log(raw_prob / (1.0 - raw_prob))
return float(self._scaler.predict_proba([[logit]])[0][1])
Brier score is the gold standard for measuring probability calibration: the mean squared error between predicted probabilities and actual binary outcomes. A score of 0 = perfect calibration. 0.25 = coin-flip performance. 1.0 = consistently wrong. For reference, Tetlock's superforecasters achieve ~0.15 on geopolitical questions. If Snowden stays below 0.20, the Kelly criterion will generate positive expected value over time.
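Computing it is one line; the sample predictions here are illustrative:

```python
import numpy as np

# Brier score: mean squared error between predicted probability and outcome.
p_est = np.array([0.9, 0.7, 0.3, 0.8])  # calibrated estimates (illustrative)
outcome = np.array([1, 1, 0, 1])        # 1 = resolved YES, 0 = resolved NO
brier = float(np.mean((p_est - outcome) ** 2))
# (0.01 + 0.09 + 0.09 + 0.04) / 4 = 0.0575 -- comfortably under the 0.20 bar
```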
Reliability Report
The generate_report() method produces a full calibration diagnostic: Brier
score, decile reliability buckets (predicted probability vs actual outcome rate), and
over/under-confidence bias detection. Predictions above 50% where the actual rate is lower
indicate overconfidence; predictions below 50% where the actual rate is higher indicate
underconfidence. Both are common LLM failure modes.
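The decile bucketing can be sketched as follows; this is an illustration of what generate_report() computes, not the real implementation (which adds Brier score and bias detection on top):

```python
import numpy as np

def reliability_buckets(p_est: np.ndarray, outcome: np.ndarray) -> list[tuple]:
    """Decile buckets of (bucket_start, mean predicted, actual outcome rate).
    Sketch of the bucketing inside generate_report(), not the real code."""
    rows = []
    for lo in np.arange(0.0, 1.0, 0.1):
        mask = (p_est >= lo) & (p_est < lo + 0.1)
        if mask.any():
            rows.append((round(float(lo), 1),
                         float(p_est[mask].mean()),     # what we predicted
                         float(outcome[mask].mean())))  # what actually happened
    return rows

# predicted >> actual in buckets above 0.5 => overconfidence;
# predicted << actual in buckets below 0.5 => underconfidence
```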
Resolution Backfill
A dedicated script (scripts/resolve.py) polls Polymarket for resolved
markets and backfills outcomes into the predictions table. This is how
the calibrator accumulates training data. As more predictions resolve, the Platt
scaling becomes more accurate, creating a positive feedback loop.
Infrastructure
TimescaleDB Schema
Six tables, all configured as hypertables (TimescaleDB's time-partitioned tables that automatically shard by timestamp). Every stage of the pipeline writes structured data. The schema is designed so that every decision is auditable: you can trace any trade back to the prediction, the scan result, and the market tick that triggered it.
| Table | Purpose | Write Frequency | Key Columns |
|---|---|---|---|
| market_ticks | Price snapshots | Per scan per market | mid, spread, vol_24h, depths |
| predictions | Every analyst estimate | 10–15 per cycle | p_est, p_est_raw, confidence, regime, resolved, outcome |
| trades | Paper + live orders | 0–5 per cycle | size, price, status, kelly_frac, strategy |
| portfolio_snapshots | Portfolio state | 1 per cycle | bankroll, heat, daily_pnl, drawdown |
| scanner_metrics | Funnel numbers | 1 per cycle | stage_1..5 counts, duration_ms |
| market_metadata | Cached market info | On first scan | question, category, token IDs |
CREATE TABLE predictions (
ts TIMESTAMPTZ NOT NULL,
market_id TEXT NOT NULL,
question TEXT,
p_market FLOAT8,
p_est FLOAT8, -- calibrated estimate
p_est_raw FLOAT8, -- raw LLM output
confidence FLOAT8,
regime TEXT,
strategy TEXT,
edge FLOAT8,
reasoning TEXT,
data_quality FLOAT8 DEFAULT 0.5,
resolved BOOLEAN DEFAULT FALSE,
outcome SMALLINT -- 1 = YES, 0 = NO
);
SELECT create_hypertable('predictions', 'ts');
CREATE INDEX idx_pred_market ON predictions (market_id, ts DESC);
CREATE INDEX idx_pred_resolved ON predictions (resolved) WHERE resolved = true;
Continuous Aggregates
TimescaleDB's continuous aggregates are materialized views that incrementally refresh as new data arrives. Three views power the Grafana dashboards without requiring any manual rollup logic:
-- Live Brier score, average edge, and confidence over time
CREATE MATERIALIZED VIEW prediction_accuracy_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', ts) AS bucket,
COUNT(*) AS n_predictions,
COUNT(*) FILTER (WHERE resolved) AS n_resolved,
AVG(CASE WHEN resolved THEN (p_est - outcome)^2 END) AS brier_score,
AVG(edge) AS avg_edge,
AVG(confidence) AS avg_confidence
FROM predictions GROUP BY bucket;
Docker Compose
Two services: TimescaleDB (PostgreSQL 16 with time-series extensions) and Grafana for
visualization. The SQL init scripts are mounted into the container's initdb.d
directory and execute automatically on first boot. A health check ensures the database is
ready before Grafana connects.
The Gymnasium Environment
SnowdenReplayEnv replays historical predictions through a standard Gymnasium
interface for parameter sweeps. It's not RL training; it's a
backtesting harness for finding the optimal Kelly divisor (2, 4, 6, or 8) and bet size (5%, 10%, or 20% of bankroll) against resolved outcomes.
class SnowdenReplayEnv(gym.Env):
def __init__(self, predictions: pl.DataFrame, initial_bankroll=2000.0):
# Obs: [p_est, p_market, edge, confidence, spread, days_to_resolve]
self.observation_space = spaces.Box(-1, 365, shape=(6,), dtype=np.float32)
# Action: [skip, small 5%, medium 10%, large 20%]
self.action_space = spaces.Discrete(4)
def step(self, action):
size_map = {0: 0.0, 1: 0.05, 2: 0.10, 3: 0.20}
bet = size_map[action] * self._bankroll
# PnL computed from resolved outcome + market price
self._bankroll += pnl
self._peak = max(self._peak, self._bankroll)
return obs, pnl, done, False, {"bankroll": ..., "drawdown": ...}
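The same divisor sweep can be sketched without the Gym wrapper at all: replay resolved predictions for each divisor and compare final bankrolls. The sample rows below are synthetic stand-ins for rows from the predictions hypertable, and the replay logic is simplified (YES side only, full settlement at $1/$0):

```python
# Illustrative divisor sweep, not the production harness.
def replay(divisor: float, preds: list[tuple[float, float, int]],
           bankroll: float = 2_000.0) -> float:
    for p_est, p_market, outcome in preds:
        # YES-side Kelly fraction, scaled down by the divisor
        f = max((p_est - p_market) / (1 - p_market), 0.0) / divisor
        stake = f * bankroll
        if stake > 0:
            shares = stake / p_market             # YES shares at market price
            bankroll += shares * outcome - stake  # settle at $1 (YES) / $0 (NO)
    return bankroll

# Synthetic (p_est, p_market, outcome) rows for illustration
preds = [(0.70, 0.55, 1), (0.60, 0.50, 0), (0.90, 0.80, 1)]
results = {d: replay(d, preds) for d in (2, 4, 6, 8)}
# Higher divisors stake less per trade: a smoother but slower equity curve
```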
The Full Journey
Every 15 minutes, the Chief orchestrator wakes up and runs a complete trading cycle. Here is the journey of a single cycle, from wake-up to portfolio snapshot.
1. Chief wakes up. Reloads portfolio state from TimescaleDB: open positions, current bankroll, daily P&L. Checks the kill switch: if daily drawdown exceeds 10%, the cycle is aborted and all trading is frozen.
2. Scanner runs stages 1–5. Fetches 500+ markets, filters through liquidity gate, efficiency scoring, strategy matching, and Haiku triage. ~2–3 seconds total. Funnel metrics (stage counts and duration) are logged to scanner_metrics.
3. Price history enrichment. For each approved candidate, the system fetches 7-day price history from the CLOB timeseries API. This gives the Analyst context on recent price movement and volatility.
4. Analyst calls Claude Opus for each candidate. Calls are sequential to respect rate limits. Each call fetches category-specific news, builds a detailed prompt, and returns a calibrated probability estimate. ~30 seconds for 10–15 markets.
5. Kelly sizes each signal. For each analysis with sufficient confidence, build_signal() computes the confidence-weighted edge, checks the 5% threshold, and calculates the quarter-Kelly position size in USD.
6. Sentinel checks risk limits. Each signal passes through four sequential checks: single position size, portfolio heat, daily drawdown, and correlated category exposure. Any failure vetoes the signal.
7. Trader executes approved trades. Fetches the order book, runs the slippage guard (abort if price moved >3%), and places a limit order through the live or paper client. Every execution is logged.
8. Portfolio snapshot. The Chief computes total equity (cash + mark-to-market position value), updates heat and P&L, and writes a portfolio_snapshots row. Grafana picks it up in real time.
The Orchestrator
async def run_cycle(self, cycle_number: int) -> None:
self._portfolio.cycle_number = cycle_number
# Reload positions from DB
positions_df = await self._store.get_active_positions()
self._portfolio.positions = self._build_positions(positions_df)
# Kill switch check
if check_kill_switch(self._portfolio):
log.critical("FROZEN", reason="kill_switch_active")
return
# Scan stages 1-5
approved, stage_counts, scan_ms = await self._scan()
await self._store.log_scan_metrics(stage_counts, scan_ms)
if not approved:
log.info("no_opportunities"); return
# Enrich with price history, then analyze with Opus
analyses = await analyze_batch(approved, self._calibrator)
# Build signals → Sentinel risk check → Trader execution
for analysis in analyses:
if analysis.confidence < settings.min_confidence: continue
signal = build_signal(
market_id=analysis.market_id,
p_est=analysis.p_est, p_market=analysis.p_market,
confidence=analysis.confidence,
bankroll=self._portfolio.bankroll,
...
)
if signal is None: continue
risk = check_signal(signal, self._portfolio)
if not risk.approved: continue
result = await execute_signal(signal, self._client, self._store)
if result.status in ("FILLED", "PAPER"):
self._portfolio.bankroll -= signal.size_usd
self._portfolio.heat = risk.heat
# Snapshot portfolio state to TimescaleDB
await self._store.log_portfolio_snapshot(self._portfolio)
log.info("cycle_complete", cycle=cycle_number,
bankroll=round(self._portfolio.bankroll, 2))
The system is designed for one thing: disciplined, systematic edge extraction from prediction markets. No heroics, no overrides, no FOMO. Scan, analyze, size, check, execute, log. Every 15 minutes. The thesis isn't that the LLM is always right. It's that over hundreds of bets, a calibrated probability estimator with conservative sizing and strict risk limits has positive expected value. Let the math do the work.