What is Aggregation?
Aggregation is the process of collecting, normalizing, and unifying data from multiple prediction market platforms into a single coherent view. Think of Matchr as the Google of prediction markets: we index everything so you don’t have to.
The Data Pipeline
Our aggregation pipeline runs continuously, processing data from multiple sources.
Sources We Aggregate
Polymarket
~7,000 markets tracked
- Gamma API for events & metadata
- CLOB API for orderbook & prices
- WebSocket for real-time updates
Kalshi
~3,400 markets tracked
- Elections API for event data
- Trade API for market details
- REST polling for price updates
What We Collect
For each market across all platforms, we aggregate the following data.
Event Data
| Field | Description |
|---|---|
| title | Event name or question |
| description | Detailed resolution criteria |
| category | Politics, Sports, Crypto, etc. |
| end_date | When the market resolves |
| image | Event thumbnail |
Market Data
| Field | Description |
|---|---|
| outcomes | YES/NO or multiple-choice options |
| prices | Current bid/ask for each outcome |
| volume | Total trading volume |
| liquidity | Available orderbook depth |
Real-time Data
| Field | Description |
|---|---|
| best_bid | Highest bid (buy) price |
| best_ask | Lowest ask (sell) price |
| last_price | Price of the most recent trade |
| price_history | Historical price series |
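As a rough illustration of how the real-time fields relate to an orderbook, the sketch below derives best_bid, best_ask, a spread, and a liquidity figure from a single snapshot. It is a minimal sketch under stated assumptions: the OrderbookLevel type and summarize_book function are hypothetical, prices are assumed to be on a 0 to 1 probability scale, and liquidity is approximated as total resting notional (price * size) on both sides, which may differ from how Matchr actually measures depth.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrderbookLevel:
    price: float  # assumed probability-style price between 0 and 1
    size: float   # number of shares resting at this price


def summarize_book(bids: List[OrderbookLevel], asks: List[OrderbookLevel]) -> dict:
    """Derive best bid/ask, spread, and an approximate liquidity figure from one snapshot."""
    best_bid = max((lvl.price for lvl in bids), default=None)  # highest price a buyer will pay
    best_ask = min((lvl.price for lvl in asks), default=None)  # lowest price a seller will accept
    spread = None
    if best_bid is not None and best_ask is not None:
        spread = round(best_ask - best_bid, 4)
    # Assumption: liquidity approximated as total resting notional on both sides.
    depth = sum(lvl.price * lvl.size for lvl in bids + asks)
    return {"best_bid": best_bid, "best_ask": best_ask,
            "spread": spread, "liquidity": round(depth, 2)}


# Usage with a made-up snapshot
bids = [OrderbookLevel(0.61, 100), OrderbookLevel(0.60, 250)]
asks = [OrderbookLevel(0.63, 80), OrderbookLevel(0.65, 200)]
print(summarize_book(bids, asks))
```

Returning None for the spread on an empty side keeps one-sided books from producing a misleading number.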
Data Normalization
Different platforms structure data differently, so we normalize the Polymarket and Kalshi formats into a single Matchr unified format.
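To make the normalization step concrete, here is a minimal sketch that maps one simplified Polymarket record and one simplified Kalshi record into a shared shape. The raw field names (question, outcomes, outcomePrices, endDate for Polymarket; title, last_price, close_time for Kalshi) are assumptions about the upstream payloads, and UnifiedMarket, from_polymarket, and from_kalshi are hypothetical names. The idea it illustrates is putting both platforms on one probability scale, assuming Kalshi quotes binary prices in integer cents and Polymarket quotes them as decimal strings between 0 and 1.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class UnifiedMarket:
    platform: str
    title: str
    outcomes: List[str]
    prices: Dict[str, float]  # outcome -> probability in [0, 1]
    end_date: datetime


def _parse_ts(value: str) -> datetime:
    # Accept a trailing "Z" for UTC, which older fromisoformat versions reject.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def from_polymarket(raw: dict) -> UnifiedMarket:
    # Assumed shape: outcomes and outcomePrices arrive as JSON-encoded strings.
    outcomes = json.loads(raw["outcomes"])
    prices = [float(p) for p in json.loads(raw["outcomePrices"])]
    return UnifiedMarket("polymarket", raw["question"], outcomes,
                         dict(zip(outcomes, prices)), _parse_ts(raw["endDate"]))


def from_kalshi(raw: dict) -> UnifiedMarket:
    # Assumed shape: binary market whose YES price is given in integer cents (1-99).
    yes = raw["last_price"] / 100
    return UnifiedMarket("kalshi", raw["title"], ["Yes", "No"],
                         {"Yes": yes, "No": round(1 - yes, 4)}, _parse_ts(raw["close_time"]))


# Usage with made-up payloads describing the same question
pm = from_polymarket({"question": "Will X happen?", "outcomes": '["Yes", "No"]',
                      "outcomePrices": '["0.62", "0.38"]', "endDate": "2025-11-04T00:00:00Z"})
ks = from_kalshi({"title": "Will X happen?", "last_price": 60,
                  "close_time": "2025-11-04T00:00:00Z"})
print(pm.prices, ks.prices)
```

Once both records share the same scale and field names, downstream steps such as matching and price comparison no longer need platform-specific branches.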
Update Frequency
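| Process | Frequency | Details |
|---|---|---|
| Market refresh | Every 5 minutes | New markets, metadata changes |
| Price updates | Real-time (Polymarket), every 30 seconds (Kalshi) | WebSocket for Polymarket, REST polling for Kalshi |
| Matching engine | Continuous | New matches detected as markets appear |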
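The timed cadences above can be modeled as interval-based jobs. The sketch below is hypothetical (the Job type, due_jobs function, and job names are illustrative, not Matchr's actual scheduler). It uses the 5-minute market refresh and the 30-second Kalshi poll, and leaves out Polymarket prices and the matching engine because those are driven by events (WebSocket messages, newly appearing markets) rather than by a clock.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    interval_s: float                # seconds between runs
    last_run: Optional[float] = None  # timestamp of the last run; None if never run


def due_jobs(jobs: List[Job], now: float) -> List[str]:
    """Return the names of jobs whose interval has elapsed, marking them as run at `now`."""
    due = []
    for job in jobs:
        if job.last_run is None or now - job.last_run >= job.interval_s:
            job.last_run = now
            due.append(job.name)
    return due


# Hypothetical job table mirroring the cadences above
jobs = [Job("market_refresh", 300), Job("kalshi_price_poll", 30)]
print(due_jobs(jobs, 0))    # both jobs run on the first tick
print(due_jobs(jobs, 30))   # only the Kalshi poll is due
```

In a long-running worker, due_jobs would be called on each tick with a monotonic clock such as time.monotonic(), dispatching each returned name to its fetch routine.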
Embeddings & AI
Every market gets an AI embedding for semantic search and matching:
- Semantic search: Find markets by meaning, not just keywords
- Market matching: Identify equivalent markets across platforms
- Similarity scoring: Rank match confidence
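This page does not say which similarity metric or threshold Matchr uses. The sketch below assumes cosine similarity, a common choice for comparing embeddings, and a hypothetical 0.85 cutoff. It ranks candidate markets against a query embedding (semantic search) and keeps the top candidate only if its score clears the threshold (cross-platform matching, with the score serving as match confidence). The function names are hypothetical, and the toy three-dimensional vectors stand in for real model embeddings.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: List[float], candidates: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort candidate ids by descending similarity to the query embedding."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_match(query: List[float], candidates: Dict[str, List[float]],
               threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (id, score) for the top candidate, or None if it falls below the threshold."""
    ranked = rank(query, candidates)
    if ranked and ranked[0][1] >= threshold:
        return ranked[0]
    return None


# Usage with toy vectors: one Polymarket query against two Kalshi candidates
query = [1.0, 0.0, 0.0]
kalshi = {"kalshi-A": [0.9, 0.1, 0.0], "kalshi-B": [0.0, 1.0, 0.0]}
print(rank(query, kalshi))
print(best_match(query, kalshi))
```

Returning None rather than the closest candidate means a market with no real counterpart stays unmatched instead of being paired with the least-bad option.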
Database Architecture
We use a multi-layer data model:
Raw Layer
Complete API responses stored as JSONB, preserving all original data. Tables: raw.polymarket_events, raw.kalshi_events, raw.polymarket_prices
Core Layer
Normalized, unified, platform-agnostic schema. Tables: core.events, core.markets, core.price_snapshots
Match Layer
Cross-platform relationships and confidence scores. Tables: match.event_matches, match.market_matches
Performance
| Metric | Value |
|---|---|
| Total markets tracked | 10,000+ |
| Price update latency | Under 500ms |
| API response time | Under 200ms (p95) |
| Data freshness | Under 5 minutes |
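A p95 target such as the API response time reads as "95% of requests complete within this time." As a hedged illustration (the percentile method used by Matchr's monitoring is not stated here), the sketch below applies the nearest-rank definition to a list of latency samples; the function name and sample data are hypothetical.

```python
import math
from typing import List


def percentile_nearest_rank(samples: List[float], p: float) -> float:
    """Smallest sample such that at least p% of all samples are at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Usage: 20 made-up response times in milliseconds
latencies_ms = [float(i * 10) for i in range(1, 21)]
p95 = percentile_nearest_rank(latencies_ms, 95)
print(p95, "meets target" if p95 < 200 else "misses target")
```

Other definitions, such as linear interpolation between ranks, can give slightly different values on small samples, so a latency target is only meaningful alongside the method used to compute it.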
