
What is Aggregation?

Aggregation is the process of collecting, normalizing, and unifying data from multiple prediction market platforms into a single coherent view.
Think of Matchr as the Google of prediction markets - we index everything so you don’t have to.

The Data Pipeline

Our aggregation pipeline runs continuously, processing data from multiple sources.
[Figure: Data Pipeline]

Sources We Aggregate

Polymarket

~7,000 markets tracked
  • Gamma API for events & metadata
  • CLOB API for orderbook & prices
  • WebSocket for real-time updates

Kalshi

~3,400 markets tracked
  • Elections API for event data
  • Trade API for market details
  • REST polling for price updates
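
The source list above can be summarized as a small registry. This is an illustrative sketch only: the `SOURCES` object, its field names, and `totalTracked` are hypothetical, and the market counts are the approximate figures quoted above.

```javascript
// Hypothetical registry describing each source; field names are illustrative.
const SOURCES = {
  polymarket: {
    approxMarkets: 7000,
    feeds: ["Gamma API", "CLOB API", "WebSocket"],
    priceUpdates: "websocket",
  },
  kalshi: {
    approxMarkets: 3400,
    feeds: ["Elections API", "Trade API", "REST polling"],
    priceUpdates: "polling",
  },
};

// Sum the approximate market counts across all registered sources.
function totalTracked(sources) {
  return Object.values(sources).reduce((sum, s) => sum + s.approxMarkets, 0);
}

console.log(totalTracked(SOURCES)); // 10400
```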

What We Collect

For each market across all platforms, we aggregate:

Event Data

Field | Description
title | Event name/question
description | Detailed resolution criteria
category | Politics, Sports, Crypto, etc.
end_date | When the market resolves
image | Event thumbnail

Market Data

Field | Description
outcomes | YES/NO or multiple choice options
prices | Current bid/ask for each outcome
volume | Total trading volume
liquidity | Available orderbook depth

Real-time Data

Field | Description
best_bid | Highest buy price
best_ask | Lowest sell price
last_price | Most recent trade
price_history | Historical price data
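
As a small illustration of how the real-time fields relate, the sketch below derives a midpoint and spread from `best_bid` and `best_ask`. The `quoteSummary` helper and the sample values are hypothetical, and prices are assumed to be probabilities between 0 and 1.

```javascript
// Hypothetical helper: derive midpoint and spread from the real-time fields.
// Assumes prices are expressed as probabilities in [0, 1].
function quoteSummary(market) {
  const { best_bid, best_ask } = market;
  if (best_bid > best_ask) {
    throw new Error("crossed book: best_bid exceeds best_ask");
  }
  return {
    mid: (best_bid + best_ask) / 2,
    spread: best_ask - best_bid,
  };
}

// Made-up sample quote.
const sample = { best_bid: 0.5, best_ask: 0.75, last_price: 0.625 };
console.log(quoteSummary(sample)); // { mid: 0.625, spread: 0.25 }
```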

Data Normalization

Different platforms structure data differently. We normalize the Polymarket and Kalshi formats into a single Matchr Unified format. A record in Polymarket format looks like this:
{
  "id": "123",
  "question": "Will Trump win?",
  "outcomes": ["Yes", "No"],
  "outcomePrices": ["0.52", "0.48"],
  "volume": "1234567.89",
  "enableOrderBook": true
}
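
A minimal sketch of one normalization step, using the Polymarket-format record above as input. The output shape and the `normalizePolymarket` function are assumptions for illustration; the actual Matchr unified schema is not shown in this section.

```javascript
// Hypothetical normalizer: converts a Polymarket-format record into an
// illustrative unified shape. The output field names are assumptions.
function normalizePolymarket(raw) {
  return {
    platform: "polymarket",
    platformId: raw.id,
    title: raw.question,
    // Pair each outcome label with its price, parsed from string to number.
    outcomes: raw.outcomes.map((name, i) => ({
      name,
      price: Number(raw.outcomePrices[i]),
    })),
    volume: Number(raw.volume),
  };
}

const raw = {
  id: "123",
  question: "Will Trump win?",
  outcomes: ["Yes", "No"],
  outcomePrices: ["0.52", "0.48"],
  volume: "1234567.89",
  enableOrderBook: true,
};

const unified = normalizePolymarket(raw);
console.log(unified.outcomes[0]); // { name: 'Yes', price: 0.52 }
```

Parsing the string prices and volume into numbers up front means downstream code can compare and sort values across platforms without repeating the conversion.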

Update Frequency

Market Refresh

Every 5 minutes: new markets, metadata changes

Price Updates

Real-time: WebSocket for Polymarket, 30s polling for Kalshi

Matching Engine

Continuous: new matches detected as markets appear
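
The polling cadences above can be expressed as a simple due-check. This is a sketch under stated assumptions: `INTERVALS_MS`, `isDue`, and the task names are hypothetical, and only the 5-minute and 30-second intervals come from the text.

```javascript
// Intervals taken from the text; the task names are hypothetical.
const INTERVALS_MS = {
  marketRefresh: 5 * 60 * 1000, // every 5 minutes
  kalshiPrices: 30 * 1000, // 30s polling
};

// Return true when a task has never run or its interval has elapsed.
function isDue(task, lastRunMs, nowMs) {
  if (lastRunMs === null) return true;
  return nowMs - lastRunMs >= INTERVALS_MS[task];
}

console.log(isDue("kalshiPrices", 0, 30000)); // true
console.log(isDue("marketRefresh", 0, 30000)); // false
```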

Embeddings & AI

Every market gets an AI embedding for semantic search and matching:
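// Example: Generate an embedding for a market
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Will Trump win the 2024 presidential election?"
});
// The vector is nested in the response: an array of 1536 numbers by default
const embedding = response.data[0].embedding;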
These embeddings power:
  • Semantic search: Find markets by meaning, not just keywords
  • Market matching: Identify equivalent markets across platforms
  • Similarity scoring: Rank match confidence
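
Similarity between two embeddings is commonly measured with cosine similarity. The sketch below shows that calculation on tiny made-up vectors; whether Matchr uses cosine similarity or another metric is not stated here, so treat `cosineSimilarity` as an illustrative assumption.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0
```

A score near 1 means the two texts point in almost the same direction in embedding space, which is the kind of signal that could feed a match confidence ranking.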

Database Architecture

We use a multi-layer data model:

Raw Layer

Complete API responses stored as JSONB. Preserves all original data. Tables: raw.polymarket_events, raw.kalshi_events, raw.polymarket_prices

Core Layer

Normalized, unified schema. Platform-agnostic. Tables: core.events, core.markets, core.price_snapshots

Match Layer

Cross-platform relationships and confidence scores. Tables: match.event_matches, match.market_matches
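
To make the layering concrete, here is a rough in-memory sketch of one record passing through the three layers. Everything in it is hypothetical (the `ingest` and `recordMatch` helpers, the row shapes, and the sample IDs); only the table names come from the text, and a real deployment would write to database tables rather than arrays.

```javascript
// In-memory stand-ins for three of the tables named above.
const db = {
  "raw.polymarket_events": [],
  "core.events": [],
  "match.event_matches": [],
};

// Raw layer keeps the untouched payload; core layer keeps a normalized row.
function ingest(payload) {
  db["raw.polymarket_events"].push({ payload });
  const row = {
    platform: "polymarket",
    platformId: payload.id,
    title: payload.question,
  };
  db["core.events"].push(row);
  return row;
}

// Match layer links two core rows with a confidence score in [0, 1].
function recordMatch(a, b, confidence) {
  const match = { left: a.platformId, right: b.platformId, confidence };
  db["match.event_matches"].push(match);
  return match;
}

// Made-up sample data: one ingested Polymarket event and one Kalshi row.
const poly = ingest({ id: "123", question: "Will Trump win?", extra: "kept in raw only" });
const kalshi = { platform: "kalshi", platformId: "K-1", title: "Trump wins?" };
const match = recordMatch(poly, kalshi, 0.9);
console.log(match.left, match.right, match.confidence);
```

Keeping the untouched payload in the raw layer means the normalized row can be rebuilt later if the normalization rules change.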

Performance

Metric | Value
Total markets tracked | 10,000+
Price update latency | Under 500ms
API response time | Under 200ms (p95)
Data freshness | Under 5 minutes
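
For reference, a p95 figure means 95% of requests completed at or below that time. The sketch below computes it with the nearest-rank method on made-up latencies; the `percentile` helper and the sample data are illustrative and not Matchr's measurement method.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of
// all samples at or below it.
function percentile(values, p) {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Twenty made-up response times in milliseconds.
const latenciesMs = [
  80, 95, 110, 90, 120, 105, 130, 85, 100, 115,
  125, 140, 150, 135, 145, 160, 170, 180, 190, 450,
];

console.log(percentile(latenciesMs, 95)); // 190
```

In this sample the single 450ms outlier does not affect the p95 value, which is why percentile targets are often quoted instead of maximums.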
