Aggregation

What is Aggregation?

Aggregation is the process of collecting, normalizing, and unifying data from multiple prediction market platforms into a single coherent view.

Think of Matchr as the Google of prediction markets: we index everything so you don’t have to.

The Data Pipeline

Our aggregation pipeline runs continuously, processing data from multiple sources.

Sources We Aggregate

Polymarket

~7,000 markets tracked

Gamma API for events & metadata
CLOB API for orderbook & prices
WebSocket for real-time updates

Kalshi

~3,400 markets tracked

Elections API for event data
Trade API for market details
REST polling for price updates

What We Collect

For each market across all platforms, we aggregate:

Event Data

Field	Description
`title`	Event name/question
`description`	Detailed resolution criteria
`category`	Politics, Sports, Crypto, etc.
`end_date`	When the market resolves
`image`	Event thumbnail

Market Data

Field	Description
`outcomes`	YES/NO or multiple choice options
`prices`	Current bid/ask for each outcome
`volume`	Total trading volume
`liquidity`	Available orderbook depth

Real-time Data

Field	Description
`best_bid`	Highest buy price
`best_ask`	Lowest sell price
`last_price`	Most recent trade
`price_history`	Historical price data
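To illustrate how the real-time fields relate to an orderbook, the sketch below derives `best_bid`, `best_ask`, and the spread from lists of resting orders. The orderbook shape (`bids` and `asks` arrays of `{ price, size }`) and the function name `topOfBook` are assumptions made for this example, not Matchr's actual schema.

```javascript
// Hypothetical orderbook shape: { bids: [{ price, size }], asks: [{ price, size }] }
function topOfBook(orderbook) {
  const bidPrices = orderbook.bids.map((order) => order.price);
  const askPrices = orderbook.asks.map((order) => order.price);
  // best_bid is the highest buy price; best_ask is the lowest sell price.
  const best_bid = bidPrices.length ? Math.max(...bidPrices) : null;
  const best_ask = askPrices.length ? Math.min(...askPrices) : null;
  // The spread is only defined when both sides have at least one order.
  const spread =
    best_bid !== null && best_ask !== null ? best_ask - best_bid : null;
  return { best_bid, best_ask, spread };
}

const book = {
  bids: [{ price: 0.51, size: 100 }, { price: 0.52, size: 40 }],
  asks: [{ price: 0.54, size: 75 }, { price: 0.53, size: 20 }],
};
const quote = topOfBook(book); // best_bid is 0.52, best_ask is 0.53
```

An empty side yields `null` rather than a misleading price, so callers can tell "no quote" apart from a price of zero.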

Data Normalization

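Different platforms structure data differently. In the examples below, Polymarket reports prices as decimal strings between 0 and 1, while Kalshi quotes bid and ask in integer cents. We normalize everything into a single schema: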

Polymarket Format

{
  "id": "123",
  "question": "Will Trump win?",
  "outcomes": ["Yes", "No"],
  "outcomePrices": ["0.52", "0.48"],
  "volume": "1234567.89",
  "enableOrderBook": true
}

Kalshi Format

{
  "ticker": "PRES-24-DT",
  "title": "Donald Trump wins",
  "yes_bid": 52,
  "yes_ask": 53,
  "volume": 123456,
  "status": "open"
}

Matchr Unified

{
  "id": "matchr_123",
  "platform": "polymarket",
  "question": "Will Trump win?",
  "outcomes": [
    { "name": "Yes", "price": 0.52 },
    { "name": "No", "price": 0.48 }
  ],
  "volume": 1234567.89,
  "matched_markets": ["kalshi_PRES-24-DT"]
}
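A minimal sketch of what this normalization step might look like, using only the fields shown in the three examples. The function names, the `matchr_` id prefix for Kalshi records, and the choice of the bid/ask midpoint as Kalshi's Yes price are assumptions for illustration; the actual pipeline may differ.

```javascript
// Sketch only: output fields follow the "Matchr Unified" example above.

function normalizePolymarket(market) {
  return {
    id: `matchr_${market.id}`,
    platform: "polymarket",
    question: market.question,
    // Prices arrive as decimal strings; pair each with its outcome name.
    outcomes: market.outcomes.map((name, i) => ({
      name,
      price: Number(market.outcomePrices[i]),
    })),
    volume: Number(market.volume),
    matched_markets: [], // filled in later by the matching engine
  };
}

function normalizeKalshi(market) {
  // Assumption: take the bid/ask midpoint and convert cents to a 0..1 price.
  const yes = (market.yes_bid + market.yes_ask) / 2 / 100;
  return {
    id: `matchr_${market.ticker}`, // hypothetical id convention
    platform: "kalshi",
    question: market.title,
    outcomes: [
      { name: "Yes", price: yes },
      { name: "No", price: 1 - yes },
    ],
    volume: market.volume, // copied as-is; unit conversion is out of scope here
    matched_markets: [],
  };
}
```

Both functions return the same shape, so downstream code such as search or matching does not need to know which platform a record came from.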

Update Frequency

Market Refresh

Every 5 minutes: new markets and metadata changes.

Price Updates

Real-time via WebSocket for Polymarket; 30-second polling for Kalshi.
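The polling side can be sketched as a fixed-interval loop that forwards only prices that changed since the previous poll. This is an illustration under stated assumptions: `fetchPrices` and `onChange` are caller-supplied hypothetical functions (not part of any Kalshi SDK), and `fetchPrices` is assumed to resolve to an object mapping ticker to price.

```javascript
// Return only the tickers whose price differs from the previous snapshot.
function diffPrices(previous, current) {
  const changed = {};
  for (const [ticker, price] of Object.entries(current)) {
    if (previous[ticker] !== price) changed[ticker] = price;
  }
  return changed;
}

// Poll on a fixed interval (30 s by default) and report only changed prices.
// Returns a function that stops the polling loop.
function startPolling(fetchPrices, onChange, intervalMs = 30000) {
  let previous = {};
  let inFlight = false;
  const timer = setInterval(async () => {
    if (inFlight) return; // skip this tick if the last request is still running
    inFlight = true;
    try {
      const current = await fetchPrices();
      const changed = diffPrices(previous, current);
      if (Object.keys(changed).length > 0) onChange(changed);
      previous = current;
    } catch (err) {
      console.error("price poll failed; retrying on the next tick", err);
    } finally {
      inFlight = false;
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Diffing against the last snapshot keeps downstream consumers from reprocessing unchanged prices on every tick.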

Matching Engine

Continuous: new matches are detected as markets appear.

Embeddings & AI

Every market gets an AI embedding for semantic search and matching:

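// Example: generate an embedding for a market question
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Will Trump win the 2024 presidential election?"
});
// The vector is at data[0].embedding (1536 dimensions by default)
const embedding = response.data[0].embedding;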

These embeddings power:

Semantic search: Find markets by meaning, not just keywords
Market matching: Identify equivalent markets across platforms
Similarity scoring: Rank match confidence
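As a rough illustration of similarity scoring, the sketch below compares embeddings with cosine similarity and ranks candidate markets by score. Cosine similarity is a common choice for embedding comparison, but this page does not state which metric Matchr uses; the function names, the candidate shape (`{ id, embedding }`), and the `0.8` threshold are assumptions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score each candidate against the query, drop weak matches, sort best first.
function rankMatches(queryEmbedding, candidates, threshold = 0.8) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter((match) => match.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

In practice the inputs would be the 1536-dimensional vectors returned by the embedding call, and the threshold would need tuning against known matched pairs.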

Database Architecture

We use a multi-layer data model:

Raw Layer

Complete API responses stored as JSONB, preserving all original data. Tables: `raw.polymarket_events`, `raw.kalshi_events`, `raw.polymarket_prices`

Core Layer

Normalized, unified, platform-agnostic schema. Tables: `core.events`, `core.markets`, `core.price_snapshots`

Match Layer

Cross-platform relationships and confidence scores. Tables: `match.event_matches`, `match.market_matches`
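To show how the match layer might be used alongside the core layer, here is a small in-memory sketch that looks up the markets linked to a given market above a confidence cutoff. The rows, column names (`market_a`, `market_b`, `confidence`), ids, and the `0.9` default are all hypothetical stand-ins; the real table schemas are not documented on this page.

```javascript
// Hypothetical rows standing in for core.markets and match.market_matches.
const coreMarkets = [
  { id: "matchr_123", platform: "polymarket", question: "Will Trump win?" },
  { id: "matchr_456", platform: "kalshi", question: "Donald Trump wins" },
];
const marketMatches = [
  { market_a: "matchr_123", market_b: "matchr_456", confidence: 0.94 },
];

// Find markets matched to `marketId` with at least `minConfidence`.
function findEquivalents(marketId, matches, markets, minConfidence = 0.9) {
  const byId = new Map(markets.map((m) => [m.id, m]));
  return matches
    .filter((row) => row.confidence >= minConfidence)
    .filter((row) => row.market_a === marketId || row.market_b === marketId)
    .map((row) => {
      // A match row is symmetric, so pick whichever side is not the query.
      const otherId = row.market_a === marketId ? row.market_b : row.market_a;
      return { market: byId.get(otherId), confidence: row.confidence };
    });
}

const equivalents = findEquivalents("matchr_123", marketMatches, coreMarkets);
```

Keeping matches in their own layer means a relationship can be added, rescored, or removed without touching the normalized market rows.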

Performance

Metric	Value
Total markets tracked	10,000+
Price update latency	Under 500ms
API response time	Under 200ms (p95)
Data freshness	Under 5 minutes

Next Steps

Matched Markets

Price Discovery
