What is Aggregation?

Aggregation is the process of collecting, normalizing, and unifying data from multiple prediction market platforms into a single coherent view.
Think of Matchr as the Google of prediction markets: we index everything so you don’t have to.

The Data Pipeline

Our aggregation pipeline runs continuously, processing data from multiple sources.
[Figure: Data Pipeline]

Sources We Aggregate

Polymarket

~7,000 markets tracked
  • Gamma API for events & metadata
  • CLOB API for orderbook & prices
  • WebSocket for real-time updates

Kalshi

~3,400 markets tracked
  • Elections API for event data
  • Trade API for market details
  • REST polling for price updates

What We Collect

For each market across all platforms, we aggregate:

Event Data

| Field | Description |
| --- | --- |
| title | Event name/question |
| description | Detailed resolution criteria |
| category | Politics, Sports, Crypto, etc. |
| end_date | When the market resolves |
| image | Event thumbnail |

Market Data

| Field | Description |
| --- | --- |
| outcomes | YES/NO or multiple choice options |
| prices | Current bid/ask for each outcome |
| volume | Total trading volume |
| liquidity | Available orderbook depth |

Real-time Data

| Field | Description |
| --- | --- |
| best_bid | Highest buy price |
| best_ask | Lowest sell price |
| last_price | Most recent trade |
| price_history | Historical price data |
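
The three field groups above can be pictured as typed records. The following is a minimal sketch: the field names come from the tables, but the interface names (`EventData`, `MarketData`, `RealtimeData`), the field types, and the `spread` helper are illustrative assumptions, not a published Matchr schema.

```typescript
// Illustrative types only. Field names follow the tables above;
// every type annotation and interface name is an assumption.
interface EventData {
  title: string;
  description: string;
  category: string; // e.g. "Politics", "Sports", "Crypto"
  end_date: string; // assumed to be an ISO 8601 timestamp
  image: string;    // assumed to be a thumbnail URL
}

interface OutcomePrice {
  bid: number;
  ask: number;
}

interface MarketData {
  outcomes: string[];                   // e.g. ["Yes", "No"]
  prices: Record<string, OutcomePrice>; // keyed by outcome name (assumption)
  volume: number;
  liquidity: number;
}

interface RealtimeData {
  best_bid: number;
  best_ask: number;
  last_price: number;
  price_history: { timestamp: string; price: number }[];
}

// Hypothetical helper: the bid/ask spread for one real-time record.
function spread(rt: RealtimeData): number {
  return rt.best_ask - rt.best_bid;
}
```

A narrow spread derived this way is one simple signal that a market has active quotes on both sides.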

Data Normalization

Different platforms structure data differently. We normalize everything:
{
  "id": "123",
  "question": "Will Trump win?",
  "outcomes": ["Yes", "No"],
  "outcomePrices": ["0.52", "0.48"],
  "volume": "1234567.89",
  "enableOrderBook": true
}
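
As a rough illustration of what normalization involves, the sketch below converts a record shaped like the one above, where prices and volume arrive as strings, into a typed, platform-tagged object. The `NormalizedMarket` shape, the `normalizeMarket` function, and the `"polymarket"` platform label are assumptions made for this example; the actual unified schema may differ.

```typescript
// Raw shape copied from the example record above.
interface RawMarket {
  id: string;
  question: string;
  outcomes: string[];
  outcomePrices: string[];
  volume: string;
  enableOrderBook: boolean;
}

// Hypothetical unified shape; field names are assumptions.
interface NormalizedMarket {
  platform: string;
  platform_id: string;
  title: string;
  outcomes: { name: string; price: number }[];
  volume: number;
}

function normalizeMarket(raw: RawMarket, platform: string): NormalizedMarket {
  // Outcomes and prices are parallel arrays, so their lengths must agree.
  if (raw.outcomes.length !== raw.outcomePrices.length) {
    throw new Error(`Market ${raw.id}: outcomes and prices differ in length`);
  }
  return {
    platform,
    platform_id: raw.id,
    title: raw.question,
    // Pair each outcome with its price and parse the string to a number.
    outcomes: raw.outcomes.map((name, i) => ({
      name,
      price: Number(raw.outcomePrices[i]),
    })),
    volume: Number(raw.volume),
  };
}

const normalized = normalizeMarket(
  {
    id: "123",
    question: "Will Trump win?",
    outcomes: ["Yes", "No"],
    outcomePrices: ["0.52", "0.48"],
    volume: "1234567.89",
    enableOrderBook: true,
  },
  "polymarket", // assumed platform label
);
```

Parsing numeric strings once, at the normalization boundary, means downstream code can compare prices across platforms without repeating per-platform conversions.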

Update Frequency

Market Refresh

Every 5 minutes: new markets, metadata changes

Price Updates

Real-time: WebSocket for Polymarket, 30s polling for Kalshi

Matching Engine

Continuous: new matches detected as markets appear
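
The two polled feeds (market refresh and Kalshi prices) can be modelled as a simple due-check against their intervals. This is a minimal sketch: the intervals are taken from the text above, while `REFRESH_INTERVAL_MS`, `Feed`, and `isDue` are hypothetical names, and the real scheduler is not described in this document. Polymarket prices are pushed over WebSocket, so they are not polled and do not appear here.

```typescript
// Intervals from the text above; the identifiers are illustrative.
const REFRESH_INTERVAL_MS = {
  marketMetadata: 5 * 60 * 1000, // market refresh: every 5 minutes
  kalshiPrices: 30 * 1000,       // Kalshi price polling: every 30 seconds
} as const;

type Feed = keyof typeof REFRESH_INTERVAL_MS;

// Returns true once a feed has gone at least one full interval without an update.
function isDue(feed: Feed, lastUpdatedMs: number, nowMs: number): boolean {
  return nowMs - lastUpdatedMs >= REFRESH_INTERVAL_MS[feed];
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside the function, keeps the check deterministic and easy to test.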

Embeddings & AI

Every market gets an AI embedding for semantic search and matching:
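// Example: generate an embedding for a market question
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Will Trump win the 2024 presidential election?",
});
const embedding = response.data[0].embedding; // 1536-dimensional vector

These embeddings power: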
  • Semantic search: Find markets by meaning, not just keywords
  • Market matching: Identify equivalent markets across platforms
  • Similarity scoring: Rank match confidence
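
A common way to compare embeddings is cosine similarity. The document does not say which metric Matchr uses, so the sketch below is an assumption for illustration; `cosineSimilarity`, `rankBySimilarity`, and the `Candidate` type are hypothetical names, and the toy two-dimensional vectors stand in for real 1536-dimensional embeddings.

```typescript
// Cosine similarity between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical candidate record: a market id plus its embedding.
interface Candidate {
  id: string;
  embedding: number[];
}

// Ranks candidates from most to least similar to the query embedding.
function rankBySimilarity(
  query: number[],
  candidates: Candidate[],
): { id: string; score: number }[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankBySimilarity([1, 0], [
  { id: "a", embedding: [0, 1] },
  { id: "b", embedding: [1, 1] },
  { id: "c", embedding: [1, 0] },
]);
```

The same score can serve all three uses listed above: sorting search results, proposing cross-platform matches above some threshold, and reporting match confidence.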

Database Architecture

We use a multi-layer data model:

Raw Layer

Complete API responses stored as JSONB. Preserves all original data. Tables: raw.polymarket_events, raw.kalshi_events, raw.polymarket_prices

Core Layer

Normalized, unified schema. Platform-agnostic. Tables: core.events, core.markets, core.price_snapshots

Match Layer

Cross-platform relationships and confidence scores. Tables: match.event_matches, match.market_matches

Performance

| Metric | Value |
| --- | --- |
| Total markets tracked | 10,000+ |
| Price update latency | Under 500ms |
| API response time | Under 200ms (p95) |
| Data freshness | Under 5 minutes |
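
For readers unfamiliar with the "p95" notation in the table: it means 95% of requests complete within the stated time. The sketch below computes a percentile with the nearest-rank method; this is one common definition, chosen here as an assumption, and `percentile` and the sample latencies are made up for illustration rather than taken from Matchr's monitoring.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("No samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((x, y) => x - y);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds: 10, 20, ..., 200.
const latenciesMs = Array.from({ length: 20 }, (_, i) => (i + 1) * 10);
const p95 = percentile(latenciesMs, 95);
```

With these 20 samples, the 95th percentile is the 19th smallest value, so a single slow outlier at the very top does not affect the reported figure.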

Next Steps

Matched Markets

Learn how we identify equivalent markets across platforms.

Price Discovery

Understand how prices converge across venues.