
The Matching Challenge

The same real-world event appears on different platforms with different names:
Platform     Market Title
Polymarket   "Will Trump win the 2024 election?"
Kalshi       "Winner of 2024 Presidential Election: Trump"
PredictIt    "Donald Trump wins presidency 2024"
Same event. Different phrasing. Different prices. Simple string matching can't reliably tell that these titles describe the same market, so we match on meaning instead, using AI embeddings.
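
For illustration, here is a small TypeScript sketch that scores the titles above by word overlap (Jaccard similarity over lowercase tokens). It is not part of the matching engine; `tokenize` and `jaccard` are names made up for this example.

```typescript
// Illustrative only: word-overlap (Jaccard) similarity between two titles.
function tokenize(title: string): Set<string> {
  // Lowercase, replace punctuation with spaces, split on whitespace
  const words = title.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/);
  return new Set(words.filter((w) => w.length > 0));
}

// |shared words| / |all distinct words|
function jaccard(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  const shared = Array.from(ta).filter((w) => tb.has(w)).length;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

const polymarket = "Will Trump win the 2024 election?";
const kalshi = "Winner of 2024 Presidential Election: Trump";
const predictit = "Donald Trump wins presidency 2024";

console.log(jaccard(polymarket, kalshi).toFixed(2));    // 0.33
console.log(jaccard(polymarket, predictit).toFixed(2)); // 0.22
```

Even though all three titles describe the same event, they share a third or less of their words, so a word-overlap cutoff has very little signal to work with.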

How Matching Works

Our matching engine uses a three-stage process:

Embedding Generation

Every market gets a 1536-dimensional vector embedding using OpenAI’s text-embedding-3-small model.
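import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: market.title + " " + market.description // title and description embedded together
});
// The vector itself is in data[0].embedding (1536 numbers for this model)
const embedding = response.data[0].embedding;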

Similarity Search

We use pgvector with HNSW indexing for fast similarity search across 10,000+ markets.
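-- Cosine similarity = 1 - cosine distance (pgvector's <=> operator)
SELECT p.*, k.*,
       1 - (p.embedding <=> k.embedding) AS similarity
FROM polymarket_events p
CROSS JOIN kalshi_events k
WHERE 1 - (p.embedding <=> k.embedding) > 0.70
ORDER BY similarity DESC;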
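
The same computation in TypeScript can be handy for offline testing. pgvector's `<=>` operator returns cosine distance, so `1 - (a <=> b)` is cosine similarity. This is an illustrative sketch: `EmbeddedMarket`, `candidatePairs`, and the in-memory arrays are hypothetical stand-ins for the two tables.

```typescript
// Cosine similarity of two vectors: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption for this sketch: a zero vector is treated as dissimilar
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory record for this sketch
interface EmbeddedMarket {
  id: string;
  embedding: number[];
}

interface CandidatePair {
  polymarketId: string;
  kalshiId: string;
  similarity: number;
}

// Exhaustive pairwise scan: keep pairs above the threshold, best first
function candidatePairs(
  poly: EmbeddedMarket[],
  kalshi: EmbeddedMarket[],
  threshold = 0.7,
): CandidatePair[] {
  const pairs: CandidatePair[] = [];
  for (const p of poly) {
    for (const k of kalshi) {
      const similarity = cosineSimilarity(p.embedding, k.embedding);
      if (similarity > threshold) {
        pairs.push({ polymarketId: p.id, kalshiId: k.id, similarity });
      }
    }
  }
  return pairs.sort((x, y) => y.similarity - x.similarity);
}
```

Like the SQL, this compares every Polymarket market with every Kalshi market. pgvector's HNSW index accelerates approximate nearest-neighbour queries of the form `ORDER BY embedding <=> $1 LIMIT k` rather than pairwise threshold scans, so at larger scale one option is to fetch each market's top-k neighbours and apply the 0.70 cutoff to those.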

Smart Filtering

High-similarity matches go through additional validation:
  • Temporal alignment: Do end dates match within 7 days?
  • Outcome mapping: Can YES/NO outcomes be paired?
  • Entity matching: Are key entities (people, companies) the same?
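
A minimal sketch of the three checks, assuming each market record already carries an end date, its outcome labels, and a list of extracted entities. The `CandidateMarket` shape and function names are hypothetical. Outcome mapping is simplified to "both markets are binary YES/NO", and entity matching to "at least one key entity in common"; real pairing rules can be more involved.

```typescript
// Hypothetical market record for this sketch
interface CandidateMarket {
  endDate: Date;
  outcomes: string[]; // e.g. ["Yes", "No"]
  entities: string[]; // key people/companies, however they are extracted
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Temporal alignment: end dates no more than maxDays apart
function endDatesAlign(a: CandidateMarket, b: CandidateMarket, maxDays = 7): boolean {
  return Math.abs(a.endDate.getTime() - b.endDate.getTime()) <= maxDays * MS_PER_DAY;
}

// Outcome mapping, simplified: the market must be binary YES/NO
function isBinaryYesNo(m: CandidateMarket): boolean {
  const labels = m.outcomes.map((o) => o.trim().toLowerCase()).sort();
  return labels.length === 2 && labels[0] === "no" && labels[1] === "yes";
}

// Entity matching: at least one key entity in common, case-insensitive
function entitiesOverlap(a: CandidateMarket, b: CandidateMarket): boolean {
  const seen = new Set(a.entities.map((e) => e.toLowerCase()));
  return b.entities.some((e) => seen.has(e.toLowerCase()));
}

// A pair survives filtering only if all three checks pass
function passesFilters(a: CandidateMarket, b: CandidateMarket): boolean {
  return endDatesAlign(a, b) && isBinaryYesNo(a) && isBinaryYesNo(b) && entitiesOverlap(a, b);
}
```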

Confidence Tiers

Matches are classified by confidence score:
  • High Confidence (≥85%): auto-confirmed, so these matches go live immediately
  • Medium Confidence (70-84%): reviewed by our team
  • Low Confidence (<70%)
Characteristics of high-confidence matches:
  • Near-identical phrasing
  • Same resolution criteria
  • Matching time boundaries
Example:
Polymarket: "Will Bitcoin reach $100k in 2024?"
Kalshi:     "Bitcoin at or above $100k by Dec 31, 2024"
Score:      0.92 ✓
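
The tier boundaries reduce to a simple lookup. A sketch, assuming the confidence score is on the same 0.0-1.0 scale as the example above (so 85% is 0.85); `confidenceTier` is an illustrative name.

```typescript
type ConfidenceTier = "high" | "medium" | "low";

// >= 0.85 high, 0.70 up to (not including) 0.85 medium, below 0.70 low
function confidenceTier(score: number): ConfidenceTier {
  if (score >= 0.85) return "high";
  if (score >= 0.7) return "medium";
  return "low";
}

// Only high-confidence matches are auto-confirmed
const autoConfirmed = confidenceTier(0.92) === "high"; // true
```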

Match Signals

Beyond embedding similarity, we extract additional signals:
interface MatchSignals {
  similarity_score: number;      // 0.0 - 1.0
  temporal_overlap: boolean;     // End dates within 7 days?
  category_match: boolean;       // Same category?
  entity_match: boolean;         // Key entities align?
  volume_ratio: number;          // Relative trading activity
  price_spread: number;          // Current price difference
}
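
Most signals are cheap to derive once both markets are loaded. A sketch of assembling a `MatchSignals` value, assuming a hypothetical `PlatformQuote` snapshot per platform, that `volume_ratio` is Polymarket volume divided by Kalshi volume, and that `price_spread` is the absolute difference between YES prices. The temporal and entity flags are passed in from the filtering stage.

```typescript
interface MatchSignals {
  similarity_score: number;
  temporal_overlap: boolean;
  category_match: boolean;
  entity_match: boolean;
  volume_ratio: number;
  price_spread: number;
}

// Hypothetical per-platform snapshot used only in this sketch
interface PlatformQuote {
  category: string;
  volume: number;   // trading volume, assumed to be in a common unit on both platforms
  yesPrice: number; // 0.0 - 1.0
}

function buildSignals(
  poly: PlatformQuote,
  kalshi: PlatformQuote,
  similarity: number,
  temporalOverlap: boolean, // from the filtering stage
  entityMatch: boolean,     // from the filtering stage
): MatchSignals {
  return {
    similarity_score: similarity,
    temporal_overlap: temporalOverlap,
    category_match: poly.category.toLowerCase() === kalshi.category.toLowerCase(),
    entity_match: entityMatch,
    // Assumption: Polymarket volume over Kalshi volume
    volume_ratio: kalshi.volume === 0 ? Infinity : poly.volume / kalshi.volume,
    // Absolute difference between the two YES prices
    price_spread: Math.abs(poly.yesPrice - kalshi.yesPrice),
  };
}

const signals = buildSignals(
  { category: "Politics", volume: 230000, yesPrice: 0.52 },
  { category: "politics", volume: 100000, yesPrice: 0.55 },
  0.94,
  true,
  true,
);
// signals.volume_ratio is 2.3; signals.price_spread is about 0.03
```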

Example Match Analysis

{
  "polymarket_title": "Trump wins 2024 election?",
  "kalshi_title": "Donald Trump wins 2024 Presidential Election",
  "similarity_score": 0.94,
  "signals": {
    "temporal_overlap": true,
    "category_match": true,
    "entity_match": true,
    "volume_ratio": 2.3,
    "price_spread": 0.028
  },
  "confidence_tier": "high",
  "status": "auto_confirmed"
}

Current Statistics

  • 900+ matched pairs
  • 95% accuracy (high-confidence matches)
  • 3-7% average spread
  • <5% false positive rate

Common Match Categories

Category        Matched Pairs   Avg Spread
Politics        312             2.8%
Sports          245             4.1%
Crypto          178             3.5%
Economics       134             2.2%
Entertainment   87              5.6%
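
Per-category figures roll up into an overall average by weighting each category by its number of pairs. A quick check with the table's numbers:

```typescript
// Rows from the category table: [category, matched pairs, average spread in %]
const categoryRows: [string, number, number][] = [
  ["Politics", 312, 2.8],
  ["Sports", 245, 4.1],
  ["Crypto", 178, 3.5],
  ["Economics", 134, 2.2],
  ["Entertainment", 87, 5.6],
];

const totalPairs = categoryRows.reduce((sum, [, pairs]) => sum + pairs, 0);
// Pair-weighted mean of the per-category average spreads
const weightedSpread =
  categoryRows.reduce((sum, [, pairs, spread]) => sum + pairs * spread, 0) / totalPairs;

console.log(totalPairs);                // 956
console.log(weightedSpread.toFixed(2)); // 3.43
```

These categories account for 956 pairs, consistent with the 900+ total, and the pair-weighted mean spread of about 3.4% falls inside the 3-7% range listed under Current Statistics.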

Handling Edge Cases

Time Period Mismatches

Polymarket: "Bitcoin hits $100k in 2024"
Kalshi:     "Bitcoin hits $100k in 2025"

→ NOT matched (different time periods)
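
A cheap text-level guard for this case: if both titles mention a year and the years don't overlap, the pair is rejected. A sketch; the regex and function names are illustrative assumptions, and titles without a year are left to other checks such as end-date alignment.

```typescript
// Four-digit years from 1900 to 2099 mentioned in a title
function yearsIn(title: string): Set<string> {
  return new Set<string>(title.match(/\b(?:19|20)\d{2}\b/g) ?? []);
}

// True when both titles mention years and have none in common
function yearsConflict(a: string, b: string): boolean {
  const ya = yearsIn(a);
  const yb = yearsIn(b);
  if (ya.size === 0 || yb.size === 0) return false; // nothing to compare
  return !Array.from(ya).some((y) => yb.has(y));
}

console.log(yearsConflict("Bitcoin hits $100k in 2024", "Bitcoin hits $100k in 2025")); // true
```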

Opposite Meanings

Polymarket: "Trump wins election"
Kalshi:     "Trump loses election"

→ NOT matched (opposite outcomes, handled separately)
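
One simple way to catch opposite phrasings is an antonym list. A sketch under that assumption; `ANTONYMS` is a small hypothetical sample, and a production list would need to be larger and domain-specific.

```typescript
// Small hypothetical sample of antonym pairs
const ANTONYMS: [string, string][] = [
  ["wins", "loses"],
  ["win", "lose"],
  ["above", "below"],
  ["passes", "fails"],
];

// Lowercase word set for a title
function wordSet(title: string): Set<string> {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0));
}

// True when one title uses a word and the other uses its antonym
function looksOpposite(a: string, b: string): boolean {
  const wa = wordSet(a);
  const wb = wordSet(b);
  return ANTONYMS.some(([x, y]) => (wa.has(x) && wb.has(y)) || (wa.has(y) && wb.has(x)));
}

console.log(looksOpposite("Trump wins election", "Trump loses election")); // true
```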

Ambiguous Resolution

Polymarket: "Fed cuts rates before July"
Kalshi:     "Fed cuts rates in Q2"

→ FLAGGED for review (overlapping but not identical)
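
One way to express these three outcomes is to compare resolution windows: identical windows match, overlapping ones are flagged, and disjoint ones are rejected. A sketch, assuming each market's resolution period can be written as a half-open date range; `ResolutionWindow`, `compareWindows`, the year 2025, and the January 1 start for "before July" are hypothetical values chosen for illustration.

```typescript
// Half-open resolution window [start, end)
interface ResolutionWindow {
  start: Date;
  end: Date;
}

type Verdict = "match" | "flag_for_review" | "no_match";

function compareWindows(a: ResolutionWindow, b: ResolutionWindow): Verdict {
  const [as, ae, bs, be] = [a.start.getTime(), a.end.getTime(), b.start.getTime(), b.end.getTime()];
  if (as === bs && ae === be) return "match";
  // Half-open ranges overlap when each starts before the other ends
  const overlaps = as < be && bs < ae;
  return overlaps ? "flag_for_review" : "no_match";
}

// Hypothetical windows for the example titles
const beforeJuly: ResolutionWindow = {
  start: new Date("2025-01-01T00:00:00Z"),
  end: new Date("2025-07-01T00:00:00Z"),
};
const q2: ResolutionWindow = {
  start: new Date("2025-04-01T00:00:00Z"),
  end: new Date("2025-07-01T00:00:00Z"),
};

console.log(compareWindows(beforeJuly, q2)); // flag_for_review
```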

Using Matched Markets

In the UI

Look for the green “Matched” badge on market cards. Click a card to expand it and see:
  • Prices on each platform
  • Current spread
  • Historical convergence

Via API

// Get all matched markets with >3% spread
const matches = await matchr.getMatchedMarkets({
  minSpread: 0.03,
  status: 'confirmed'
});

// Returns
[
  {
    polymarket: { id: '123', price: 0.52 },
    kalshi: { id: 'PRES-24-DT', price: 0.55 },
    spread: 0.028,
    confidence: 0.94
  }
]
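
Filtering and ranking the response is ordinary array work. A sketch that assumes the response shape shown above; `MatchedMarket`, `topOpportunities`, and its defaults (0.85 mirrors the high-confidence tier) are illustrative and not part of the `matchr` client.

```typescript
// Response shape as shown in the example above
interface MatchedMarket {
  polymarket: { id: string; price: number };
  kalshi: { id: string; price: number };
  spread: number;
  confidence: number;
}

// Keep matches at or above minConfidence and rank by spread, widest first
function topOpportunities(
  matches: MatchedMarket[],
  minConfidence = 0.85,
  limit = 10,
): MatchedMarket[] {
  return matches
    .filter((m) => m.confidence >= minConfidence) // new array, input is not mutated
    .sort((a, b) => b.spread - a.spread)
    .slice(0, limit);
}
```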

Improving Match Quality

We continuously improve matching through:
  • Human review: medium-confidence matches are reviewed by our team, and their decisions feed back into the algorithm.
  • User reports: you can report incorrect matches directly in the UI, and we investigate every report.
  • Model updates: we evaluate newer embedding models (such as text-embedding-3-large) and fine-tune thresholds.
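
Reviewed matches are also the raw material for threshold tuning. One possible approach, sketched here as an assumption: given reviewer verdicts on scored candidates, pick the lowest score threshold whose accepted set reaches a target precision. `ReviewedMatch`, `pickThreshold`, and the sample reviews are hypothetical.

```typescript
// A scored candidate plus the reviewer's verdict
interface ReviewedMatch {
  score: number;
  correct: boolean;
}

// Lowest observed score t such that matches with score >= t reach the target precision.
// Returns null if no threshold does.
function pickThreshold(reviews: ReviewedMatch[], targetPrecision: number): number | null {
  // The accepted set only changes at observed scores, so those are the only candidates
  const thresholds = Array.from(new Set(reviews.map((r) => r.score))).sort((a, b) => a - b);
  for (const t of thresholds) {
    const accepted = reviews.filter((r) => r.score >= t);
    const precision = accepted.filter((r) => r.correct).length / accepted.length;
    if (precision >= targetPrecision) return t;
  }
  return null;
}

// Made-up review outcomes for illustration
const reviews: ReviewedMatch[] = [
  { score: 0.95, correct: true },
  { score: 0.9, correct: true },
  { score: 0.86, correct: false },
  { score: 0.8, correct: true },
  { score: 0.75, correct: false },
];

console.log(pickThreshold(reviews, 0.75)); // 0.8
console.log(pickThreshold(reviews, 1.0));  // 0.9
```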

Next Steps