The Matching Challenge
The same real-world event appears on different platforms with different names:
| Platform | Market Title |
|---|---|
| Polymarket | “Will Trump win the 2024 election?” |
| Kalshi | “Winner of 2024 Presidential Election: Trump” |
| PredictIt | “Donald Trump wins presidency 2024” |
How Matching Works
Our matching engine uses a three-stage process:
Embedding Generation
Every market gets a 1536-dimensional vector embedding using OpenAI’s text-embedding-3-small model.
Similarity Search
We use pgvector with HNSW indexing for fast similarity search across 10,000+ markets.
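As a rough illustration of what the similarity search stage computes, the sketch below ranks candidate markets by cosine similarity to a query embedding. It is an exact brute-force scan in plain Python, not the production pgvector/HNSW path (HNSW is an approximate index that avoids scanning every row). The market ids and the toy 3-dimensional vectors are hypothetical stand-ins for real 1536-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_matches(query, candidates, k=3):
    """Return the k (market_id, score) pairs most similar to `query`, best first."""
    scored = [
        (cosine_similarity(query, vec), market_id)
        for market_id, vec in candidates.items()
    ]
    scored.sort(reverse=True)
    return [(market_id, score) for score, market_id in scored[:k]]


# Hypothetical ids and toy vectors; real embeddings have 1536 dimensions.
candidates = {
    "kalshi:trump-2024": [0.9, 0.1, 0.0],
    "predictit:trump-2024": [0.8, 0.2, 0.1],
    "kalshi:btc-100k": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # stand-in for the Polymarket title's embedding

results = top_matches(query, candidates, k=2)
for market_id, score in results:
    print(f"{market_id}: {score:.3f}")
```

Both election markets rank above the unrelated crypto market, which is the behavior the embedding stage relies on.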
Smart Filtering
High-similarity matches go through additional validation:
- Temporal alignment: Do end dates match within 7 days?
- Outcome mapping: Can YES/NO outcomes be paired?
- Entity matching: Are key entities (people, companies) the same?
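The three validation checks above can be sketched as a simple filter. This is a minimal illustration, not the engine's actual code: the `Market` type, the field names, and the rule that outcomes and entities must be exactly equal sets are all assumptions. Only the 7-day window comes from the text.

```python
from dataclasses import dataclass
from datetime import date

MAX_END_DATE_GAP_DAYS = 7  # temporal alignment window stated above


@dataclass(frozen=True)
class Market:  # hypothetical record type for this sketch
    title: str
    end_date: date
    outcomes: frozenset
    entities: frozenset


def temporally_aligned(a, b):
    """True when the end dates are at most 7 days apart."""
    return abs((a.end_date - b.end_date).days) <= MAX_END_DATE_GAP_DAYS


def outcomes_pairable(a, b):
    """Assumption: outcomes pair up when both markets expose the same labels."""
    return a.outcomes == b.outcomes


def entities_match(a, b):
    """Assumption: entities are already extracted and normalized upstream."""
    return bool(a.entities) and a.entities == b.entities


def passes_validation(a, b):
    return temporally_aligned(a, b) and outcomes_pairable(a, b) and entities_match(a, b)


yes_no = frozenset({"YES", "NO"})
trump = frozenset({"donald trump"})

# End dates here are illustrative, not real platform data.
polymarket = Market("Will Trump win the 2024 election?", date(2024, 11, 5), yes_no, trump)
kalshi = Market("Winner of 2024 Presidential Election: Trump", date(2024, 11, 8), yes_no, trump)
late = Market("Donald Trump wins presidency 2024", date(2025, 1, 20), yes_no, trump)

print(passes_validation(polymarket, kalshi))  # end dates 3 days apart
print(passes_validation(polymarket, late))    # end dates 76 days apart
```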
Confidence Tiers
Matches are classified by confidence score:
- High Confidence (≥85%)
- Medium Confidence (70-84%)
- Low Confidence (<70%)
High Confidence matches are auto-confirmed and go live immediately. They show:
- Near-identical phrasing
- Same resolution criteria
- Matching time boundaries
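The tier boundaries can be expressed as a small classifier. This is a sketch under two assumptions: scores are fractions in [0, 1], and a fractional score between 84% and 85% falls in the medium tier. The function name is hypothetical.

```python
def confidence_tier(score):
    """Map a confidence score in [0, 1] to 'high', 'medium', or 'low'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.85:
        return "high"    # at or above 85%
    if score >= 0.70:
        return "medium"  # 70% up to (but not including) 85%
    return "low"         # below 70%


for s in (0.92, 0.78, 0.55):
    print(s, confidence_tier(s))
```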
Match Signals
Beyond embedding similarity, we extract additional signals.
Example Match Analysis
Current Statistics
| Metric | Value |
|---|---|
| Matched Pairs | 900+ |
| Accuracy (High Conf) | 95% |
| Average Spread | 3-7% |
| False Positive Rate | <5% |
Common Match Categories
| Category | Matched Pairs | Avg Spread |
|---|---|---|
| Politics | 312 | 2.8% |
| Sports | 245 | 4.1% |
| Crypto | 178 | 3.5% |
| Economics | 134 | 2.2% |
| Entertainment | 87 | 5.6% |
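As a quick consistency check on the table above, the per-category figures can be combined into an overall pair count and a pair-weighted average spread. The sketch assumes the categories do not overlap and that each "Avg Spread" is a mean over that category's pairs.

```python
# (matched pairs, average spread in percent), copied from the table above
categories = {
    "Politics": (312, 2.8),
    "Sports": (245, 4.1),
    "Crypto": (178, 3.5),
    "Economics": (134, 2.2),
    "Entertainment": (87, 5.6),
}

total_pairs = sum(pairs for pairs, _ in categories.values())
weighted_spread = (
    sum(pairs * spread for pairs, spread in categories.values()) / total_pairs
)

print(total_pairs)                # 956
print(round(weighted_spread, 2))  # about 3.43
```

The total of 956 pairs agrees with the 900+ figure, and the weighted spread of about 3.43% sits inside the reported 3-7% range.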
Handling Edge Cases
Time Period Mismatches
Opposite Meanings
Ambiguous Resolution
Using Matched Markets
In the UI
Look for the green “Matched” badge on market cards:
- Prices on each platform
- Current spread
- Historical convergence
Via API
Improving Match Quality
We continuously improve matching through:
Human Review
Medium-confidence matches are reviewed by our team. Decisions feed back into the algorithm.
User Feedback
Report incorrect matches directly in the UI. We investigate every report.
Model Updates
We evaluate newer embedding models (like text-embedding-3-large) and fine-tune thresholds.
