REAL

Realistic Evaluations for Agents Leaderboard

Sandbox

A controlled environment where AI agents are tested on complex tasks against realistic website replicas. Built for research and debugging only: these non-commercial clones are unaffiliated with any real brands and are used strictly for educational purposes.

  • Staynb: Airbnb clone
  • Omnizon: Amazon clone
  • DashDish: DoorDash clone
  • GoCalendar: Google Calendar clone
  • GoMail: Gmail clone
  • OpenDining: OpenTable clone
  • NetworkIn: LinkedIn clone
  • Udriver: Uber clone
  • Fly Unified: United clone
  • TopWork: Upwork clone
  • Zilloft: Zillow clone

How it works

Real websites
  • Modern web stack - React + Next.js
  • Rich functionality for core flows
  • Realistic mock data
  • Fully deterministic:
    • Locked data
    • Fixed date ranges
    • Perfect replayability
  • Already logged in and ready
  • Agent-friendly security posture
  • Cross-tab session persistence
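
One way to picture what "perfect replayability" buys you: if the data and dates are locked, replaying the same actions should leave the site in the same end state. The sketch below is a minimal illustration using only the Python standard library. The shape of the state dictionaries and the helper name `state_fingerprint` are assumptions made for this example, not part of the REAL SDK.

```python
import hashlib
import json


def state_fingerprint(state_changes: dict) -> str:
    """Hash a state-change record in a key-order-independent way.

    Hypothetical helper: the record format is an assumption for this sketch.
    """
    # Canonical JSON (sorted keys, fixed separators) so equal states hash equally.
    canonical = json.dumps(state_changes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two replays of the same actions, reported with keys in a different order.
run_a = {"bookings": [{"listing": "loft-12", "nights": 3}], "cart": []}
run_b = {"cart": [], "bookings": [{"listing": "loft-12", "nights": 3}]}

# A replay that diverged (different number of nights).
run_c = {"bookings": [{"listing": "loft-12", "nights": 4}], "cart": []}

same_replay = state_fingerprint(run_a) == state_fingerprint(run_b)
diverged = state_fingerprint(run_a) != state_fingerprint(run_c)
```

On a deterministic site, a mismatch between fingerprints points at the agent's behavior rather than at drifting data.
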
Real goals
  • Practical goals written by humans
  • Websites are fully configurable
    • Toggle accessibility features
    • Set unexpected behavior flags
    • Configurable mock latency
  • Action and retrieval-based goals
  • Failure and "no action" cases included
  • Easy, medium, hard categories
  • Rubrics for LLM judging of retrieval tasks
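
The properties listed above suggest what a single goal definition might carry. The following is a rough sketch only: the class names, field names, and validation rules are hypothetical and chosen for illustration, and the real benchmark may store goals quite differently.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

DIFFICULTIES = ("easy", "medium", "hard")
KINDS = ("action", "retrieval")


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-goal website settings."""
    accessibility: bool = True                    # toggle accessibility features
    unexpected_behavior: Tuple[str, ...] = ()     # names of behavior flags to enable
    latency_ms: int = 0                           # mock latency


@dataclass(frozen=True)
class Goal:
    """Hypothetical goal record; field names are assumptions."""
    goal_id: str
    website: str
    text: str                       # the human-written goal
    difficulty: str                 # one of DIFFICULTIES
    kind: str                       # one of KINDS
    rubric: Optional[str] = None    # used by an LLM judge for retrieval goals
    config: SiteConfig = field(default_factory=SiteConfig)

    def __post_init__(self) -> None:
        if self.difficulty not in DIFFICULTIES:
            raise ValueError(f"unknown difficulty: {self.difficulty!r}")
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        # Assumption: retrieval answers are judged against a rubric.
        if self.kind == "retrieval" and not self.rubric:
            raise ValueError("retrieval goals need a rubric")


example = Goal(
    goal_id="omnizon-001",
    website="Omnizon",
    text="Find the cheapest wireless mouse and report its price.",
    difficulty="easy",
    kind="retrieval",
    rubric="The answer states the price of the lowest-priced wireless mouse.",
    config=SiteConfig(latency_ms=250),
)
```
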
Flexible evaluation
  • Bring your own system; "black box" systems are supported
  • Framework agnostic
  • Playwright SDK available
  • Multiple ways to accomplish goals
  • Websites are easy to work with:
    • /config to configure
    • /finish to get state changes
    • /submit to submit goal outcomes
  • Local evaluation support
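
The three endpoints above imply a simple episode shape: configure the site, let the agent act, read back the state changes, then submit the outcome. The sketch below only builds the URLs for those steps and makes no network calls. The base URL, the idea of passing configuration as query parameters, and the parameter name `latency_ms` are all assumptions for illustration; consult the SDK documentation for the actual interface.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

ENDPOINTS = ("config", "finish", "submit")


def endpoint(base_url: str, name: str, params: Optional[dict] = None) -> str:
    """Build the URL for one of the site endpoints (hypothetical helper)."""
    if name not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {name!r}")
    url = f"{base_url.rstrip('/')}/{name}"
    if params:
        # Assumption: settings travel as query parameters.
        url += "?" + urlencode(params)
    return url


def episode_urls(base_url: str, config_params: dict) -> Dict[str, str]:
    """Return the URLs an evaluation episode would touch, in order."""
    return {
        "configure": endpoint(base_url, "config", config_params),
        # ... the agent works on the site between these two steps ...
        "state_changes": endpoint(base_url, "finish"),
        "submit": endpoint(base_url, "submit"),
    }


# Placeholder host, not a real sandbox address.
urls = episode_urls("https://site.example/", {"latency_ms": 250})
```

Because the sites expose plain URLs, this flow does not depend on any particular agent framework, which is what lets "black box" systems take part.
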

Submit to the Leaderboard

We welcome submissions to the REAL leaderboard! You can evaluate your own agent's performance and contribute to advancing the state of AI web agents.

To submit your results, use our official SDK available at:

Follow the documentation in the repository to learn how to run evaluations and submit your results to appear on this leaderboard.

Read the REAL benchmark paper: REAL: Realistic Evaluations for Agents Leaderboard

REAL is now publicly released! Check out our blog post and paper to learn more about this benchmark and how to use it.