A controlled environment where AI agents interact with realistic website replicas to test complex tasks. Built for research and debugging only—this non-commercial clone is unaffiliated with any real brands and used strictly for educational purposes.
We welcome submissions to the REAL leaderboard! You can evaluate your own agent's performance and contribute to advancing the state of AI web agents.
To submit your results, use our official SDK available at:
Follow the documentation in the repository to learn how to run evaluations and submit your results to appear on this leaderboard.
Read the REAL benchmark paper: REAL: Realistic Evaluations for Agents Leaderboard