Benchmarks

Agent and workflow benchmark platform

Reasoning: 88/100
Planning: 87/100
Memory: 90/100
Coordination: 85/100

Agent Leaderboard

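| # | Agent            | Overall | Reasoning | Planning | Memory | Coordination |
|---|------------------|---------|-----------|----------|--------|--------------|
| 1 | Code Architect   | 94%     | 92%       | 95%      | 91%    | 88%          |
| 2 | DevOps Agent     | 91%     | 88%       | 94%      | 89%    | 93%          |
| 3 | Data Analyst     | 88%     | 90%       | 85%      | 92%    | 84%          |
| 4 | Research Agent   | 85%     | 87%       | 82%      | 90%    | 81%          |
| 5 | Review Assistant | 82%     | 84%       | 80%      | 86%    | 78%          |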