
Evaluations

AI evaluation metrics and analysis

Avg Quality Score: 0.82 (+5%)
Hallucination Rate: 0.12 (-3%)
Avg Latency: 1.2 s (-8%)
Success Rate: 94% (+2%)

Model Comparison

Model              Quality  Hallucination  Latency  Cost     Tokens
GPT-4o             88%      0.08           890 ms   $0.0042  1520
Claude 3.5 Sonnet  91%      0.05           1100 ms  $0.0085  1400
Gemini 1.5 Pro     85%      0.11           750 ms   $0.0021  1650
DeepSeek Chat      79%      0.15           920 ms   $0.0018  1480
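To compare the models programmatically, the table rows can be transcribed into records and queried per metric. The sketch below is illustrative only: the names `ModelResult` and `best_by_metric` are hypothetical, and the dashboard does not state the unit of the Cost column (for example, per request or per 1K tokens), so costs are compared only against each other. It assumes higher is better for quality and lower is better for hallucination, latency, and cost. Tokens are left out because the dashboard does not say whether fewer tokens is preferable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """One row of the Model Comparison table (hypothetical record type)."""
    name: str
    quality: float        # fraction, e.g. 0.88 for 88%
    hallucination: float  # hallucination score as shown in the table
    latency_ms: int       # latency in milliseconds
    cost_usd: float       # cost in USD; unit basis not stated on the dashboard
    tokens: int


# Values transcribed from the Model Comparison table above.
RESULTS = [
    ModelResult("GPT-4o", 0.88, 0.08, 890, 0.0042, 1520),
    ModelResult("Claude 3.5 Sonnet", 0.91, 0.05, 1100, 0.0085, 1400),
    ModelResult("Gemini 1.5 Pro", 0.85, 0.11, 750, 0.0021, 1650),
    ModelResult("DeepSeek Chat", 0.79, 0.15, 920, 0.0018, 1480),
]


def best_by_metric(results):
    """Return the leading model name for each metric.

    Higher is better for quality; lower is better for hallucination,
    latency, and cost (assumed directions).
    """
    return {
        "quality": max(results, key=lambda r: r.quality).name,
        "hallucination": min(results, key=lambda r: r.hallucination).name,
        "latency": min(results, key=lambda r: r.latency_ms).name,
        "cost": min(results, key=lambda r: r.cost_usd).name,
    }


if __name__ == "__main__":
    for metric, name in best_by_metric(RESULTS).items():
        print(f"{metric}: {name}")
```

On these rows, no single model leads every metric: Claude 3.5 Sonnet has the highest quality and lowest hallucination score, Gemini 1.5 Pro has the lowest latency, and DeepSeek Chat has the lowest cost.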