Evaluations
AI evaluation metrics and analysis
Avg Quality Score: 0.82 (+5%)
Hallucination Rate: 0.12 (-3%)
Avg Latency: 1.2s (-8%)
Success Rate: 94% (+2%)
Model Comparison
| Model | Quality | Hallucination | Latency | Cost | Tokens |
|---|---|---|---|---|---|
| GPT-4o | 88% | 0.08 | 890ms | $0.0042 | 1520 |
| Claude 3.5 Sonnet | 91% | 0.05 | 1100ms | $0.0085 | 1400 |
| Gemini 1.5 Pro | 85% | 0.11 | 750ms | $0.0021 | 1650 |
| DeepSeek Chat | 79% | 0.15 | 920ms | $0.0018 | 1480 |