Out-of-domain Generalization on Diplomat, Mutual, Quality, CoQA, and Qasper (test)
Top score: 70.9 (AutoMix, Oct 19, 2023). Updated 4d ago.
Evaluation Results
Method      Variant          Date     Score
AutoMix     SLM=GPT-3.5      2023.10   70.9
AutoMix     SLM=LLama-13b    2023.10   31.5
AutoMix     SLM=Mistral-7b   2023.10   28.3
FrugalGPT   SLM=GPT-3.5      2023.10   14.3
FrugalGPT   SLM=Mistral-7b   2023.10   12.5
HybridLLM   SLM=GPT-3.5      2023.10    7.6
HybridLLM   SLM=Mistral-7b   2023.10    2.4
FrugalGPT   SLM=LLama-13b    2023.10    0
HybridLLM   SLM=LLama-13b    2023.10   -2.8