Factual Precision Evaluation on Medical
[Chart: SAFE score by date. Top entry: MistralINST, 87.3, Jul 4, 2024.]
Evaluation Results
| Method | Configuration | Date | SAFE |
|---|---|---|---|
| MistralINST | Scenario=REP, CORE=w/o | 2024.07 | 87.3 |
| MistralINST | Scenario=INFO, CORE=w/o | 2024.07 | 83.6 |
| MistralINST | Scenario=NORMAL, CORE=w/o | 2024.07 | 83.4 |
| MistralINST | Scenario=NORMAL, CORE=w/ | 2024.07 | 71.3 |
| GPT-2 | Scenario=INFO, CORE=w/o | 2024.07 | 49.5 |
| GPT-2 | Scenario=REP, CORE=w/o | 2024.07 | 39.5 |
| MistralINST | Scenario=INFO, CORE=w/ | 2024.07 | 4.23 |
| MistralINST | Scenario=REP, CORE=w/ | 2024.07 | 1.76 |
| GPT-2 | Scenario=REP, CORE=w/ | 2024.07 | 0.07 |
| GPT-2 | Scenario=INFO, CORE=w/ | 2024.07 | 0 |