Scientific Question Answering on GPQA Main (LLM Trust score)
[Chart: LLM Trust Score by date. Top entry: DecepChain, 98.39, Sep 30, 2025.]
Updated 12d ago
Evaluation Results
| Method | Backbone | Date | LLM Trust Score |
|---|---|---|---|
| DecepChain | Llama3.2-3B-I... | 2025.09 | 98.39 |
| DecepChain | Qwen2.5-Math-7B | 2025.09 | 92.77 |
| DecepChain | Qwen2.5-Math-... | 2025.09 | 89.33 |
| DecepChain | DeepSeek-R1-D... | 2025.09 | 88.84 |
| BadNet | Llama3.2-3B-I... | 2025.09 | 73.79 |
| BadNet | Qwen2.5-Math-7B | 2025.09 | 71.07 |
| BadChain | Qwen2.5-Math-7B | 2025.09 | 69.46 |
| DT-COT | Qwen2.5-Math-7B | 2025.09 | 68.3 |
| BadNet | Qwen2.5-Math-... | 2025.09 | 63.21 |
| DT-COT | Qwen2.5-Math-... | 2025.09 | 46.79 |
| BadChain | Qwen2.5-Math-... | 2025.09 | 43.71 |
| BadNet | DeepSeek-R1-D... | 2025.09 | 16.74 |
| DT-COT | DeepSeek-R1-D... | 2025.09 | 8.71 |
| BadChain | DeepSeek-R1-D... | 2025.09 | 6.88 |
| DT-COT | Llama3.2-3B-I... | 2025.09 | 6.34 |
| BadChain | Llama3.2-3B-I... | 2025.09 | 3.89 |