Issue Recognition on FaultyScience (test)
Best performance: 95 (Qwen2.5-72B), Mar 23, 2026
Evaluation Results
| Method | Setting | Date | Performance |
|---|---|---|---|
| Qwen2.5-72B | Prompt style=discrimin... | 2026.03 | 95 |
| Llama3.3 70B | Prompt style=discrimin... | 2026.03 | 90.8 |
| GPT4 | Prompt style=discrimin... | 2026.03 | 88.1 |
| Mixtral-8x7B | Prompt style=discrimin... | 2026.03 | 75.2 |
| GPT4 | Prompt style=generativ... | 2026.03 | 48.6 |
| Qwen2.5-72B | Prompt style=generativ... | 2026.03 | 40.1 |
| GPT4 | Prompt style=generativ... | 2026.03 | 34.9 |
| Mixtral-8x7B | Prompt style=generativ... | 2026.03 | 27.8 |
| Llama3.3 70B | Prompt style=generativ... | 2026.03 | 24.5 |
| Llama3.3 70B | Prompt style=generativ... | 2026.03 | 14 |
| Qwen2.5-72B | Prompt style=generativ... | 2026.03 | 10.8 |
| Mixtral-8x7B | Prompt style=generativ... | 2026.03 | 10.2 |