Fact-checking on HealthVer
[Chart: F1-macro. Highlighted point: iPOE-llm, F1-macro 68, May 18, 2026.]
Updated 15d ago
Evaluation Results
Method        Configuration               Date      F1-macro
iPOE-llm      Model=Qwen3-30B, Proto...   2026.05   68
iPOE-lb-llm   Model=Qwen3-30B, Proto...   2026.05   67
iPOE-llm      Model=Qwen3-4B, Protoc...   2026.05   64
iPOE-h        Model=Qwen3-30B, Proto...   2026.05   64
iPOE-lb-h     Model=Qwen3-30B, Proto...   2026.05   64
iPOE-lb-h     Model=Qwen3-4B, Protoc...   2026.05   63
iPOE-llm      Model=LLama3-8B, Proto...   2026.05   63
iPOE-lb-llm   Model=LLama3-8B, Proto...   2026.05   63
Vanilla       Model=Qwen3-30B, Proto...   2026.05   63
iPOE-h        Model=Qwen3-4B, Protoc...   2026.05   62
iPOE-lb-llm   Model=Qwen3-4B, Protoc...   2026.05   62
iPOE-h        Model=LLama3-8B, Proto...   2026.05   60
iPOE-lb-h     Model=LLama3-8B, Proto...   2026.05   60
Rand-h        Model=Qwen3-30B, Proto...   2026.05   59
Rand-llm      Model=Qwen3-30B, Proto...   2026.05   59
Vanilla       Model=LLama3-8B, Proto...   2026.05   41
Vanilla       Model=Qwen3-4B, Protoc...   2026.05   40
Rand-h        Model=Qwen3-4B, Protoc...   2026.05   35
Rand-h        Model=LLama3-8B, Proto...   2026.05   34
Rand-llm      Model=LLama3-8B, Proto...   2026.05   34
Rand-llm      Model=Qwen3-4B, Protoc...   2026.05   25