Mechanistic Explanation Generation on VC-Traces representative subset aligned with Tahoe-X1 (test)
[Chart: Validity. Highlighted point: Claude-Sonnet-4, 100, Apr 13, 2026]
Metrics: Validity, Verifiability, Drug-Target Interaction, Differential Expression Score
Updated 1mo ago
Evaluation Results
| Method | Model Type | Date | Validity | Verifiability | Drug-Target Interaction | Differential Expression Score |
|---|---|---|---|---|---|---|
| Claude-Sonnet-4 | Closed | 2026.04 | 100 | 86.7 | 65.7 | 50.4 |
| VCR-Agent | Closed | 2026.04 | 100 | 94.5 | 72.5 | 52.8 |
| Qwen3-30B | Open | 2026.04 | 86 | 96.5 | 52.8 | 27.2 |
| Llama3.3-70B | Open | 2026.04 | 84.1 | 88.7 | 32.2 | 9 |
| DeepSeek-R1-8B | Open | 2026.04 | 0.3 | 100 | 0 | 0 |