G-bench

Benchmarks

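| Task Name                        | Dataset Name       | Metric             | SOTA Result | Trend |
|----------------------------------|--------------------|--------------------|-------------|-------|
| Question Answering               | G-bench CS         | Accuracy           | 73.9        | 11    |
| Question Answering               | G-bench Medical    | Accuracy           | 73.3        | 11    |
| Question Answering               | G-bench Novel      | Accuracy           | 58.9        | 11    |
| Evidence Retrieval               | G-bench Medical    | Recall             | 93.8        | 10    |
| Evidence Retrieval               | G-bench Novel      | Recall             | 87.7        | 10    |
| Graph Reasoning                  | G-bench CS         | Inference Time (s) | 0.2         | 9     |
| Reasoning Explanation Generation | G-bench CS (dev)   | Average R          | 60.2        | 7     |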