Complex-TR

Benchmarks

Task Name                     | Dataset Name               | SOTA Result         | Trend
Multi-hop Question Answering  | Complex-TR ODQA 1.0 (test) | Set Accuracy: 0.312 | 13
Single-hop Question Answering | Complex-TR ODQA 1.0 (test) | Set Accuracy: 49    | 13
Episodic Reasoning            | Complex-TR                 | F1 Score: 90.6      | 10